yuhongtian17 / Spatial-Transform-Decoupling

MIT License

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED #31

Open Tianxuandu opened 3 months ago

Tianxuandu commented 3 months ago

2024-07-31 20:36:24,151 - mmrotate - INFO - workflow: [('train', 1)], max: 12 epochs
2024-07-31 20:36:24,151 - mmrotate - INFO - Checkpoints will be saved to /home/shs/STD_new/work_dirs/rotated_imted_hb1m_oriented_rcnn_hivitdet_base_1x_dota_ms_rr_le90_stdc_xyawh321v by HardDiskBackend.
Traceback (most recent call last):
  File "main/mmrotate-main/tools/train.py", line 199, in <module>
    main()
  File "main/mmrotate-main/tools/train.py", line 188, in main
    train_detector(
  File "/home/shs/STD_new/mmrotate/mmrotate/apis/train.py", line 144, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/shs/anaconda3/envs/STD3/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 136, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/shs/anaconda3/envs/STD3/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 53, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/home/shs/anaconda3/envs/STD3/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 31, in run_iter
    outputs = self.model.train_step(data_batch, self.optimizer,
  File "/home/shs/anaconda3/envs/STD3/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 77, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/shs/anaconda3/envs/STD3/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 248, in train_step
    losses = self(**data)
  File "/home/shs/anaconda3/envs/STD3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/shs/anaconda3/envs/STD3/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 116, in new_func
    return old_func(*args, **kwargs)
  File "/home/shs/anaconda3/envs/STD3/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 172, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/home/shs/STD_new/mmrotate/mmrotate/models/detectors/rotated_imted.py", line 102, in forward_train
    x = self.extract_feat(img)
  File "/home/shs/STD_new/mmrotate/mmrotate/models/detectors/rotated_imted.py", line 39, in extract_feat
    x = self.backbone(img)
  File "/home/shs/anaconda3/envs/STD3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/shs/STD_new/mmrotate/mmrotate/models/backbones/lsknet.py", line 224, in forward
    x = self.forward_features(x)
  File "/home/shs/STD_new/mmrotate/mmrotate/models/backbones/lsknet.py", line 214, in forward_features
    x, H, W = patch_embed(x)
  File "/home/shs/anaconda3/envs/STD3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/shs/STD_new/mmrotate/mmrotate/models/backbones/lsknet.py", line 127, in forward
    x = self.norm(x)
  File "/home/shs/anaconda3/envs/STD3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/shs/anaconda3/envs/STD3/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 131, in forward
    return F.batch_norm(
  File "/home/shs/anaconda3/envs/STD3/lib/python3.8/site-packages/torch/nn/functional.py", line 2056, in batch_norm
    return torch.batch_norm(
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

I am running on a single core.

yuhongtian17 commented 3 months ago

Check whether your cuDNN, PyTorch, and CUDA Toolkit versions match one another, and whether your CUDA Toolkit version is compatible with your GPU model.
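One way to gather the versions mentioned above is to ask PyTorch directly. The following is a minimal sketch, not part of this repository: the helper name `collect_gpu_stack_info` is hypothetical, and it only reports what the installed PyTorch build sees (the CUDA version it was built against, the cuDNN version it loads, and each GPU's compute capability). Whether those are actually compatible still has to be checked against the PyTorch and NVIDIA compatibility tables.

```python
import importlib


def collect_gpu_stack_info():
    """Return a dict describing the PyTorch / CUDA / cuDNN stack, if available.

    Hypothetical helper for illustration; it is not part of mmrotate or STD.
    """
    info = {
        "torch": None,
        "cuda_build": None,
        "cudnn": None,
        "cuda_available": False,
        "devices": [],
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed in this environment; return the empty report.
        return info

    info["torch"] = torch.__version__
    # CUDA version this PyTorch build was compiled against (None for CPU-only builds).
    info["cuda_build"] = torch.version.cuda
    # cuDNN version PyTorch loads (None if cuDNN is unavailable).
    info["cudnn"] = torch.backends.cudnn.version()
    info["cuda_available"] = torch.cuda.is_available()

    if info["cuda_available"]:
        for i in range(torch.cuda.device_count()):
            major, minor = torch.cuda.get_device_capability(i)
            info["devices"].append({
                "name": torch.cuda.get_device_name(i),
                "compute_capability": f"{major}.{minor}",
            })
    return info


if __name__ == "__main__":
    for key, value in collect_gpu_stack_info().items():
        print(f"{key}: {value}")
```

Running this inside the same conda environment used for training (here, `STD3`) and posting the output alongside `nvidia-smi` would make it easier to see whether the mismatch described above is the likely cause.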