kk42yy / DeMoSeg

Apache License 2.0

CUDA error: device-side assert triggered #1

Open cyj1208 opened 3 hours ago

cyj1208 commented 3 hours ago

I get the following error when running Train_DeMoSeg:

../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:365: operator(): block: [1779,0,0], thread: [75,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:365: operator(): block: [1779,0,0], thread: [76,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:365: operator(): block: [5251,0,0], thread: [127,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
  0%|          | 0/5 [00:13<?, ?it/s]
Traceback (most recent call last):
  File "/root/autodl-tmp/DeMoSeg-main/DeMoSeg/Train_DeMoSeg.py", line 13, in <module>
    demoseg_trainer.run_train()
  File "/root/autodl-tmp/DeMoSeg-main/DeMoSeg/training/trainer/DeMoSeg_Trainer.py", line 191, in run_train
    epoch_loss[idx] = self.training_epoch(next(self.train_loader))
  File "/root/autodl-tmp/DeMoSeg-main/DeMoSeg/training/trainer/BaseTrainer.py", line 299, in training_epoch
    l = self.loss(output, lbl)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/autodl-tmp/DeMoSeg-main/DeMoSeg/training/loss/loss_scheduler.py", line 41, in forward
    l = weights[0] * self.loss(*inputs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/monai/losses/dice.py", line 800, in forward
    dice_loss = self.dice(input, target)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/monai/losses/dice.py", line 158, in forward
    target = one_hot(target, num_classes=n_pred_ch)
  File "/root/miniconda3/lib/python3.10/site-packages/monai/networks/utils.py", line 192, in one_hot
    labels = o.scatter_(dim=dim, index=labels.long(), value=1)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

How can I resolve this error?

kk42yy commented 2 hours ago

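This may be caused by your environment configuration. We developed DeMoSeg on Python 3.8; please set up your environment as described in the ReadMe file and make sure every library version matches.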
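
Independently of the environment, the assertion in the traceback is raised by the `scatter_` call inside MONAI's `one_hot`, and it fires whenever a label value is negative or not smaller than the number of predicted channels (`n_pred_ch`). A minimal sketch for checking this on the CPU before the loss is computed is shown below. It is not part of DeMoSeg: the helper name `find_out_of_range_labels` and the sample values are hypothetical, and it assumes you pass in the distinct values of your label tensor (for example via `lbl.unique().tolist()` in PyTorch).

```python
from typing import Iterable, List


def find_out_of_range_labels(label_values: Iterable[float], num_classes: int) -> List[int]:
    """Return the sorted distinct label values that are not valid class indices.

    A value is valid for one-hot encoding only if 0 <= value < num_classes.
    Values are truncated with int(), mirroring the labels.long() cast in the traceback.
    """
    bad = {int(v) for v in label_values if not (0 <= int(v) < num_classes)}
    return sorted(bad)


if __name__ == "__main__":
    # Hypothetical example: 4 output channels, but the label map contains the value 4.
    # With PyTorch you might call: find_out_of_range_labels(lbl.unique().tolist(), output.shape[1])
    print(find_out_of_range_labels([0, 1, 2, 4], num_classes=4))
```

If the helper returns a non-empty list, the labels and the network's output channels disagree, and the label preprocessing is worth checking alongside the library versions.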