yeerwen / UniSeg

MICCAI 2023 Paper (Early Acceptance)

RUN ERROR #21

Closed zhengyunxuan closed 10 months ago

zhengyunxuan commented 10 months ago

I first downloaded the BTCV dataset, converted it to nnU-Net format, and preprocessed it. The error below appears during training. What is the problem, and what should I do?

/opt/conda/conda-bld/pytorch_1699449200967/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:365: operator(): block: [1225,0,0], thread: [11,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
(the same assertion is repeated for threads [12,0,0] through [21,0,0])

Traceback (most recent call last):
  File "/home/user_gou/anaconda3/envs/uniseg/bin/nnUNet_train", line 12, in <module>
    sys.exit(main())
  File "/home/user_gou/anaconda3/envs/uniseg/lib/python3.9/site-packages/nnunet/run/run_training.py", line 179, in main
    trainer.run_training()
  File "/home/user_gou/anaconda3/envs/uniseg/lib/python3.9/site-packages/nnunet/training/network_training/nnUNetTrainerV2.py", line 455, in run_training
    ret = super().run_training()
  File "/home/user_gou/anaconda3/envs/uniseg/lib/python3.9/site-packages/nnunet/training/network_training/nnUNetTrainer.py", line 318, in run_training
    super(nnUNetTrainer, self).run_training()
  File "/home/user_gou/anaconda3/envs/uniseg/lib/python3.9/site-packages/nnunet/training/network_training/network_trainer.py", line 456, in run_training
    l = self.run_iteration(self.tr_gen, True)
  File "/home/user_gou/anaconda3/envs/uniseg/lib/python3.9/site-packages/nnunet/training/network_training/UniSeg_Trainer_DS.py", line 151, in run_iteration
    l = self.loss(output, target)
  File "/home/user_gou/anaconda3/envs/uniseg/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user_gou/anaconda3/envs/uniseg/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user_gou/anaconda3/envs/uniseg/lib/python3.9/site-packages/nnunet/training/loss_functions/deep_supervision.py", line 39, in forward
    l = weights[0] * self.loss(x[0], y[0])
  File "/home/user_gou/anaconda3/envs/uniseg/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user_gou/anaconda3/envs/uniseg/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user_gou/anaconda3/envs/uniseg/lib/python3.9/site-packages/nnunet/training/loss_functions/dice_loss.py", line 346, in forward
    dc_loss = self.dc(net_output, target, loss_mask=mask) if self.weight_dice != 0 else 0
  File "/home/user_gou/anaconda3/envs/uniseg/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user_gou/anaconda3/envs/uniseg/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user_gou/anaconda3/envs/uniseg/lib/python3.9/site-packages/nnunet/training/loss_functions/dice_loss.py", line 178, in forward
    tp, fp, fn, _ = get_tp_fp_fn_tn(x, y, axes, loss_mask, False)
  File "/home/user_gou/anaconda3/envs/uniseg/lib/python3.9/site-packages/nnunet/training/loss_functions/dice_loss.py", line 130, in get_tp_fp_fn_tn
    tp = net_output * y_onehot
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception in thread Thread-5:
Traceback (most recent call last):
  File "/home/user_gou/anaconda3/envs/uniseg/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/home/user_gou/anaconda3/envs/uniseg/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user_gou/anaconda3/envs/uniseg/lib/python3.9/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 92, in results_loop
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the print"
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
Exception in thread Thread-4:
(same traceback as Thread-5)
Aborted (core dumped)
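For context on what the assert means: ScatterGatherKernel.cu checks that every index handed to a scatter/gather kernel lies inside the indexed dimension. nnU-Net's dice loss one-hot encodes the target with `scatter_`, so a single voxel whose label is negative or greater than or equal to the number of network output channels is enough to trigger it. A minimal, framework-free sketch of that bounds rule (the helper name and the 14-class example are illustrative, not from the repo):

```python
def find_out_of_range_labels(flat_labels, num_classes):
    """Return the sorted label values that cannot be one-hot encoded.

    One-hot encoding via scatter_ treats each label as an index into the
    class dimension, so any value outside [0, num_classes) trips the same
    "index out of bounds" device-side assert seen in the log above.
    """
    return sorted({v for v in flat_labels if v < 0 or v >= num_classes})

# Example: with 14 output channels (background plus 13 organs, a
# BTCV-style setup), a stray voxel labeled 14 is one class too many.
print(find_out_of_range_labels([0, 3, 7, 13, 14], num_classes=14))  # [14]
```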

yeerwen commented 10 months ago

This error seems to occur when calculating the loss. Therefore, check the output size and the maximum and minimum values of the target.
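In practice this means adding a few prints in `UniSeg_Trainer_DS.run_iteration` just before the `self.loss(output, target)` call, e.g. `output[0].shape`, `target[0].min()`, and `target[0].max()` for each deep-supervision level. A dependency-free sketch of the three quantities worth logging (nested lists stand in for tensors here; the helper name is illustrative):

```python
def summarize(nested):
    """Report (shape, min, max) of a nested-list 'tensor' -- the three
    things to print for both output and target before the loss call."""
    shape = []
    x = nested
    while isinstance(x, list):
        shape.append(len(x))
        x = x[0]
    flat = []
    stack = [nested]
    while stack:
        v = stack.pop()
        if isinstance(v, list):
            stack.extend(v)
        else:
            flat.append(v)
    return tuple(shape), min(flat), max(flat)

# If max(target) equals the number of output channels, the label
# range (not the model) is the bug.
print(summarize([[0, 2], [13, 14]]))  # ((2, 2), 0, 14)
```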

zhengyunxuan commented 10 months ago

> This error seems to occur when calculating the loss. Therefore, check the output size and the maximum and minimum values of the target.

I believe your project itself is fine, so I suspect the problem is in my dataset. Here are my processing steps. I did not run the corresponding upstream task; I went straight to the downstream task.

1. I downloaded the dataset from https://www.synapse.org/#!Synapse:syn3193805/wiki/217789.
2. I converted it with the project's own Convert_BTCV_to_nnUNet_dataset.py, which produced a Task060_BTCV folder that I placed in nnUNet_raw/nnUNet_raw_data.
3. I ran nnUNet_plan_and_preprocess -t 60 --verify_dataset_integrity.
4. Finally, I ran CUDA_VISIBLE_DEVICES=0 nnUNet_n_proc_DA=32 nnUNet_train 3d_fullres UniSeg_Trainer_DS 60 0, which produced the error in this issue.

Is there a problem with my steps? Could the error come from a mistake in my data processing, or am I missing a step?
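Given that the suspicion falls on the converted dataset, one cheap sanity check is that the `labels` field in `Task060_BTCV/dataset.json` lists consecutive integer ids starting at 0, since the network gets one output channel per declared label. A small sketch of that check (the label dict below is an illustrative placeholder, not the real dataset.json contents):

```python
def check_labels(labels_dict):
    """Check an nnU-Net style dataset.json 'labels' mapping.

    The keys must be the consecutive integers 0..N-1 (as strings); the
    network is then built with N output channels, and any segmentation
    voxel valued >= N will crash the dice loss as in this issue.
    """
    ids = sorted(int(k) for k in labels_dict)
    return ids == list(range(len(ids))), len(ids)

# Illustrative BTCV-style mapping: background plus 13 abdominal organs
# (organ names are placeholders).
labels = {str(i): name for i, name in enumerate(
    ["background"] + ["organ_%d" % j for j in range(1, 14)])}
print(check_labels(labels))  # (True, 14)
```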

yeerwen commented 10 months ago

Your data preprocessing steps are fine. Even so, I still recommend first printing the output size and the maximum and minimum values of the target to narrow this down.