yijingru / BBAVectors-Oriented-Object-Detection

[WACV2021] Oriented Object Detection in Aerial Images with Box Boundary-Aware Vectors
MIT License

RuntimeError: CUDA error: device-side assert triggered. Help needed! #116

Closed: yangyahu-1994 closed this issue 2 years ago

yangyahu-1994 commented 2 years ago

(BBAVectors) yyh@ubuntu:~/Documents/BBAVectors-Oriented-Object-Detection$ python main.py --data_dir dota_bbav --num_epoch 20 --batch_size 4 --dataset dota --phase train
Setting up data...
Starting training...

Epoch: 1/20
hm loss is nan
wh loss is 1.2452343702316284
off loss is 0.032581768929958344
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [0,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [1,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [2,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [3,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [4,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [5,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [6,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [7,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [8,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [9,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [10,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [11,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [12,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [13,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [14,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [15,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [16,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [17,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [18,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [19,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [20,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [21,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [22,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [23,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [24,0,0] Assertion input_val >= zero && input_val <= one failed.
Traceback (most recent call last):
  File "main.py", line 58, in <module>
    ctrbox_obj.train_network(args)
  File "/home/yyh/Documents/BBAVectors-Oriented-Object-Detection/train.py", line 130, in train_network
    epoch_loss = self.run_epoch(phase='train',
  File "/home/yyh/Documents/BBAVectors-Oriented-Object-Detection/train.py", line 167, in run_epoch
    loss = criterion(pr_decs, data_dict)
  File "/home/yyh/anaconda3/envs/BBAVectors/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yyh/Documents/BBAVectors-Oriented-Object-Detection/loss.py", line 120, in forward
    if isnan(hm_loss) or isnan(wh_loss) or isnan(off_loss):
RuntimeError: CUDA error: device-side assert triggered

Could any of the experts here tell me what is causing this?

yangyahu-1994 commented 2 years ago

The hm loss becomes NaN during the first training epoch.
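For anyone reading the log above: the repeated assertion, input_val >= zero && input_val <= one, appears to be the range check that PyTorch's CUDA binary cross-entropy kernel applies to each input value. A NaN fails both comparisons, so a NaN in the values fed to that loss would be enough to trigger it, which would fit with the NaN hm loss. The sketch below reproduces that check in plain Python (no PyTorch). The names `bce_input_ok`, `first_invalid`, and `heatmap_probs` are made up for illustration and are not part of this repository.

```python
import math


def bce_input_ok(p: float) -> bool:
    """Mirror the check from the log: input_val >= zero && input_val <= one."""
    # Any comparison involving NaN evaluates to False, so NaN is rejected here.
    return p >= 0.0 and p <= 1.0


def first_invalid(values):
    """Return (index, value) of the first entry failing the check, or None."""
    for i, v in enumerate(values):
        if not bce_input_ok(v):
            return i, v
    return None


# Hypothetical heatmap probabilities with one NaN, as an illustration only.
heatmap_probs = [0.02, 0.97, float("nan"), 0.5]
bad = first_invalid(heatmap_probs)
if bad is not None:
    index, value = bad
    print(f"entry {index} would fail the assertion: {value}")
```

Note that CUDA errors are reported asynchronously, so the `isnan(...)` line in the traceback is probably just where the failure surfaced rather than where it originated. Re-running with the environment variable `CUDA_LAUNCH_BLOCKING=1` (or on CPU) is a common way to get a traceback that points at the actual failing call.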