Closed: yangyahu-1994 closed this issue 2 years ago.
During the first epoch of training, hm loss becomes nan:
(BBAVectors) yyh@ubuntu:~/Documents/BBAVectors-Oriented-Object-Detection$ python main.py --data_dir dota_bbav --num_epoch 20 --batch_size 4 --dataset dota --phase train
Setting up data...
Starting training...
Epoch: 1/20
hm loss is nan
wh loss is 1.2452343702316284
off loss is 0.032581768929958344
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [0,0,0] Assertion input_val >= zero && input_val <= one failed.
(the same assertion failure is reported for threads [1,0,0] through [23,0,0])
/opt/conda/conda-bld/pytorch_1614378083779/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [24,0,0] Assertion input_val >= zero && input_val <= one failed.
Traceback (most recent call last):
  File "main.py", line 58, in
    ctrbox_obj.train_network(args)
  File "/home/yyh/Documents/BBAVectors-Oriented-Object-Detection/train.py", line 130, in train_network
    epoch_loss = self.run_epoch(phase='train',
  File "/home/yyh/Documents/BBAVectors-Oriented-Object-Detection/train.py", line 167, in run_epoch
    loss = criterion(pr_decs, data_dict)
  File "/home/yyh/anaconda3/envs/BBAVectors/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yyh/Documents/BBAVectors-Oriented-Object-Detection/loss.py", line 120, in forward
    if isnan(hm_loss) or isnan(wh_loss) or isnan(off_loss):
RuntimeError: CUDA error: device-side assert triggered

Hi, could you please tell me what is causing this?
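The failing condition, input_val >= zero && input_val <= one, looks like the element-wise input check in PyTorch's CUDA binary cross-entropy kernel (the one behind nn.BCELoss and F.binary_cross_entropy), which requires every input to be a probability in [0, 1]. One plausible link to the nan hm loss (an assumption, not confirmed in this thread): once NaN appears in the network's outputs, any prediction passed to a BCE-based term in loss.py can contain NaN, and NaN fails this check even though it is not numerically "out of range", because every ordered comparison involving NaN is false. The plain-Python sketch below only mirrors the condition; the function name is hypothetical and this is not PyTorch's code.

```python
import math


def passes_bce_range_check(input_val: float) -> bool:
    # Same condition the CUDA kernel reports:
    #   input_val >= zero && input_val <= one
    # Any ordered comparison with NaN evaluates to False,
    # so a NaN input fails the check just like 1.5 or -0.5 would.
    return input_val >= 0.0 and input_val <= 1.0


for value in (0.0, 0.5, 1.0, 1.0001, -0.5, math.nan):
    print(value, passes_bce_range_check(value))
```

Only the three values inside [0, 1] pass; 1.0001, -0.5 and nan all fail.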
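As for why hm loss itself turns into nan: assuming hm loss is a CenterNet-style focal loss computed directly on sigmoid heatmap outputs (an assumption; check loss.py), a saturated sigmoid is enough. In float32, sigmoid can round to exactly 0.0 or 1.0; vectorized loss code then evaluates log(0) = -inf at some location and multiplies it by a zero mask or weight, and 0 * -inf is NaN. The plain-Python sketch below reproduces a single heatmap element with alpha = 2 and beta = 4. The names ieee_log and focal_element and the optional clamp bound eps are hypothetical; 1e-4 is the bound commonly used in CenterNet-style code, taken here as an assumption. Clamping keeps log() finite, but it cannot repair a NaN that is already present in pred.

```python
import math


def ieee_log(x: float) -> float:
    """log() with IEEE semantics (like torch.log): log(0) = -inf, log(<0 or NaN) = NaN.

    math.log raises ValueError for 0 and negatives, so those cases are handled here.
    """
    if x > 0.0:
        return math.log(x)
    if x == 0.0:
        return float("-inf")
    return float("nan")


def focal_element(pred: float, gt: float, eps: float = 0.0) -> float:
    """One element of a CenterNet-style heatmap focal loss (alpha=2, beta=4).

    Both the positive and the negative term are always computed and then masked
    by multiplying with 0/1 weights, as vectorized tensor code does. That masking
    is what turns a -inf from log(0) into NaN.
    """
    if eps > 0.0:
        # Optional clamp into [eps, 1 - eps]; a NaN pred stays NaN here.
        pred = min(max(pred, eps), 1.0 - eps)
    pos = 1.0 if gt == 1.0 else 0.0  # mask for peak locations (gt == 1)
    neg = 1.0 - pos                  # mask for all other locations
    pos_loss = ieee_log(pred) * (1.0 - pred) ** 2 * pos
    neg_loss = ieee_log(1.0 - pred) * pred ** 2 * (1.0 - gt) ** 4 * neg
    return -(pos_loss + neg_loss)


print(focal_element(1.0, 1.0))            # nan: log(1 - 1) = -inf times zero weights
print(focal_element(1.0, 1.0, eps=1e-4))  # finite once pred is clamped
```

The same NaN appears for pred = 0.0 at a non-peak location, where log(pred) = -inf is multiplied by the zero positive mask.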
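The check on loss.py line 120 tests hm_loss, wh_loss and off_loss with isnan only, so an infinite loss would pass it unnoticed. A hypothetical helper that names every NaN or ±Inf term, based on math.isfinite, is sketched below with the epoch-1 values printed in the log. It works on plain floats, so tensor losses would first need .item(), which on the GPU also waits for pending kernels.

```python
import math
from typing import Dict, List


def nonfinite_terms(losses: Dict[str, float]) -> List[str]:
    """Return the names of loss terms whose value is NaN, +Inf or -Inf."""
    return [name for name, value in losses.items() if not math.isfinite(value)]


# Epoch 1/20 values from the training log.
epoch1 = {
    "hm": float("nan"),
    "wh": 1.2452343702316284,
    "off": 0.032581768929958344,
}
print(nonfinite_terms(epoch1))                            # ['hm']
print(nonfinite_terms({"hm": float("inf"), "wh": 1.0}))   # ['hm']
```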
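The traceback ends at loss.py line 120 rather than at the kernel that failed. CUDA kernels run asynchronously, so a device-side assert is usually reported at the next call that has to wait for the GPU; evaluating the condition of if isnan(...) reads the loss values back to the host, which is such a point. Setting the CUDA_LAUNCH_BLOCKING environment variable to 1 makes kernel launches synchronous, so the traceback should point at the operation that actually failed, at a large speed cost, so only while debugging. From the shell it is a prefix on the same command: CUDA_LAUNCH_BLOCKING=1 python main.py --data_dir dota_bbav --num_epoch 20 --batch_size 4 --dataset dota --phase train. A minimal sketch of doing the same from Python, assuming it is placed at the very top of main.py so that it runs before CUDA is initialized:

```python
import os

# Make every CUDA kernel launch synchronous so that a device-side assert is
# raised at the Python line that launched the failing kernel.
# CUDA reads this when it initializes, so set it before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch (and any module that imports it) only after this line
```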