[epoch: 3/100, batch: 12/ 0, ite: 518] train loss: nan, tar: 12.000006/lr:0.000100
Traceback (most recent call last):
File "train_multiple_loss.py", line 149, in <module>
loss2, loss = muti_bce_loss_fusion(d6, d1, d2, d3, d4, d5, labels_v)
File "train_multiple_loss.py", line 49, in muti_bce_loss_fusion
loss4 = bce_ssim_loss(d4,labels_v)
File "train_multiple_loss.py", line 37, in bce_ssim_loss
iou_out = iou_loss(pred,target)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/autodl-tmp/pytorch_iou/__init__.py", line 28, in forward
return _iou(pred, target, self.size_average)
File "/root/autodl-tmp/pytorch_iou/__init__.py", line 13, in _iou
Ior1 = torch.sum(target[i,:,:,:]) + torch.sum(pred[i,:,:,:])-Iand1
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
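As the error message suggests, rerunning with synchronous kernel launches makes the Python stack trace point at the call that actually triggered the assert (the script name below is taken from the traceback):

```shell
# Force synchronous CUDA launches so the device-side assert surfaces
# at the real call site instead of at a later, unrelated API call.
CUDA_LAUNCH_BLOCKING=1 python train_multiple_loss.py
```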
I think a vanishing gradient might be the cause, but I don't have a solution.
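Before blaming the gradients, it can help to pin down where the NaN first appears in the forward pass. A minimal diagnostic sketch (the tensor and name `d5` are illustrative stand-ins for the side outputs in the traceback):

```python
import torch

# Makes backward() report which op produced a NaN/inf gradient.
torch.autograd.set_detect_anomaly(True)

def check_finite(name, t):
    # Fail fast with a readable message instead of tripping
    # the fused CUDA kernel's device-side assert later.
    if not torch.isfinite(t).all():
        raise RuntimeError(f"{name} contains NaN/inf values")

x = torch.tensor([0.2, float("nan"), 0.9])  # stand-in for a side output like d5
try:
    check_finite("d5", x)
except RuntimeError as e:
    print(e)
```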
/pytorch/aten/src/ATen/native/cuda/Loss.cu:115: operator(): block: [290,0,0], thread: [62,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:115: operator(): block: [290,0,0], thread: [63,0,0] Assertion `input_val >= zero && input_val <= one` failed.
[epoch: 3/100, batch: 11/ 0, ite: 517] train loss: 66.000237, tar: 11.000006/lr:0.000100 l0: 1.000000, l1: 1.000016, l2: 1.000000, l3: 1.000000, l4: 1.000000, l5: nan
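The assert says BCELoss received an input outside [0, 1] (here a NaN, since l5 already logs as nan one iteration earlier). One common mitigation is to clamp the prediction before the BCE term and add an epsilon to the IoU ratio. A minimal sketch, assuming the d* side outputs are sigmoid'd probability maps as in the traceback (function names here are illustrative, not the author's):

```python
import torch
import torch.nn as nn

def safe_bce(pred, target, eps=1e-7):
    # BCELoss's CUDA kernel asserts 0 <= input <= 1; clamping defensively
    # keeps a saturated or slightly out-of-range prediction from tripping it.
    pred = torch.clamp(pred, eps, 1.0 - eps)
    return nn.BCELoss()(pred, target)

def iou_loss(pred, target, eps=1e-7):
    # Per-sample soft IoU; eps keeps the ratio finite when a mask is empty,
    # mirroring the Iand1/Ior1 computation in pytorch_iou's _iou.
    inter = (pred * target).sum(dim=(1, 2, 3))
    union = pred.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3)) - inter
    return (1.0 - (inter + eps) / (union + eps)).mean()

pred = torch.rand(2, 1, 8, 8)                    # stand-in for a side output like d4
target = (torch.rand(2, 1, 8, 8) > 0.5).float()  # binary ground-truth mask
loss = safe_bce(pred, target) + iou_loss(pred, target)
```

If the NaN originates inside the network rather than at the loss, clamping only hides it; switching the BCE term to `nn.BCEWithLogitsLoss` on raw logits is the more numerically stable design, but it requires removing the sigmoid from the side outputs.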