xuebinqin / U-2-Net

The code for our newly accepted paper in Pattern Recognition 2020: "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection."
Apache License 2.0
8.31k stars 1.43k forks

Hi, when I run this code on other detection tasks, I get strange errors. The following is the warning where the error occurs #357

Open 1dhuh opened 1 year ago

1dhuh commented 1 year ago

/pytorch/aten/src/ATen/native/cuda/Loss.cu:115: operator(): block: [290,0,0], thread: [62,0,0] Assertion input_val >= zero && input_val <= one failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:115: operator(): block: [290,0,0], thread: [63,0,0] Assertion input_val >= zero && input_val <= one failed.
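For readers hitting the same assertion: the check above rejects any value outside [0, 1], and it also rejects NaN, because every comparison with NaN is false. The following is a minimal pure-Python sketch of that behaviour, not PyTorch's actual kernel. The names `bce_input_ok` and `bce` are hypothetical, and the `eps` guard is a simplification chosen for this illustration.

```python
import math


def bce_input_ok(p):
    # Same condition as the device-side assertion:
    # input_val >= zero && input_val <= one
    return p >= 0.0 and p <= 1.0


def bce(p, y):
    # Scalar binary cross-entropy; refuses inputs the assertion would reject.
    if not bce_input_ok(p):
        raise ValueError(f"BCE input out of range: {p}")
    eps = 1e-12  # assumption: small guard against log(0), for illustration only
    return -(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps)))


print(bce_input_ok(0.3))           # True
print(bce_input_ok(1.5))           # False
print(bce_input_ok(float("nan")))  # False: NaN fails both comparisons
```

This suggests that once a prediction becomes NaN (as `l5: nan` in the log below indicates), the assertion will fire even though the network output passes through a sigmoid.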

[epoch: 3/100, batch: 11/ 0, ite: 517] train loss: 66.000237, tar: 11.000006/lr:0.000100 l0: 1.000000, l1: 1.000016, l2: 1.000000, l3: 1.000000, l4: 1.000000, l5: nan

[epoch: 3/100, batch: 12/ 0, ite: 518] train loss: nan, tar: 12.000006/lr:0.000100

Traceback (most recent call last):
  File "train_multiple_loss.py", line 149, in <module>
    loss2, loss = muti_bce_loss_fusion(d6, d1, d2, d3, d4, d5, labels_v)
  File "train_multiple_loss.py", line 49, in muti_bce_loss_fusion
    loss4 = bce_ssim_loss(d4,labels_v)
  File "train_multiple_loss.py", line 37, in bce_ssim_loss
    iou_out = iou_loss(pred,target)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/autodl-tmp/pytorch_iou/__init__.py", line 28, in forward
    return _iou(pred, target, self.size_average)
  File "/root/autodl-tmp/pytorch_iou/__init__.py", line 13, in _iou
    Ior1 = torch.sum(target[i,:,:,:]) + torch.sum(pred[i,:,:,:])-Iand1
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

I think there might be a vanishing gradient happening, but I don't have a solution.
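As the error message notes, the line reported in the traceback may not be where the problem actually occurs. One way to narrow it down is sketched below: set `CUDA_LAUNCH_BLOCKING` before CUDA is initialised, and validate each tensor before it reaches the loss. The helper `check_probabilities` is hypothetical (it is not part of this repository), and the tensors here are CPU stand-ins for the side outputs and `labels_v`.

```python
import os

# Assumption: this runs before any CUDA work, so kernels execute synchronously
# and errors are reported at the call that caused them.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

import torch


def check_probabilities(name, tensor):
    """Raise a readable error if `tensor` is not a valid BCE input or target."""
    if not torch.isfinite(tensor).all():
        raise ValueError(f"{name} contains NaN or Inf")
    lo, hi = tensor.min().item(), tensor.max().item()
    if lo < 0.0 or hi > 1.0:
        raise ValueError(f"{name} outside [0, 1]: min={lo}, max={hi}")
    return tensor


# Stand-in for a sigmoid side output of shape (batch, 1, H, W).
good = torch.sigmoid(torch.randn(2, 1, 4, 4))
check_probabilities("d0", good)

# Simulate the failure mode seen in the log (a NaN in one output).
bad = good.clone()
bad[0, 0, 0, 0] = float("nan")
try:
    check_probabilities("d5", bad)
except ValueError as err:
    print(err)
```

Calling such a check on every prediction and on the labels just before the loss would show whether the first invalid value is a NaN in a prediction or a label outside [0, 1], which is a common cause when switching to a different dataset.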

sree3333 commented 2 months ago

Same issue for me as well; I couldn't find a solution yet.