lxtGH / DecoupleSegNets

[ECCV-2020]: Improving Semantic Segmentation via Decoupled Body and Edge Supervision

Error occurs when training with CamVid at loss parts #40

Closed. gymoon10 closed this issue 1 year ago

gymoon10 commented 1 year ago

Hi, I'm Goo-Young Moon, a master's student at Korea University. I read your paper with interest while surveying research on segmentation, and I successfully ran train_cityscapes_ResNet50_FCN_decouple.sh and network.deepv3_decouple.DeepR101V3PlusD_m1_deeply. However, I am opening this issue because I ran into a problem while running the scripts/train/Camvid/~.sh scripts.

I followed the options in train_camvid_WideResNet38_decouple.sh except for class_uniform_tile and crop_size. Because my CamVid dataset has a resolution of 360 x 480, I set class_uniform_tile=180 and crop_size=320. However, an error occurs when the loss is computed. It seems to come from line 273 of loss.py, where threshold_index is extracted from index. The code on that line is threshold_index = index[min(len(index), self.min_kept) - 1].
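For reference, here is a minimal pure-Python sketch of what that line computes. It assumes the usual OHEM pattern (sort the per-pixel probabilities of the ground-truth class in ascending order, then read the entry at position min(len(index), min_kept) - 1). The function names are hypothetical and are not taken from loss.py. The min_kept values 10000 and 5000 are inferred from the 9999 and 4999 printed in the log.

```python
def ohem_threshold_position(num_pixels, min_kept):
    """Position read from the sorted index, mirroring min(len(index), min_kept) - 1."""
    return min(num_pixels, min_kept) - 1


def pick_threshold(mask_prob, min_kept, thresh):
    """Hypothetical helper: choose the OHEM probability threshold."""
    # Ascending argsort: the hardest pixels (lowest probability) come first.
    index = sorted(range(len(mask_prob)), key=lambda i: mask_prob[i])
    threshold_index = index[ohem_threshold_position(len(index), min_kept)]
    # Raise the threshold only if the min_kept-th hardest pixel is already above it.
    return max(thresh, mask_prob[threshold_index])


# Positions matching the two debug prints in the log (len(index) = 204800).
print(ohem_threshold_position(204800, 10000))  # 9999
print(ohem_threshold_position(204800, 5000))   # 4999

# Tiny example: the third-hardest pixel has probability 0.9, which exceeds 0.7.
print(pick_threshold([0.9, 0.2, 0.95, 0.8], min_kept=3, thresh=0.7))  # 0.9
```

As long as index is non-empty and min_kept is at least 1, the position always lies between 0 and len(index) - 1, so this expression alone should not go out of range. That is consistent with the runtime's own note that CUDA errors can be reported at a later call than the one that caused them.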

Even though I successfully trained on Cityscapes and KITTI, I cannot understand the cause of this error. Is it my mistake, or is it caused by a deprecation in one of the packages? The full error output is shown below.

I'm sorry to bother you, but I'd appreciate your help. Thank you for your research and hard work.

C:\Users\iml\Desktop\DecoupleSegNets-master\venv\lib\site-packages\torch\optim\lr_scheduler.py:138: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "
C:\Users\iml\Desktop\DecoupleSegNets-master\transforms\transforms.py:162: FutureWarning: multichannel is a deprecated argument name for gaussian. It will be removed in version 1.0. Please use channel_axis instead.
  blurred_img = gaussian(np.array(img), sigma=sigma, multichannel=True)
C:\Users\iml\Desktop\DecoupleSegNets-master\transforms\transforms.py:162: FutureWarning: multichannel is a deprecated argument name for gaussian. It will be removed in version 1.0. Please use channel_axis instead.
  blurred_img = gaussian(np.array(img), sigma=sigma, multichannel=True)
C:\Users\iml\Desktop\DecoupleSegNets-master\transforms\transforms.py:162: FutureWarning: multichannel is a deprecated argument name for gaussian. It will be removed in version 1.0. Please use channel_axis instead.
  blurred_img = gaussian(np.array(img), sigma=sigma, multichannel=True)
C:\Users\iml\Desktop\DecoupleSegNets-master\transforms\transforms.py:162: FutureWarning: multichannel is a deprecated argument name for gaussian. It will be removed in version 1.0. Please use channel_axis instead.
  blurred_img = gaussian(np.array(img), sigma=sigma, multichannel=True)
C:\Users\iml\Desktop\DecoupleSegNets-master\venv\lib\site-packages\torch\nn\functional.py:3734: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
C:\Users\iml\Desktop\DecoupleSegNets-master\venv\lib\site-packages\torch\nn\functional.py:4227: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
  warnings.warn(
C:\Users\iml\Desktop\DecoupleSegNets-master\venv\lib\site-packages\torch\nn\_reduction.py:13: UserWarning: reduction='elementwise_mean' is deprecated, please use reduction='mean' instead.
  warnings.warn("reduction='elementwise_mean' is deprecated, please use reduction='mean' instead.")
C:\Users\iml\Desktop\DecoupleSegNets-master\loss.py:115: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  soft = F.softmax(inp)
mask_prob.shape : torch.Size([204800])
threshold : 0.7 True
len(index) : 204800
min(len(index), self.min_kept) - 1 : 9999
index[min(len(index), self.min_kept) - 1] : tensor(133412, device='cuda:0')
threshold_index : tensor(133412, device='cuda:0')

mask_prob.shape : torch.Size([204800])
threshold : 0.7 True
len(index) : 204800
min(len(index), self.min_kept) - 1 : 4999
index[min(len(index), self.min_kept) - 1] :
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [32,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [33,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [34,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [35,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [36,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [37,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [38,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [39,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [40,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [41,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [42,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [43,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [44,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [45,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [46,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [47,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [48,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [49,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [50,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [51,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [52,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [53,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [54,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [55,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [56,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [57,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [58,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [59,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [60,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [61,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [62,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [94,0,0], thread: [63,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
Traceback (most recent call last):
  File "C:\Users\iml\Desktop\DecoupleSegNets-master\train_camvid.py", line 426, in <module>
    main()
  File "C:\Users\iml\Desktop\DecoupleSegNets-master\train_camvid.py", line 213, in main
    train(train_loader, net, optim, epoch, writer)
  File "C:\Users\iml\Desktop\DecoupleSegNets-master\train_camvid.py", line 255, in train
    main_loss_dic = net(inputs, gts=(gts, edges))
  File "C:\Users\iml\Desktop\DecoupleSegNets-master\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\iml\Desktop\DecoupleSegNets-master\network\deepv3_decouple.py", line 346, in forward
    return self.criterion((seg_final_out, seg_body_out, seg_edge_out), gts)
  File "C:\Users\iml\Desktop\DecoupleSegNets-master\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\iml\Desktop\DecoupleSegNets-master\loss.py", line 371, in forward
    losses['edge_ohem_loss'] = self.att_weight * self.edge_attention(seg_in, segmask, edge_in)
  File "C:\Users\iml\Desktop\DecoupleSegNets-master\loss.py", line 360, in edge_attention
    return self.edge_ohem_loss(input, torch.where(edge.max(1)[0] > 0.8, target, filler))
  File "C:\Users\iml\Desktop\DecoupleSegNets-master\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\iml\Desktop\DecoupleSegNets-master\loss.py", line 278, in forward
    print('index[min(len(index), self.min_kept) - 1] :', index[min(len(index), self.min_kept) - 1])
  File "C:\Users\iml\Desktop\DecoupleSegNets-master\venv\lib\site-packages\torch\_tensor.py", line 427, in __repr__
    return torch._tensor_str._str(self, tensor_contents=tensor_contents)
  File "C:\Users\iml\Desktop\DecoupleSegNets-master\venv\lib\site-packages\torch\_tensor_str.py", line 637, in _str
    return _str_intern(self, tensor_contents=tensor_contents)
  File "C:\Users\iml\Desktop\DecoupleSegNets-master\venv\lib\site-packages\torch\_tensor_str.py", line 568, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "C:\Users\iml\Desktop\DecoupleSegNets-master\venv\lib\site-packages\torch\_tensor_str.py", line 328, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "C:\Users\iml\Desktop\DecoupleSegNets-master\venv\lib\site-packages\torch\_tensor_str.py", line 111, in __init__
    value_str = "{}".format(value)
  File "C:\Users\iml\Desktop\DecoupleSegNets-master\venv\lib\site-packages\torch\_tensor.py", line 858, in __format__
    return self.item().__format__(format_spec)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Process finished with exit code 1
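
The runtime message above suggests CUDA_LAUNCH_BLOCKING=1. A minimal sketch of enabling it from Python follows. The helper name is hypothetical, and setting the variable before torch is imported is a conservative assumption, so that it is already in place when CUDA is initialized.

```python
import os
import sys


def enable_sync_cuda_errors():
    """Request synchronous CUDA kernel launches so the traceback points at the failing op.

    Returns True if the variable was set before torch was imported.
    """
    torch_already_imported = "torch" in sys.modules
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return not torch_already_imported


# Call this at the very top of the training script, before "import torch".
set_in_time = enable_sync_cuda_errors()
print(os.environ["CUDA_LAUNCH_BLOCKING"], set_in_time)
```

With synchronous launches, the Python traceback should stop at the operation that actually triggers the assertion rather than at a later print or .item() call, at the cost of slower training.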

gymoon10 commented 1 year ago

I solved the problem by setting ignore_label=255 in datasets/camvid.py and using self.nll_loss = nn.NLLLoss2d(weight, size_average, ignore_index=11) in the CrossEntropyLoss2d class of loss.py (line 104). However, I still do not understand why this change works.
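
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the label maps contain a value that is neither a valid class id nor the configured ignore label, any step that uses labels as indices into the class dimension can go out of bounds. The sketch below checks for such values in plain Python. NUM_CLASSES = 11 is an assumption about this CamVid setup, and the function name is hypothetical.

```python
from collections import Counter

NUM_CLASSES = 11    # assumption: 11 training classes with ids 0..10
IGNORE_LABEL = 255  # the value set in datasets/camvid.py in the fix above


def out_of_range_labels(label_map, num_classes=NUM_CLASSES, ignore_label=IGNORE_LABEL):
    """Count label values that are neither a valid class id nor the ignore label."""
    counts = Counter(value for row in label_map for value in row)
    return {
        value: n
        for value, n in counts.items()
        if value != ignore_label and not (0 <= value < num_classes)
    }


# Toy 2 x 3 label map: the value 11 is outside 0..10 and is not the ignore label.
toy = [[0, 3, 11], [10, 255, 11]]
print(out_of_range_labels(toy))  # {11: 2}
```

Running a check like this over a few ground-truth images, converted to nested lists, would show whether a void value such as 11 reaches the loss without being treated as ignored.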

Also, training with the CamVid dataset works well on https://github.com/lxtGH/SFSegNets without any changes to the dataset or loss code.