plemeri / InSPyReNet

Official PyTorch implementation of Revisiting Image Pyramid Structure for High Resolution Salient Object Detection (ACCV 2022)

Overflow: NaN can happen in training #36

Closed hmchuong closed 8 months ago

hmchuong commented 10 months ago

Thanks for the great work!

However, I found that overflow (NaN/Inf) can happen when computing the attention scores. The problem is that the query and key in the attention layers are not normalized, so the bmm can overflow to inf during training and inference, especially with mixed precision (float16). I suggest that the authors, or anyone who wants to train on their own dataset, edit the code at the locations below (a sketch of one possible fix follows the links):

https://github.com/plemeri/InSPyReNet/blob/main/lib/modules/layers.py#L167
https://github.com/plemeri/InSPyReNet/blob/main/lib/modules/attention_module.py#L82
https://github.com/plemeri/InSPyReNet/blob/main/lib/modules/attention_module.py#L90
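As a rough illustration (not the repository's exact code, and the tensor shapes here are assumptions), one way to keep the attention scores bounded is to L2-normalize the query and key along the channel dimension before the bmm, so every dot product stays in [-1, 1] and cannot overflow fp16:

```python
import torch
import torch.nn.functional as F

def stable_attention_scores(query: torch.Tensor, key: torch.Tensor) -> torch.Tensor:
    # Assumed shapes: query (B, N, C), key (B, C, N).
    # Normalizing along the channel dimension bounds each dot product to [-1, 1],
    # so the bmm result is safe even in float16.
    query = F.normalize(query, dim=-1)
    key = F.normalize(key, dim=1)
    scores = torch.bmm(query, key)          # (B, N, N), bounded
    return torch.softmax(scores, dim=-1)
```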

Best,

plemeri commented 9 months ago

We initially tried mixed precision, but we also ended up with divergence during training. This problem should not happen with full precision. Also, to the best of our knowledge, since our work targets high-resolution results, we must use full precision.
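If someone still wants mixed precision for the rest of the network, a minimal sketch (assuming query/key tensors of shape (B, N, C) and (B, C, N); this is not the repository's code) is to exclude only the attention score computation from autocast and run it in float32:

```python
import torch

def attention_scores_fp32(query: torch.Tensor, key: torch.Tensor) -> torch.Tensor:
    # Disable autocast locally so this bmm runs in full precision,
    # avoiding fp16 overflow while the rest of the model stays in mixed precision.
    with torch.cuda.amp.autocast(enabled=False):
        scores = torch.bmm(query.float(), key.float())
        return torch.softmax(scores, dim=-1)
```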