Closed: hmchuong closed this issue 1 year ago
We initially tried mixed precision as well, but we also ended up with divergence during training. This problem should not happen with full precision. Also, to the best of our knowledge, since our work targets high-resolution results, we must use full precision.
Thanks for the great work!
However, I found that an overflow (NaN/Inf) can happen when computing the attention score. The problem is that the query and key in the attention layers are not normalized, so the `bmm` can produce inf during training and inference, especially with mixed precision (float16). I suggest that the authors, or anyone who wants to train on their own dataset, edit the code at:

https://github.com/plemeri/InSPyReNet/blob/main/lib/modules/layers.py#L167
https://github.com/plemeri/InSPyReNet/blob/main/lib/modules/attention_module.py#L82
https://github.com/plemeri/InSPyReNet/blob/main/lib/modules/attention_module.py#L90
Best,