Closed: hmchuong closed this issue 1 year ago
We initially tried mixed precision as well, but we also ended up with divergence during training. This problem should not happen with full precision. Also, to the best of our knowledge, since our work targets high-resolution results, we must use full precision.
Thanks for the great work!
However, I found that an overflow (NaN/Inf) can happen when computing the attention score. The problem is that the query and key in the attention layers are not normalized, so the `bmm` can produce inf during training and inference, especially with mixed precision (float16). I suggest that the authors, or anyone who wants to train on their own dataset, edit the code at:

https://github.com/plemeri/InSPyReNet/blob/main/lib/modules/layers.py#L167
https://github.com/plemeri/InSPyReNet/blob/main/lib/modules/attention_module.py#L82
https://github.com/plemeri/InSPyReNet/blob/main/lib/modules/attention_module.py#L90
Best,