tatp22 / linformer-pytorch

My take on a practical implementation of Linformer for Pytorch.
https://arxiv.org/pdf/2006.04768.pdf
MIT License
400 stars, 36 forks

Use -inf as mask value for the causal mask #19

Closed by kklemon 3 years ago

kklemon commented 3 years ago

The masking value is currently set to -1e10. In FP16 or mixed-precision training this leads to numerical issues, because -1e10 lies outside the range FP16 can represent (its largest finite magnitude is 65504). This can be fixed by using float('-inf') instead, since infinity has its own special representation in IEEE 754.
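To illustrate the point, here is a minimal, library-free sketch. It is not the code from linformer-pytorch: the helper names `to_fp16`, `fits_in_fp16`, and `causal_softmax` are made up for this example, and Python's `struct` format `'e'` (IEEE 754 binary16) stands in for an FP16 tensor. It shows that -1e10 cannot be stored as a half-precision value while -inf can, and that a causal mask filled with -inf still yields a well-defined softmax.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 binary16 (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fits_in_fp16(x: float) -> bool:
    """Return True if x can be stored as a half-precision value."""
    try:
        to_fp16(x)
        return True
    except OverflowError:
        # struct refuses finite values whose magnitude exceeds the FP16 range.
        return False


def causal_softmax(scores, mask_value):
    """Row-wise softmax over a square score matrix with a causal mask.

    Position i may only attend to positions j <= i; every other entry is
    replaced by mask_value before the softmax. (Hypothetical helper, written
    in plain Python for illustration only.)
    """
    out = []
    for i, row in enumerate(scores):
        masked = [s if j <= i else mask_value for j, s in enumerate(row)]
        # The diagonal is never masked, so the row maximum is always finite.
        row_max = max(masked)
        exps = [math.exp(s - row_max) for s in masked]  # exp(-inf) == 0.0
        total = sum(exps)
        out.append([e / total for e in exps])
    return out
```

Under these assumptions, `fits_in_fp16(-1e10)` is false while `fits_in_fp16(float('-inf'))` is true, and `causal_softmax(scores, float('-inf'))` assigns exactly zero weight to every masked (future) position.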

tatp22 commented 3 years ago

Looks good to me :+1: I'll update it to version 0.19.1 as well.

tatp22 commented 3 years ago

Forgot to merge :sweat_smile: