Closed: kklemon closed this issue 3 years ago
The value currently used for masking is set to -1e10. In FP16 and mixed-precision training this leads to numerical issues, since -1e10 is outside the representable range of half precision (which tops out around 65504). This can be fixed by using `float('-inf')` instead, as infinity has its own special representation in IEEE 754.
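For illustration, here is a minimal PyTorch sketch of the problem and the fix. The tensor names, shapes, and the mask are hypothetical, not taken from this repository's code:

```python
import torch
import torch.nn.functional as F

# Toy attention scores in half precision (illustrative only).
scores = torch.randn(2, 4, 4, dtype=torch.float16)

# Hypothetical padding mask: True marks positions that may be attended to.
mask = torch.tensor([True, True, True, False]).expand(2, 4, 4)

# Problematic: float16 cannot represent -1e10, so this either
# overflows or raises an error, depending on the PyTorch version.
# masked = scores.masked_fill(~mask, -1e10)

# Fixed: -inf has a dedicated encoding in IEEE 754 half precision,
# and softmax maps it cleanly to an attention weight of 0.
masked = scores.masked_fill(~mask, float('-inf'))
attn = F.softmax(masked, dim=-1)
print(attn)
```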
Looks good to me :+1: I'll update it to version 0.19.1 as well.
Forgot to merge :sweat_smile: