AmitMY opened this issue 8 months ago
Confirmed. Without `precision: float16` it decodes correctly, for example:
0 ||| $pt $bzs M p500 p500 S300 c0 r4 p482 p482 S17d c1 re p492 p522 S177 c0 re p490 p556 S22a c0 r4 p513 ||| F0= -12.5695 F1= -12.547 ||| -1.04652
(which in sign language visualizes correctly; image omitted here)
Half precision is useful only to speed up inference on supported GPUs for the `translate` step. We usually use this mode and it works. Generally, it shouldn't be used for evaluation unless you're planning to run your model in this mode.
The example here (https://github.com/mozilla/firefox-translations-training/blob/main/configs/config.prod.yml#L57-L61) says "2080ti or newer", but for me, on a 2080 Ti, this setting causes NaNs during inference (not during evaluation).
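For context, the relevant section of that config looks roughly like this (a sketch from memory; surrounding keys and comments may differ from the current revision of config.prod.yml):

```yaml
marian-args:
  # extra arguments passed to marian-decoder during the
  # "Translating corpus with teacher" (distillation) step
  decoding-teacher:
    # 2080ti or newer
    precision: float16
```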
After training two reasonable teacher models, and seeing that their ensemble results are reasonable as well, I see that the `Translating corpus with teacher` output is bad (repeating a random token, and giving `nan`s), and all of the other lines look very similar.
GPU: NVIDIA GeForce RTX 2080 Ti, `decoding-teacher` with `precision: float16`.
My current suspicion is that this is because of the `precision: float16`, but I can't immediately confirm it, since I destroyed a lot of my environment just to figure out what's wrong here...
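If it helps anyone hitting the same NaNs, this is the change I plan to test once my environment is rebuilt: a minimal sketch, assuming the config mirrors the prod example above, that drops the half-precision override so marian-decoder falls back to its default float32 decoding:

```yaml
marian-args:
  decoding-teacher:
    # precision: float16  # disabled: produces nans on my RTX 2080 Ti
    # with no precision override, marian-decoder decodes in float32
```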