flozi00 closed this issue 3 years ago
Not sure if ByT5 supports fp16 training, cc @patrickvonplaten
Hi!
I am not sure we have tested ByT5 with the seq2seq scripts yet. Which script are you using, run_translation.py or run_summarization.py? It would be nice if you could post a snippet to reproduce this.

Also, note that T5 (and ByT5 as well) models are trained with bf16, which may or may not work with fp16 (see this discussion on the forum). However, that usually results in nan losses, which isn't the case here, so I can't be sure without looking at the command you are using.
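For what it's worth, one way to check whether the pretrained weights already overflow fp16 is a quick forward pass in half precision, along these lines (a minimal sketch; google/byt5-small and the toy sentences are placeholders, not taken from this issue):

```python
# Minimal sketch: load the pretrained weights in fp16 and check whether a
# single forward pass already produces non-finite values.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
model = T5ForConditionalGeneration.from_pretrained("google/byt5-small")
model = model.half().cuda().eval()

inputs = tokenizer(["Translate: hello world"], return_tensors="pt").to("cuda")
labels = tokenizer(["hallo welt"], return_tensors="pt").input_ids.to("cuda")

with torch.no_grad():
    out = model(**inputs, labels=labels)

# Weights pretrained in bf16 can exceed fp16's dynamic range, which shows up
# as inf/nan activations rather than merely a high loss.
print("loss:", out.loss.item())
print("all logits finite:", torch.isfinite(out.logits).all().item())
```

If the logits come back non-finite, the problem lies in the bf16-pretrained weights rather than in the training script itself.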
Removing the --fp16 argument fixes it when using the run_translation.py script.
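For context, the --fp16 flag of the example scripts maps onto the fp16 field of the training arguments, so the fix amounts to something like this sketch (output_dir is a hypothetical path):

```python
# Sketch of the Python-level equivalent of dropping --fp16 from the CLI call.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./byt5-translation",  # hypothetical output path
    fp16=False,  # plain fp32 avoids the overflow with bf16-pretrained ByT5
)
```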
Environment info
transformers version: master

Who can help
@patrickvonplaten @patil-suraj
Information
Model I am using (Bert, XLNet ...): byt5
For comparison (e.g., to rule out a coding mistake on my side), I also used other seq2seq models such as T5; those models work as expected.
The problem arises when using:
The tasks I am working on are:
To reproduce
Steps to reproduce the behavior:
Expected behavior
Training the model with a reasonable loss and generating good text.