Closed mrT23 closed 1 year ago
Hi, the model was trained in fp16, but we encountered some issues (NaN/0 loss) after some steps of fine-tuning and haven't really figured out why. It does work in bf16 and fp32. You can still give fp16 a try; it might work on your hardware.
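For context, one common (though not confirmed here) cause of fp16 NaNs is dynamic-range overflow: fp16's largest finite value is 65504, so activations or gradients that are harmless in fp32 or bf16 can overflow to inf, and a subsequent reduction like inf - inf produces NaN. bf16 shares fp32's 8-bit exponent, so the same values stay finite. A minimal sketch using Python's built-in half-precision `struct` format (no PyTorch needed):

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value

def fp16(x: float) -> float:
    """Round-trip a Python float through fp16; out-of-range values become inf."""
    if abs(x) > FP16_MAX:
        return math.inf if x > 0 else -math.inf
    return struct.unpack('e', struct.pack('e', x))[0]

# A magnitude that fits easily in fp32/bf16 but not in fp16:
big = 1.0e5
print(fp16(big))               # inf  (overflow in fp16)

# Once an inf appears, common reductions turn it into NaN:
print(fp16(big) - fp16(big))   # nan
```

This is only an illustration of the fp16 range limit, not a diagnosis of this specific model's instability; loss scaling (as in `torch.cuda.amp.GradScaler`) is the usual mitigation if you do try fp16.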
Thanks for the detailed answer :-)
Hi, thanks for sharing the code.
Can you elaborate on why the default option you chose is "--no_fp16"? If I understand correctly, the original model was trained in fp16.
Thanks, Tal