Open xrsrke opened 9 months ago
Nice work! I would like to know the current progress of FP8 support. Is it already possible to try training a model with it?
@YixinSong-e https://x.com/xariusrke/status/1826669126955278401
Thank you very much for the information, and thank you for your contribution to FP8 training. It seems that FP8 training is not an easy task. I will follow your commits and the Twitter thread to try it out. Again, it's great work! :)
1b bfloat16 baseline config: https://github.com/huggingface/nanotron/blob/xrsrke/fp8-end-to-end/examples/fp8/ablations/configs/200m/exp54c_baseline.yaml
1b fp8, 1st mom in fp8, and 2nd mom in fp32: https://github.com/huggingface/nanotron/blob/xrsrke/fp8-end-to-end/examples/fp8/ablations/configs/200m/exp202c_like_exp54c_200m_kfloat16_mw_and_1stmom_fp8e4m3_with_correct_std_init.yaml
1b fp8 with both momentum in fp8: https://github.com/huggingface/nanotron/blob/xrsrke/fp8-end-to-end/examples/fp8/ablations/configs/200m/exp224a4_like_exp202c_but_adam_eps_1.0e-7_and_2ndmom_fp8e4m3_and_csfm_gamma_0_zeta_1.01.yaml
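For anyone reading the configs above: "1st/2nd mom in fp8e4m3" means the Adam moment buffers are stored in the FP8 E4M3 format (4 exponent bits, 3 mantissa bits, max normal value 448). This is a toy, pure-Python sketch of E4M3 round-to-nearest quantization, just to illustrate the precision involved; it is not nanotron code, and real FP8 training uses hardware-backed kernels and scaling factors.

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round a float to the nearest representable FP8 E4M3 value.

    E4M3: 4 exponent bits, 3 mantissa bits, exponent bias 7,
    largest normal value 448. Toy illustration only; ignores the
    per-tensor scaling factors used in practice.
    """
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), 448.0)          # saturate at the E4M3 maximum
    e = max(math.floor(math.log2(a)), -6)  # clamp into subnormal range
    step = 2.0 ** (e - 3)           # 3 mantissa bits -> 8 steps per binade
    return sign * round(a / step) * step
```

For example, `quantize_e4m3(0.3)` returns `0.3125` and values above 448 saturate, which gives a feel for how coarse the moment buffers become in these ablations.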