young-geng / EasyLM

Large language models (LLMs) made easy: EasyLM is a one-stop solution for pre-training, fine-tuning, evaluating, and serving LLMs in JAX/Flax.
Apache License 2.0

Is there a plan to support training with fp8? #83

Closed joytianya closed 12 months ago

joytianya commented 12 months ago

Models are becoming increasingly large, and there is a desire to train even larger ones. Could training with fp8 be supported? https://github.com/NVIDIA/TransformerEngine

young-geng commented 12 months ago

Unfortunately I don't have plans to support that right now. Using fp8 won't save much memory, since the model weights are still kept in fp32 during training. Also, since I don't have access to any H100s, I wouldn't be able to test an fp8 implementation.
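To make that point concrete, here is a rough back-of-the-envelope sketch. It assumes the common setup of fp32 master weights with a standard Adam optimizer (the function name and byte counts are illustrative, not EasyLM internals): the persistent training state is the same size no matter what precision the matmuls run in, so fp8 mainly saves activation memory and compute time.

```python
# Rough per-parameter memory estimate for Adam training, illustrating why
# fp8 compute alone doesn't shrink the training state. Assumes fp32 master
# weights and standard Adam; numbers are illustrative, not EasyLM internals.

def training_state_bytes(num_params: int) -> dict:
    """Bytes used by the persistent training state (weights + optimizer)."""
    fp32 = 4  # bytes per fp32 value
    return {
        "weights": num_params * fp32,    # master weights stay fp32
        "gradients": num_params * fp32,  # fp32 gradients
        "adam_m": num_params * fp32,     # Adam first moment
        "adam_v": num_params * fp32,     # Adam second moment
    }

# For a 7B-parameter model, the state alone is ~112 GB regardless of
# whether the matmuls run in fp8, bf16, or fp32 -- fp8 only shrinks
# activations and speeds up compute on supported hardware (e.g. H100).
state = training_state_bytes(7_000_000_000)
total_gb = sum(state.values()) / 1e9
print(f"persistent training state: {total_gb:.0f} GB")
```

Activation memory, by contrast, does scale with the compute dtype, which is where fp8 (or checkpointing) can help.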

joytianya commented 12 months ago

"Using fp8 won't save much memory since the model weights are still in fp32 during training"

Even so, training with fp8 can still save some memory, right?