Open · michaelroyzen opened this issue 11 months ago
cc @pacman100
Any updates @pacman100 @sgugger?
The speedup is only going to show when fully training models over 6B parameters, which is why we haven't prioritized support in the Trainer. It is baked into Accelerate, though.
Thanks for the update @sgugger. I'm training 6B+ models with Trainer + DeepSpeed using an MPI launcher, so Trainer support would be helpful.
Would love your input on how FP8 can be used with Trainer + DeepSpeed @stas00
Hello, you can directly use the Accelerate Launcher with Trainer to use FP8 support out of the box.
Just do:
```
accelerate launch --mixed_precision fp8 training_script_using_trainer.py --kwarg1 value ...
```
With respect to FP8 support with DeepSpeed, can you raise an issue with the DeepSpeed team?
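For anyone trying this, here is a minimal sketch of what a `training_script_using_trainer.py` could contain. The model name, dataset, and hyperparameters below are illustrative placeholders; note that FP8 is selected by the launcher flag above, not by anything inside the script:

```python
# Minimal Trainer script; FP8 is enabled by the launcher
# (accelerate launch --mixed_precision fp8 ...), not in this file.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "gpt2"  # placeholder; FP8 only pays off on much larger models
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tiny placeholder dataset, tokenized for causal LM training.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=dataset.column_names,
)
dataset = dataset.filter(lambda ex: len(ex["input_ids"]) > 1)  # drop empty rows

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```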
@michaelroyzen FP8 has been tested with all DeepSpeed ZeRO stages and is compatible with Transformer Engine. There's a basic FP8 / DeepSpeed test here. Feel free to raise an issue on the Transformer Engine GitHub if it's not working.
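As background for readers who haven't used Transformer Engine: FP8 there is applied through an autocast context around modules that ship FP8 kernels. A minimal standalone sketch (illustrative only, not the test linked above, and with DeepSpeed out of the picture; requires Hopper-class hardware) might look like:

```python
# Minimal Transformer Engine FP8 sketch (needs an H100/Hopper GPU).
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# DelayedScaling tracks amax history to choose FP8 scaling factors;
# HYBRID uses E4M3 for the forward pass and E5M2 for gradients.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

model = te.Linear(768, 768, bias=True).cuda()
inp = torch.randn(16, 768, device="cuda")

# Forward pass runs the supported GEMMs in FP8; backward follows suit.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

out.sum().backward()
```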
Hi @pacman100, it seems like Accelerate does not support FP8 with either DeepSpeed ZeRO or FSDP. I'd appreciate your guidance here.
@pacman100 is no longer with HF.
Tagging @muellerzr, who is the new maintainer of the DeepSpeed integration in Accelerate.
Thanks, Stas. @muellerzr, I would love to hear your thoughts. There doesn't seem to be a way to get HF Trainer/Accelerate FP8 training working with DeepSpeed ZeRO or FSDP, despite @sbhavani's confirmation that it should work with ZeRO.
@michaelroyzen I'll be working on enabling them this week; I'll have a report back to you by Friday at the latest :)
Thank you @muellerzr! Do you have an update? Is there anything I can help with?
Feature request
Support H100 training with FP8 in Trainer and DeepSpeed
Motivation
FP8 should be much faster than FP16 on supported Hopper hardware, particularly with the DeepSpeed integration. @stas00
Your contribution
Happy to help in any way that I can.