huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Support H100 training with FP8 in Trainer and Deepspeed #25333

Open michaelroyzen opened 11 months ago

michaelroyzen commented 11 months ago

Feature request

Support H100 training with FP8 in Trainer and Deepspeed

Motivation

FP8 should be much faster than FP16 on supported Hopper hardware, particularly with the DeepSpeed integration. @stas00

Your contribution

Happy to help in any way that I can.

amyeroberts commented 11 months ago

cc @pacman100

michaelroyzen commented 11 months ago

Any updates @pacman100 @sgugger?

sgugger commented 11 months ago

The speedup only shows up when fully training models over 6B parameters, which is why we haven't prioritized support in the Trainer. It is baked into Accelerate, though.
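For reference, using it directly through Accelerate looks roughly like the minimal sketch below. This is untested and assumes a Hopper GPU with Transformer Engine installed; finer-grained control (scaling margin, FP8 format, etc.) is possible through the FP8 kwargs handlers, whose argument names may differ across Accelerate versions.

```python
# Minimal sketch: FP8 mixed precision via Accelerate alone (no Trainer).
# Assumes an H100-class GPU and Transformer Engine installed; model and
# shapes below are placeholders, not a recommendation.
import torch
from accelerate import Accelerator

accelerator = Accelerator(mixed_precision="fp8")

model = torch.nn.Linear(4096, 4096)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model, optimizer = accelerator.prepare(model, optimizer)

inputs = torch.randn(8, 4096, device=accelerator.device)
loss = model(inputs).float().pow(2).mean()
accelerator.backward(loss)
optimizer.step()
optimizer.zero_grad()
```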

michaelroyzen commented 11 months ago

Thanks for the update @sgugger. I'm training 6B+ models with Trainer + DeepSpeed using an MPI launcher, so Trainer support would be helpful.

michaelroyzen commented 11 months ago

Would love your input on how FP8 can be used with Trainer + DeepSpeed @stas00

pacman100 commented 11 months ago

Hello, you can use the Accelerate launcher with Trainer directly to get FP8 support out of the box.

Just do:

accelerate launch --mixed_precision fp8 training_script_using_trainer.py --kwarg1 value ...
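The training script itself needs no FP8-specific code; something like the sketch below would pick up the fp8 setting from the launcher (the model and dataset names are placeholders only):

```python
# training_script_using_trainer.py -- minimal sketch, placeholders only.
# The FP8 setting comes entirely from `accelerate launch --mixed_precision fp8`;
# nothing FP8-specific is needed inside the script.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "gpt2"  # placeholder; swap in your 6B+ model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder dataset, tokenized for causal LM training.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1, max_steps=10),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```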

With respect to FP8 support with DeepSpeed, can you raise an issue with the DeepSpeed team?

sbhavani commented 8 months ago

@michaelroyzen FP8 has been tested with all DeepSpeed ZeRO stages and is compatible with Transformer Engine. There's a basic FP8 / DeepSpeed test here. Feel free to raise an issue on the Transformer Engine GitHub if it's not working.
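If it helps isolate whether a failure comes from Transformer Engine itself or from the integration layer, the raw TE side of FP8 looks roughly like this sketch (recipe arguments may differ between TE versions):

```python
# Sketch: raw Transformer Engine FP8 usage, no Trainer/Accelerate/DeepSpeed.
# Requires an FP8-capable GPU (e.g. H100) and the transformer_engine package.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# HYBRID = E4M3 for forward activations/weights, E5M2 for backward gradients.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

model = te.Linear(4096, 4096, bias=True).cuda()
inp = torch.randn(8, 4096, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

out.float().sum().backward()
```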

michaelroyzen commented 5 days ago

Hi @pacman100, it seems like Accelerate does not support FP8 with either DeepSpeed ZeRO or FSDP. I'd appreciate your guidance here.

stas00 commented 5 days ago

@pacman100 is no longer with HF.

Tagging @muellerzr, who is the new maintainer of the DeepSpeed integration in Accelerate.

michaelroyzen commented 4 days ago

Thanks, Stas. @muellerzr, would love to hear your thoughts. There doesn't seem to be a way to get HF Trainer/Accelerate FP8 training working with DeepSpeed ZeRO or FSDP, despite @sbhavani's confirmation that it should work with ZeRO.

muellerzr commented 4 days ago

@michaelroyzen I'll be working on enabling them this week, and I'll have a report back to you by Friday at the latest :)

michaelroyzen commented 23 hours ago

Thank you @muellerzr! Do you have an update? Is there anything I can help with?