
Integrate Liger (LinkedIn GPU Efficient Runtime) Kernel into Hugging Face #32861

Open · JasonZhu1313 opened this issue 3 months ago

JasonZhu1313 commented 3 months ago

Feature request

Integrate the Liger (LinkedIn GPU Efficient Runtime) Kernel into the Hugging Face Trainer, so users can decide whether to enable the kernels with a simple flag; a sketch of the proposed UX follows below.
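
A minimal sketch of what that flag could look like, assuming it lands as a boolean `use_liger_kernel` argument on `TrainingArguments` (the name appears later in this thread but is subject to review):

```python
from transformers import TrainingArguments

# Proposed UX: a single boolean on TrainingArguments toggles the Liger
# kernels; everything else about the Trainer workflow stays unchanged.
args = TrainingArguments(
    output_dir="out",
    bf16=True,
    use_liger_kernel=True,  # proposed flag; exact name subject to review
)
```

The resulting `args` would then be passed to `Trainer` as usual.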

Motivation

Liger (LinkedIn GPU Efficient Runtime) Kernel is a collection of Triton kernels designed specifically for LLM training. We have implemented Hugging Face compatible RMSNorm, RoPE, SwiGLU, CrossEntropy, and FusedLinearCrossEntropy kernels, with more to come. They can increase multi-GPU training throughput by 20% and reduce memory usage by 60%. The kernels work out of the box with Flash Attention, PyTorch FSDP, and Microsoft DeepSpeed. We welcome contributions from the community to gather the best kernels for LLM training.
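
For use outside the Trainer, the kernels can also be applied at the model level. A minimal sketch, assuming the per-architecture monkey-patching helpers described in the Liger-Kernel repo (the repo is not yet public, so the import path and helper name below are assumptions):

```python
from liger_kernel.transformers import apply_liger_kernel_to_llama  # assumed API
from transformers import AutoModelForCausalLM

# Patch before instantiating the model so Llama's RMSNorm, RoPE, SwiGLU,
# and cross-entropy modules resolve to the fused Triton implementations.
apply_liger_kernel_to_llama()

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
```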

Your contribution

We (LinkedIn) will take care of work for a smooth integration and would need HF review and feedback for changes.

Benchmark

Benchmark conditions: Llama 3 8B, Alpaca dataset, max seq len = 512, data type = bf16, optimizer = AdamW, gradient checkpointing = true, distributed strategy = FSDP1 on 4 A100s.

Throughput increases by approximately 20% as more data is processed, while GPU memory usage is reduced by 40%. This means you can train the model on smaller GPUs, with larger batch sizes, or with longer sequence lengths at no additional cost. A sketch of how these conditions map onto Trainer arguments follows below.
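
The issue does not include the exact benchmark script; this is a minimal sketch of how the stated conditions could be expressed as `TrainingArguments` (batch size and FSDP wrapping details are assumptions, not the benchmark's actual values):

```python
from transformers import TrainingArguments

# Approximate the stated setup: bf16, AdamW, gradient checkpointing, and
# FSDP sharding across 4 GPUs (launch with e.g. `torchrun --nproc_per_node=4`).
# Max seq len = 512 is enforced at tokenization time, not here.
args = TrainingArguments(
    output_dir="llama3-8b-alpaca",
    bf16=True,
    optim="adamw_torch",
    gradient_checkpointing=True,
    fsdp="full_shard auto_wrap",
    fsdp_config={"transformer_layer_cls_to_wrap": ["LlamaDecoderLayer"]},
    per_device_train_batch_size=8,  # assumption; sweep this to probe the memory savings
)
```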

[Benchmark figures: training throughput and peak GPU memory]

For the detailed benchmark setup and further efficiency gains for multi-head training (Medusa), please refer to the original repo: https://github.com/linkedin/Liger-Kernel (repo will be public soon)

amyeroberts commented 3 months ago

cc @ArthurZucker @muellerzr

ArthurZucker commented 3 months ago

Sounds great! Awesome work from your team 🥳

ByronHsu commented 3 months ago

Would love to have a discussion on the best UX in https://github.com/linkedin/Liger-Kernel/issues/70. cc @ArthurZucker @philschmid et al

llllvvuu commented 2 months ago

It looks like there is still an issue when using `use_liger_kernel=True` and `torch_compile=True` together in `Trainer` with Llama: https://github.com/linkedin/Liger-Kernel/issues/174
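
For reference, a minimal sketch of the failing combination, using only the two flags named above:

```python
from transformers import TrainingArguments

# Enabling both flags together currently fails with Llama models; see
# linkedin/Liger-Kernel#174. Disabling one of them avoids the problem.
args = TrainingArguments(
    output_dir="out",
    use_liger_kernel=True,
    torch_compile=True,
)
```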