microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0
33.6k stars 3.94k forks

How to set different learning rates for different parameters of LLMs #5665

Closed jpWang closed 1 week ago

jpWang commented 1 week ago

Hi, thanks for these great tools~ I am using DeepSpeed with transformers to train LLMs, and I'd like to ask how I can set different learning rates for different parameters of an LLM. For example, if I am using Llama-3, how do I set "q_proj", "k_proj", and "v_proj" to learning rate A and all other parameters to learning rate B? Looking forward to your team's reply, thank you very much~
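To make the request concrete, here is a minimal sketch of the grouping I have in mind, written as plain Python so it runs without a model. It is an assumption about how this could be done, not a confirmed DeepSpeed recipe: the helper `build_param_groups`, the constant `ATTN_KEYS`, and the demo names and learning rates are all hypothetical. It only builds the list of parameter-group dicts (`{"params": ..., "lr": ...}`) in the format that PyTorch optimizers accept.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Name fragments taken from the question; other architectures may use other names.
ATTN_KEYS = ("q_proj", "k_proj", "v_proj")


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    lr_a: float,
    lr_b: float,
    keys: Tuple[str, ...] = ATTN_KEYS,
) -> List[Dict[str, Any]]:
    """Split (name, parameter) pairs into two groups with separate learning rates."""
    selected, others = [], []
    for name, param in named_params:
        # Skip frozen parameters when the object exposes requires_grad.
        if not getattr(param, "requires_grad", True):
            continue
        # A parameter goes to the first group if its name contains any key.
        if any(key in name for key in keys):
            selected.append(param)
        else:
            others.append(param)
    return [
        {"params": selected, "lr": lr_a},  # learning rate A
        {"params": others, "lr": lr_b},    # learning rate B
    ]


if __name__ == "__main__":
    # Stand-in names and values; with a real model, pass model.named_parameters().
    demo = [
        ("model.layers.0.self_attn.q_proj.weight", "W_q"),
        ("model.layers.0.self_attn.k_proj.weight", "W_k"),
        ("model.layers.0.self_attn.v_proj.weight", "W_v"),
        ("model.layers.0.self_attn.o_proj.weight", "W_o"),
        ("model.layers.0.mlp.gate_proj.weight", "W_gate"),
    ]
    groups = build_param_groups(demo, lr_a=1e-4, lr_b=1e-5)
    print([len(g["params"]) for g in groups])
```

With a real model, the returned list could be handed to a PyTorch optimizer such as `torch.optim.AdamW(groups)`, and that optimizer passed in through the `optimizer` argument of `deepspeed.initialize` or the `optimizers` argument of the transformers `Trainer`. What I am unsure about is whether the per-group rates are preserved when the optimizer is instead defined in the DeepSpeed JSON config.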