microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Apache License 2.0

How to set different learning rates for different parameters of LLMs #5665

Closed: jpWang closed this issue 1 week ago

jpWang commented 1 week ago

Hi, thanks for these great tools~ I am using DeepSpeed with transformers to train LLMs. How can I set different learning rates for different parameters of an LLM? For example, if I am using Llama-3, how do I set "q_proj", "k_proj", and "v_proj" to learning rate A and all other parameters to learning rate B? Looking forward to your team's reply, thank you very much~
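
One common way to approach this, sketched here under stated assumptions rather than as a DeepSpeed-specific recipe, is to split the parameters into optimizer parameter groups by name and give each group its own "lr". The helper name `build_param_groups` and the default learning rates are hypothetical; the only thing taken from the question is the set of "q_proj", "k_proj", and "v_proj" substrings.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    target_keys: Sequence[str] = ("q_proj", "k_proj", "v_proj"),
    lr_a: float = 1e-4,  # hypothetical "learning rate A"
    lr_b: float = 1e-5,  # hypothetical "learning rate B"
) -> List[Dict[str, Any]]:
    """Split (name, parameter) pairs into two optimizer parameter groups.

    Parameters whose name contains any of `target_keys` get `lr_a`;
    every other trainable parameter gets `lr_b`.
    """
    group_a, group_b = [], []
    for name, param in named_params:
        # Skip frozen parameters; objects without the attribute count as trainable.
        if not getattr(param, "requires_grad", True):
            continue
        if any(key in name for key in target_keys):
            group_a.append(param)
        else:
            group_b.append(param)

    groups = []
    if group_a:
        groups.append({"params": group_a, "lr": lr_a})
    if group_b:
        groups.append({"params": group_b, "lr": lr_b})
    return groups


# Intended use with a real model (assumes PyTorch is installed):
#   import torch
#   groups = build_param_groups(model.named_parameters(), lr_a=1e-4, lr_b=1e-5)
#   optimizer = torch.optim.AdamW(groups)
```

The resulting optimizer can then be handed to the training loop instead of letting the framework build one, for example through the `optimizer` argument of `deepspeed.initialize` or the `optimizers` argument of the transformers `Trainer`. This assumes the DeepSpeed config does not also define its own "optimizer" section, and that any learning-rate scheduler in use scales each group's rate rather than overwriting it; both points are worth checking against your setup.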