Hi,
thanks for these great tools~
I am using DeepSpeed with transformers to train LLMs, and I'd like to ask how I can set different learning rates for different parameters of an LLM.
For example, if I am using Llama-3, how can I set "q_proj", "k_proj", and "v_proj" to learning rate A and all other parameters to learning rate B?
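For reference, this is roughly what I had in mind, based on plain PyTorch optimizer parameter groups (a list of dicts, each with its own "params" and "lr"). It's only a sketch under my own assumptions: the helper name `build_param_groups` is hypothetical, matching on exact module-name components like `q_proj` assumes the usual transformers Llama parameter naming (e.g. `model.layers.0.self_attn.q_proj.weight`), and I'm not sure whether DeepSpeed keeps these groups or replaces them with the `optimizer` section of its config.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple

# Module names that should get learning rate A (taken from the question).
ATTN_PROJ_KEYS: Tuple[str, ...] = ("q_proj", "k_proj", "v_proj")


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    lr_a: float,
    lr_b: float,
    keys: Sequence[str] = ATTN_PROJ_KEYS,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: split parameters into two optimizer param groups.

    Parameters whose dotted name contains one of `keys` as a component get
    `lr_a`; every other trainable parameter gets `lr_b`. The result uses the
    param-group format that torch.optim optimizers accept.
    """
    group_a: List[Any] = []
    group_b: List[Any] = []
    for name, param in named_params:
        # Skip frozen parameters; objects without `requires_grad` count as trainable.
        if not getattr(param, "requires_grad", True):
            continue
        # Compare whole name components so "q_proj" does not match e.g. "qq_proj".
        if any(k in name.split(".") for k in keys):
            group_a.append(param)
        else:
            group_b.append(param)
    groups = [
        {"params": group_a, "lr": lr_a},
        {"params": group_b, "lr": lr_b},
    ]
    # Drop empty groups so the optimizer only sees groups with parameters.
    return [g for g in groups if g["params"]]
```

My assumption is that the result would be passed as `torch.optim.AdamW(build_param_groups(model.named_parameters(), lr_a, lr_b))` and then to the `Trainer` through `optimizers=(optimizer, None)`, with no `optimizer` block in the DeepSpeed config, but I haven't confirmed that this is the supported path with DeepSpeed, which is really what I'm asking.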
Looking forward to your team's reply. Thank you very much~