modelscope / ms-swift

Use PEFT or Full-parameter to finetune 300+ LLMs or 80+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

How to set different learning rate for different groups of parameters in fine-tuning? #1271

Closed by rationalspark 2 weeks ago

rationalspark commented 2 months ago

Can grouped learning rates be passed to swift via the CLI when fine-tuning LLMs? If this is not supported yet, how should I modify the code to use grouped learning rates? Thanks for your kind help. Best wishes.

rationalspark commented 2 months ago

Can somebody help me?

tastelikefeet commented 2 weeks ago

Sorry for the late reply... You can modify the code here: https://github.com/modelscope/ms-swift/blob/main/swift/trainers/mixin.py#L660. We also plan to do a big refactor to provide a plugin ability for this.
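For reference, here is a minimal sketch of the kind of change that would go into the optimizer-creation code the maintainer points to: split `model.named_parameters()` into groups and hand each group its own learning rate when constructing the optimizer. The function name `build_grouped_optimizer`, the learning-rate values, and the rule of routing parameters whose names contain `"embed"` are hypothetical illustrations, not part of the ms-swift API; adapt the grouping rule to your own model.

```python
# Hypothetical sketch of grouped learning rates with a plain PyTorch optimizer.
# The grouping rule and learning rates are placeholders; adapt them when
# patching the optimizer creation in swift/trainers/mixin.py.
import torch
from torch import nn


def build_grouped_optimizer(model: nn.Module,
                            base_lr: float = 1e-5,
                            embedding_lr: float = 1e-6,
                            weight_decay: float = 0.01) -> torch.optim.Optimizer:
    """Give embedding parameters a smaller learning rate than the rest (example split)."""
    embedding_params, other_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # Route parameters by name; replace this test with your own grouping rule.
        if "embed" in name:
            embedding_params.append(param)
        else:
            other_params.append(param)

    param_groups = [
        {"params": embedding_params, "lr": embedding_lr},
        {"params": other_params, "lr": base_lr},
    ]
    return torch.optim.AdamW(param_groups, lr=base_lr, weight_decay=weight_decay)


if __name__ == "__main__":
    # Tiny toy model just to show that the grouping runs and produces two groups.
    class Toy(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed_tokens = nn.Embedding(100, 16)
            self.head = nn.Linear(16, 4)

    optimizer = build_grouped_optimizer(Toy())
    for group in optimizer.param_groups:
        print(len(group["params"]), group["lr"])
```

After building the optimizer this way, you can print `optimizer.param_groups` to confirm that each group received the intended learning rate before starting training.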