modelscope / ms-swift

Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)

https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html

Apache License 2.0

For multimodal models like Qwen-VL or Qwen-Audio, how to finetune projector with full parameters and finetune language models with LoRA simultaneously? #2471

Open jymh opened 1 week ago

jymh commented 1 week ago

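Currently the framework only supports choosing one of the two: full-parameter fine-tuning or LoRA.

How can I fully fine-tune the projector while fine-tuning the language model with LoRA at the same time?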
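
For illustration, the setup being asked about can be sketched in plain PyTorch on a toy model. This is only a minimal sketch, not ms-swift's API: `ToyMultimodalModel`, `LoRALinear`, `prepare_mixed_finetuning`, and the submodule names `vision_encoder`, `projector`, and `language_model` are all hypothetical stand-ins. The idea is to freeze everything, wrap the language model's linear layers with low-rank adapters, and then re-enable gradients on the projector alone.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Hypothetical wrapper: frozen nn.Linear plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # original weights stay frozen
        # Low-rank factors; B starts at zero so the wrapped layer
        # initially behaves exactly like the base layer.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale


class ToyMultimodalModel(nn.Module):
    """Toy stand-in for a vision/audio encoder + projector + language model."""

    def __init__(self, vis_dim: int = 16, hid: int = 32):
        super().__init__()
        self.vision_encoder = nn.Linear(vis_dim, vis_dim)
        self.projector = nn.Linear(vis_dim, hid)
        self.language_model = nn.Sequential(
            nn.Linear(hid, hid), nn.ReLU(), nn.Linear(hid, hid)
        )

    def forward(self, x):
        return self.language_model(self.projector(self.vision_encoder(x)))


def prepare_mixed_finetuning(model: ToyMultimodalModel) -> ToyMultimodalModel:
    # 1. Freeze every parameter.
    for p in model.parameters():
        p.requires_grad = False
    # 2. Add LoRA adapters to the language model's linear layers.
    for i, layer in list(enumerate(model.language_model)):
        if isinstance(layer, nn.Linear):
            model.language_model[i] = LoRALinear(layer)
    # 3. Make the projector fully trainable.
    for p in model.projector.parameters():
        p.requires_grad = True
    return model


model = prepare_mixed_finetuning(ToyMultimodalModel())
trainable = sorted(n for n, p in model.named_parameters() if p.requires_grad)
# Only parameters with requires_grad=True are handed to the optimizer.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)
```

After this, `trainable` contains only the projector's weight and bias plus the `lora_a`/`lora_b` factors inside the language model. With Hugging Face PEFT, a comparable effect is usually obtained by listing the projector module in `modules_to_save` of `LoraConfig` alongside `target_modules` for the language model. Whether and how ms-swift exposes this is not stated in this thread.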