modelscope / ms-swift

Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0
4.42k stars, 389 forks

GPU memory usage during LoRA fine-tuning **gradually grows** until it **blows up** #2364

Open LixiangHello opened 4 weeks ago

LixiangHello commented 4 weeks ago

When fine-tuning qwen2.5-7b with LoRA, GPU memory usage climbs gradually until it runs out of memory.

LixiangHello commented 4 weeks ago

It is probably caused by sampling an extremely long sequence....

Jintao-Huang commented 3 weeks ago

You can cap the sequence length with --max_length.
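
To pick a sensible value for --max_length, it may help to check how long the samples in the dataset actually are. The following is only a rough sketch, not part of ms-swift: `summarize_lengths` and `whitespace_token_count` are hypothetical helpers, and the whitespace split is a stand-in assumption for the model's real tokenizer, which would give different (usually larger) counts.

```python
from typing import Callable, Dict, Iterable


def whitespace_token_count(text: str) -> int:
    """Stand-in tokenizer (assumption): counts whitespace-separated tokens.

    Swap in the real tokenizer of the model being fine-tuned for accurate numbers.
    """
    return len(text.split())


def summarize_lengths(
    samples: Iterable[str],
    max_length: int,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> Dict[str, int]:
    """Report how many samples exceed max_length and how long the longest one is."""
    lengths = [count_tokens(s) for s in samples]
    return {
        "num_samples": len(lengths),
        "longest": max(lengths, default=0),
        "num_over_limit": sum(1 for n in lengths if n > max_length),
    }


if __name__ == "__main__":
    # Toy data: one sample is far longer than the others.
    demo = ["a b c", "a " * 5000, "hello world"]
    print(summarize_lengths(demo, max_length=2048))
```

If only a handful of samples are over the limit, those outliers are the likely cause of the late out-of-memory error, since they are only hit occasionally during sampling.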