modelscope / swift

ms-swift: Use PEFT or Full-parameter to finetune 250+ LLMs or 35+ MLLMs. (Qwen2, GLM4, Internlm2, Yi, Llama3, Llava, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
Apache License 2.0
2.13k stars 205 forks

Training process hangs #1204

Closed zkyredstart closed 3 hours ago

zkyredstart commented 1 week ago

Describe the bug (what the bug is and how to reproduce it, preferably with screenshots): When running SFT on Qwen1.5 with LoRA, the training log stops updating and GPU utilization stays at 0.



Your hardware and system info (CUDA version, system, GPU model, torch version, etc.):

Additional context (any other context about the problem):

zkyredstart commented 1 week ago

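The dataset is a single train.json created by merging train1.json, train2.json, and train3.json. Each of the three subsets trains normally on its own, but the merged file does not.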
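For anyone reproducing this, here is a minimal sketch of one way to merge the files and sanity-check the result. It assumes each file holds a top-level JSON array of objects, which the issue does not confirm. The helper name `merge_json_datasets` is hypothetical and not part of ms-swift. It does not diagnose the hang, it only helps rule out a malformed or inconsistent merge.

```python
import json
from collections import Counter


def merge_json_datasets(paths, out_path):
    """Concatenate JSON-array files and count the key sets seen across records.

    Assumption: every input file is a JSON array of objects.
    """
    merged = []
    key_sets = Counter()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            records = json.load(f)
        if not isinstance(records, list):
            raise ValueError(f"{path}: expected a top-level JSON array")
        for i, record in enumerate(records):
            if not isinstance(record, dict):
                raise ValueError(f"{path}[{i}]: expected a JSON object")
            # Track which combinations of field names occur in the data.
            key_sets[tuple(sorted(record))] += 1
        merged.extend(records)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(merged, f, ensure_ascii=False, indent=2)
    return merged, key_sets
```

Calling `merge_json_datasets(["train1.json", "train2.json", "train3.json"], "train.json")` returns the merged records and a counter of key sets. More than one key set in the counter means the subsets do not share the same fields, which is worth checking before training on the merged file.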