Training process hangs - Githubissues

modelscope / swift

ms-swift: Use PEFT or Full-parameter to finetune 250+ LLMs or 35+ MLLMs. (Qwen2, GLM4, Internlm2, Yi, Llama3, Llava, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)

Apache License 2.0

2.13k stars 205 forks

Describe the bug: When running SFT on Qwen1.5 with LoRA, the training log stops updating and GPU utilization stays at 0.

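It stays stuck in this state indefinitely. If I take only the first 100 samples of the dataset, training runs normally. What could be causing this?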
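The workaround above, training on only the first 100 samples, can be reproduced with a small script. This is a minimal sketch that assumes the dataset is stored as a JSONL file (one JSON record per line). The function name `take_first_n` and the file names in the usage note are hypothetical and are not part of ms-swift. Each record is parsed with `json.loads` so a malformed line fails immediately instead of being copied silently.

```python
import json


def take_first_n(src: str, dst: str, n: int = 100) -> int:
    """Copy the first n non-empty JSONL records from src to dst.

    Hypothetical helper (not an ms-swift API). Returns the number of
    records actually written, which is less than n if src is shorter.
    """
    written = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if written >= n:
                break
            line = line.strip()
            if not line:
                continue  # skip blank lines instead of counting them
            record = json.loads(line)  # raises ValueError on a malformed record
            # ensure_ascii=False keeps non-ASCII (e.g. Chinese) text readable
            fout.write(json.dumps(record, ensure_ascii=False) + "\n")
            written += 1
    return written
```

Usage (hypothetical paths): `take_first_n("train.jsonl", "train_first100.jsonl", 100)`. Growing `n` step by step (100, 1000, ...) can help narrow down whether the hang depends on dataset size or on a specific record.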




Training process hangs #1204