Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
NCCL timeout when training reaches a fixed step #2359
Closed
samaritan1998 closed 13 hours ago
Describe the bug
Training consistently hits an NCCL timeout once it reaches a certain fixed step.
Your hardware and system info
CUDA version: 12.2, GPU: 8x A100 40GB, torch: 2.4.0, accelerate: 0.34.0
Additional context
Add any other context about the problem here.
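For reference, torch.distributed's default NCCL collective timeout is 30 minutes, so a rank that stalls on a long operation (evaluation, checkpoint saving, data loading) at a fixed step can trip the watchdog on the other ranks. A common general mitigation, not confirmed as the fix for this issue, is to raise the timeout when the process group is created. A minimal sketch, assuming the training script constructs its own `Accelerator`:

```python
from datetime import timedelta

from accelerate import Accelerator
from accelerate.utils import InitProcessGroupKwargs

# Raise the NCCL collective timeout from the 30-minute default to 2 hours,
# so a slow step on one rank does not abort the collectives on the others.
ipg_kwargs = InitProcessGroupKwargs(timeout=timedelta(hours=2))
accelerator = Accelerator(kwargs_handlers=[ipg_kwargs])
```

Running with `NCCL_DEBUG=INFO` set on each rank can also help pinpoint which collective is stalling before the timeout fires.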