modelscope / ms-swift

Use PEFT or Full-parameter to finetune 350+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

Loss and acc drop to 0 after several steps #1062

Open MindLostGuy opened 4 months ago

MindLostGuy commented 4 months ago

I am trying to fully fine-tune deepseek-vl on multiple nodes with DeepSpeed ZeRO-2. The loss and accuracy look fine at the beginning (the number of normal steps varies with the number of nodes used). However, after a certain step the training loss and accuracy drop to zero without warning. Even though the loss is zero, training continues, and evaluation then reports a NaN loss.

Do you have any idea what the problem is? Thanks.

tastelikefeet commented 1 month ago

This is because some grads are NaN. Could you share the command you are using? Alternatively, try reducing the learning_rate, or use bf16 instead of fp16.
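To confirm the NaN-gradient diagnosis before changing hyperparameters, you can inspect the gradients right after `loss.backward()` and before `optimizer.step()`. Below is a minimal, hedged sketch in plain PyTorch (the helper name `find_nan_grads` is made up for illustration, not part of ms-swift); with DeepSpeed ZeRO-2 the gradients are sharded, so on a real run you would call this through the engine's debugging utilities or on the local shard only.

```python
import torch


def find_nan_grads(model: torch.nn.Module) -> list[str]:
    """Return names of parameters whose gradients contain NaN or Inf.

    Call after loss.backward() and before optimizer.step(); if the
    returned list is non-empty, log it and consider skipping the step.
    """
    bad = []
    for name, param in model.named_parameters():
        if param.grad is not None and not torch.isfinite(param.grad).all():
            bad.append(name)
    return bad
```

Once a step produces NaN gradients under fp16, the optimizer update poisons the weights and the loss collapses to zero (with NaN at eval), which matches the reported symptom; bf16's wider exponent range usually avoids the overflow that triggers it.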