hiyouga / LLaMA-Factory

Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

loss spike #5247

Open · erhaquant opened this issue 1 month ago

erhaquant commented 1 month ago

Reminder

System Info

Is this framework unsuitable for pretraining from scratch? In my tests the loss is unstable, with multiple loss spikes.

Reproduction

[Screenshot: training loss curve showing repeated loss spikes]

Expected behavior

No response

Others

No response

erhaquant commented 1 month ago

The original logic of `merge_dataset` mixes the raw texts first and only then converts them to tokens via `preprocess_func`. If one text is especially long (e.g., a book), most of the resulting tokens come from that single book; in practice this caused loss spikes and sometimes failure to converge. After swapping the order of `merge_dataset` and `preprocess_func`, training converges (as shown in the figure above), but many loss spikes still remain.
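
To illustrate the difference between the two orderings, here is a minimal sketch (not LLaMA-Factory's actual `merge_dataset`/`preprocess_func` implementation) using the Hugging Face `datasets` API. The file names, the `tokenize_and_pack` helper, and the block size are hypothetical, chosen only to show why mixing before tokenization lets one long document dominate consecutive packed blocks:

```python
from itertools import chain
from datasets import load_dataset, interleave_datasets
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # any tokenizer works here
BLOCK_SIZE = 2048

def tokenize_and_pack(batch):
    """Tokenize a batch of texts and cut the concatenated ids into fixed-size blocks."""
    ids = list(chain.from_iterable(tokenizer(t)["input_ids"] for t in batch["text"]))
    total = len(ids) // BLOCK_SIZE * BLOCK_SIZE
    return {"input_ids": [ids[i : i + BLOCK_SIZE] for i in range(0, total, BLOCK_SIZE)]}

books = load_dataset("text", data_files="books.txt")["train"]  # a few very long documents
web = load_dataset("text", data_files="web.txt")["train"]      # many short documents

# Original order: merge raw texts first, then tokenize and pack.
# A single book spans thousands of consecutive tokens, so long runs
# of packed blocks all come from that one document.
merged_first = interleave_datasets([books, web]).map(
    tokenize_and_pack, batched=True, remove_columns=["text"]
)

# Swapped order: tokenize and pack each source separately, then
# interleave at the block level, so consecutive training examples
# alternate between sources instead of being dominated by one book.
tokenized_first = interleave_datasets([
    ds.map(tokenize_and_pack, batched=True, remove_columns=["text"])
    for ds in (books, web)
])
```

Interleaving after packing means each source contributes blocks evenly regardless of document length, which is consistent with the reported improvement in convergence after swapping the order.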