hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

Baichuan-13B: loss is NaN during PPO; after training completes, the model's answers are gibberish #188

Closed · LEOMessi6 closed this 1 year ago

LEOMessi6 commented 1 year ago

[screenshots: PPO training log showing NaN loss; garbled model responses]

LEOMessi6 commented 1 year ago

The training script is as follows:

[screenshot: PPO training script]
hiyouga commented 1 year ago

Replace the model file and try again: https://github.com/hiyouga/LLaMA-Efficient-Tuning/blob/main/tests/modeling_baichuan.py
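Since Baichuan-13B loads its architecture via `trust_remote_code`, the file to overwrite lives in the local model directory, not in this repo. A minimal sketch of applying the patch; both paths are placeholders:

```python
import shutil

# Overwrite the remote-code model definition shipped with the checkpoint.
# Both paths are placeholders: point them at the patched file from this repo
# and at the directory the Baichuan-13B weights were downloaded to.
shutil.copy(
    "tests/modeling_baichuan.py",
    "/path/to/Baichuan-13B/modeling_baichuan.py",
)
```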

LEOMessi6 commented 1 year ago

> Replace the model file and try again: https://github.com/hiyouga/LLaMA-Efficient-Tuning/blob/main/tests/modeling_baichuan.py

I replaced the old modeling_baichuan.py from Hugging Face with the latest modeling_baichuan.py, but the problem above still occurs. Strangely, among the mostly NaN losses there is one finite loss value mixed in.

[screenshot: PPO log with mostly NaN losses and a single finite value]
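One way to localize where the NaNs first appear is to hook every module and flag non-finite outputs during a PPO step. This is a generic debugging aid, not part of LLaMA-Efficient-Tuning; the function name is illustrative:

```python
import torch

def add_nan_hooks(model: torch.nn.Module) -> None:
    """Print the modules whose forward output contains NaN/Inf."""
    def hook(module, inputs, output):
        if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
            print(f"non-finite output from {module.__class__.__name__}")
    for module in model.modules():
        module.register_forward_hook(hook)

# Call add_nan_hooks(model) before the PPO rollout; the first module printed
# is where the fp16 forward pass overflows or divides by zero.
```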
LEOMessi6 commented 1 year ago

My PPO data contains a history field. When I use the example data (alpaca_gpt4_zh), the loss is not NaN and the reward is negative. My preliminary conclusion is that the problem lies in the training data.
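One way to test that hypothesis is to check whether every history entry matches the format the example datasets use, i.e. a list of [query, response] string pairs. A minimal sketch, assuming the custom data is a JSON list in the alpaca format; the file name is a placeholder:

```python
import json

# Placeholder path for the custom PPO dataset.
with open("my_ppo_data.json", encoding="utf-8") as f:
    samples = json.load(f)

for i, sample in enumerate(samples):
    history = sample.get("history") or []
    ok = isinstance(history, list) and all(
        isinstance(turn, list)
        and len(turn) == 2
        and all(isinstance(t, str) and t for t in turn)
        for turn in history
    )
    if not ok:
        # Empty strings or malformed turns produce malformed prompts,
        # which can in turn destabilize the PPO loss.
        print(f"sample {i} has a malformed history field: {history!r}")
```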

LEOMessi6 commented 1 year ago

During PPO, the loss is now normal and the reward is negative, but after merging the SFT model with the PPO model, inference produces gibberish. The merge script is:

[screenshot: model merge script]
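For reference, a merge along these lines can be done with peft's `merge_and_unload`. This is a generic sketch, not the exact script in the screenshot above, and all paths are placeholders:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "/path/to/baichuan-13b-sft"    # SFT model already merged into full weights
ADAPTER = "/path/to/ppo-lora-output"  # LoRA checkpoint produced by the PPO stage
OUT = "/path/to/baichuan-13b-ppo"

model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.float16, trust_remote_code=True
)
model = PeftModel.from_pretrained(model, ADAPTER)
model = model.merge_and_unload()      # fold the LoRA deltas into the base weights
model.save_pretrained(OUT)

tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
tokenizer.save_pretrained(OUT)
```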

The inference result is:

[screenshot: garbled inference output]