shibing624 / MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models; implements incremental pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.
Apache License 2.0
3.24k stars 492 forks

After PT and SFT, inference with the SFT model fails: RuntimeError: probability tensor contains either inf, nan or element < 0 #249

Closed dage0127 closed 11 months ago

dage0127 commented 11 months ago
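  1. Starting from baichuan2-7B-Chat, I ran PT (pretraining) and merged the result, then ran SFT (supervised fine-tuning) and merged again.
  2. Inference with the merged PT model works normally: python inference.py --model_type baichuan --base_model merged-pt --template_name baichuan --interactive
  3. Inference with the merged SFT model raises an error. Please take a look, thanks.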

Command: python inference.py --model_type baichuan --base_model merged-sft --template_name baichuan --interactive

Error output:

: hello
: Exception in thread Thread-2 (generate):
Traceback (most recent call last):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/transformers/generation/utils.py", line 1642, in generate
    return self.sample(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/transformers/generation/utils.py", line 2760, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0
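
For readers unfamiliar with this message: the traceback shows the failure occurs when a token is sampled from `probs`, so the probabilities computed from the model's logits are no longer a valid distribution. The following plain-Python sketch only illustrates how a single NaN logit poisons the whole softmax output. It is not the torch or transformers implementation, and the helper names `softmax` and `is_valid_distribution` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_valid_distribution(probs):
    """Mirror the condition named in the error message:
    no inf, no nan, and no element < 0."""
    return all(math.isfinite(p) and p >= 0 for p in probs)


# Finite logits give a proper probability distribution.
healthy = softmax([2.0, 1.0, 0.1])

# One NaN logit makes the normalising sum NaN, so every probability is NaN.
broken = softmax([2.0, float("nan"), 0.1])

print(is_valid_distribution(healthy))
print(is_valid_distribution(broken))
```

If the merged SFT weights or the chosen dtype produce NaN or inf logits, sampling would fail in this way. That is only one possible cause and is not confirmed anywhere in this thread.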
shibing624 commented 11 months ago
  1. Please share your transformers version; upgrading transformers may also help.
  2. Try a few more cases and post the complete error log.
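
One way to collect the requested version information is a short standard-library script. This is a minimal sketch: the helper name `installed_versions` is hypothetical, and including `torch` and `peft` alongside `transformers` is an assumption about which packages are relevant to this setup.

```python
from importlib import metadata


def installed_versions(packages=("transformers", "torch", "peft")):
    """Return {package: version string, or None if it is not installed}."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    # Print one line per package, suitable for pasting into an issue.
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it in the same environment used for `inference.py` so the reported versions match the ones that produced the error.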
dage0127 commented 11 months ago

Thanks a lot, I'll give it a try later.