shibing624 / MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Train a medical LLM, implementing continued pre-training (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.
Apache License 2.0

Question: when verifying the reward model's classification score, why does one question return two tensors? #284

Closed waycup7 closed 1 month ago

waycup7 commented 9 months ago

In `rl_training`:

```python
score_outputs = [
    get_reward_model_output(reward_model, reward_tokenizer, q, r, device)
    for q, r in zip(batch["query"], batch["response"])
]
rewards = calculate_rewards(score_outputs, args.reward_baseline)
```
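For context, a minimal sketch of what `calculate_rewards` presumably does, assuming it simply shifts each scalar score by the fixed baseline (this is an assumption for illustration, not the repo's actual implementation):

```python
import torch

# Hypothetical sketch of calculate_rewards: subtract a fixed baseline
# from each per-sample reward score to center/shift the rewards.
def calculate_rewards(score_outputs, reward_baseline=0.0):
    return [score - reward_baseline for score in score_outputs]

# Example with the bloom 7B value from the issue:
rewards = calculate_rewards([torch.tensor(-0.1087)], reward_baseline=0.0)
print(rewards)  # one tensor per (query, response) pair
```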

Verified `rewards` results:

bloom 7B: `[tensor(-0.1087)]`

llama 13B: `[tensor(1.4023), tensor(-0.1759)]`

What causes the number of tensors to differ between the two models? Thanks.

shibing624 commented 9 months ago

Is each of these a single sample?

waycup7 commented 9 months ago

> Is each of these a single sample?

Yes. Found the problem! The merge had changed `num_labels`. Thanks!
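To see why `num_labels` explains the symptom: a reward model is a sequence-classification head, i.e. a linear layer mapping the final hidden state to `num_labels` logits. With `num_labels=1` each (query, response) pair yields one scalar score; if a model merge leaves `num_labels=2`, each pair yields two logits, which shows up as two tensors per question. A minimal sketch (plain PyTorch, not the repo's code):

```python
import torch
import torch.nn as nn

hidden_size = 16
# Pooled final hidden state for a single (query, response) pair.
hidden_state = torch.randn(1, hidden_size)

head_ok = nn.Linear(hidden_size, 1)   # num_labels=1 -> one reward scalar
head_bad = nn.Linear(hidden_size, 2)  # num_labels=2 -> two logits per pair

print(head_ok(hidden_state).shape)   # one score, as expected for a reward model
print(head_bad(hidden_state).shape)  # two values, matching the llama 13B symptom
```

The fix is to ensure the merged reward model's config keeps `num_labels=1`.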