Closed waycup7 closed 1 month ago
[rl_training]
score_outputs = [
    get_reward_model_output(reward_model, reward_tokenizer, q, r, device)
    for q, r in zip(batch["query"], batch["response"])
]
rewards = calculate_rewards(score_outputs, args.reward_baseline)
Verified the rewards results: bloom 7B: [tensor(-0.1087)]
llama 13B: [tensor(1.4023), tensor(-0.1759)]
What causes the number of tensors returned to differ between the two models? Thanks
Are both runs on a single data sample?
Yes, and I found the problem! The merge had changed num_labels. Thanks!
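For context, this is consistent with the resolution above: a reward model's scoring head is a linear projection from the hidden size to num_labels, so a model with num_labels=2 emits two scores per sample where a scalar reward model (num_labels=1) emits one. The following is a minimal sketch in pure Python with hypothetical weights, not the actual model code:

```python
def reward_head(hidden, weights):
    """Linear projection: one score per row of `weights` (i.e. per label)."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in weights]

hidden = [0.5, -1.0, 2.0]  # hypothetical last hidden state, hidden_size=3

# num_labels=1 (a scalar reward model): one score per input sample
w1 = [[0.1, 0.2, 0.3]]
print(reward_head(hidden, w1))  # one value, e.g. [0.45]

# num_labels=2 (e.g. unintentionally set during a model merge):
# the same single sample now yields two scores
w2 = [[0.1, 0.2, 0.3], [0.4, -0.5, 0.6]]
print(len(reward_head(hidden, w2)))  # 2
```

This matches the symptom reported: one sample through the llama 13B checkpoint produced two tensors because its classification head had num_labels=2.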