Open murphypei opened 1 year ago
The following code works well.
if (step + 1) % 100 == 0:
    reward_score, rejected_scores, acc, score_std = evaluation_reward(rm_model, eval_dataloader)
    if args.global_rank == 0:
        wandb.log({
            'Eval/epoch': -1,
            'Eval/reward_score': reward_score,
            'Eval/score_std': score_std,
            'Eval/rejected_scores': rejected_scores,
            'Eval/acc': acc,
        })
You are right. The condition args.global_rank == 0 has to be removed, since the evaluation_reward method needs all processes to participate.
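The hang described in this thread is what one would expect if evaluation_reward performs collective communication internally. The sketch below is not the real training code: it uses only the standard library, and threading.Barrier stands in for a collective operation that blocks until every rank has joined. The names run_ranks, fake_evaluation_reward and WORLD_SIZE are hypothetical, and the timeout exists only so the demo terminates instead of hanging.

```python
import threading

WORLD_SIZE = 4  # hypothetical number of ranks for this demo


def run_ranks(guard_eval_with_rank0, timeout=1.0):
    """Simulate WORLD_SIZE ranks and return each rank's outcome.

    Outcomes are 'done', 'skipped', or 'hung' (the wait timed out).
    """
    # Stand-in for a collective op: wait() returns only once all ranks arrive.
    barrier = threading.Barrier(WORLD_SIZE)
    results = {}

    def fake_evaluation_reward():
        # Assumption: the real function contains a collective call like this.
        barrier.wait(timeout=timeout)

    def worker(rank):
        if guard_eval_with_rank0 and rank != 0:
            # Non-zero ranks never enter the collective.
            results[rank] = "skipped"
            return
        try:
            fake_evaluation_reward()
            results[rank] = "done"
        except threading.BrokenBarrierError:
            # In real training there is no timeout: the rank would wait forever.
            results[rank] = "hung"

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(WORLD_SIZE)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print("guarded:  ", sorted(run_ranks(True).items()))
    print("unguarded:", sorted(run_ranks(False).items()))
```

With the guard in place, rank 0 waits for peers that never arrive. Without it, every rank reaches the collective and all of them return, which is why only the wandb.log call should stay under the rank check.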
In addition, there is another bug: rm_model.train() should be called inside the step loop.
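A minimal sketch of why the placement of train() matters, assuming that evaluation_reward switches the model to eval mode and does not switch it back (this thread does not show its body). ToyModel is a hypothetical stand-in that only mimics the train()/eval() flag of a torch.nn.Module, so the example runs without any dependencies.

```python
class ToyModel:
    """Hypothetical stand-in mimicking the train()/eval() mode flag."""

    def __init__(self):
        self.training = True

    def train(self):
        self.training = True
        return self

    def eval(self):
        self.training = False
        return self


def fake_evaluation_reward(model):
    # Assumption: evaluation leaves the model in eval mode.
    model.eval()


def training_modes(num_steps, eval_every, train_inside_loop):
    """Return the model's mode (True = training) at the start of each step."""
    model = ToyModel()
    model.train()  # called once before the loop
    modes = []
    for step in range(num_steps):
        if train_inside_loop:
            model.train()  # restore training mode on every step
        modes.append(model.training)
        if (step + 1) % eval_every == 0:
            fake_evaluation_reward(model)
    return modes


if __name__ == "__main__":
    print(training_modes(4, 2, train_inside_loop=False))  # [True, True, False, False]
    print(training_modes(4, 2, train_inside_loop=True))   # [True, True, True, True]
```

When train() is called only once, every step after the first evaluation runs in eval mode. Calling it at the top of each step keeps later steps in training mode.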
OK, thanks for your reply.
Firstly, thank you for your contributions. My run consistently hangs (without exiting) at evaluation_reward during the training of step 2, so I am wondering whether something is wrong. Perhaps the condition args.global_rank == 0 is unnecessary? Any suggestions would be greatly appreciated. Thank you.