zhuweipg99 opened 1 year ago
When I looked into how the value loss is computed in trainer.py, lines 1012-1017:
```python
value_loss_clipped = old_values + (values - old_values).clamp(-critic_eps_clip, critic_eps_clip)
value_loss1 = (value_loss_clipped - rewards) ** 2
value_loss2 = (values - rewards) ** 2
value_loss = torch.max(value_loss1, value_loss2).mean()
```
I think `values` and `rewards` should be equal to `old_values`, because they are computed with the same model. I would be very grateful if you could clear up my confusion.
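For context, here is a minimal runnable sketch of the clipped value loss above (not the repo's actual code; the tensor values are made up). The key point it illustrates: `old_values` is a frozen snapshot of the critic's outputs taken at rollout time, while `values` is recomputed each PPO epoch from the updated critic, so after the first optimizer step the two generally differ and the clamp becomes active.

```python
import torch

def clipped_value_loss(values, old_values, rewards, critic_eps_clip=0.2):
    # Clip the new value estimate to stay within eps of the rollout-time estimate.
    value_loss_clipped = old_values + (values - old_values).clamp(
        -critic_eps_clip, critic_eps_clip
    )
    value_loss1 = (value_loss_clipped - rewards) ** 2  # clipped squared error
    value_loss2 = (values - rewards) ** 2              # unclipped squared error
    # Pessimistic (larger) of the two, averaged over the batch.
    return torch.max(value_loss1, value_loss2).mean()

old_values = torch.tensor([0.5, 1.0])  # critic outputs snapshotted at rollout
values     = torch.tensor([0.9, 0.7])  # critic outputs after some updates
rewards    = torch.tensor([1.0, 1.0])  # value targets (made-up numbers)
print(clipped_value_loss(values, old_values, rewards))  # → tensor(0.0900)
```

In the first element, `values - old_values = 0.4` exceeds the clip range of 0.2, so the clipped branch produces the larger error and dominates the `max`.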