bilkitty closed 4 months ago
Reward predictions are not detached, so that policy_loss.requires_grad = True (see diff in commit 7677f5c).
It is important that the reward predictions are detached.
Thanks! That was a bad guess.
Going back to questioning the assert, I've updated the branch to enforce it only when using reinforce.
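For reference, here is a minimal sketch of why detaching the reward predictions still leaves the loss differentiable under REINFORCE. It assumes the assert in question checks policy_loss.requires_grad; the `policy` and `reward_model` modules, the tensor shapes, and the loss form are hypothetical stand-ins, not the code from the branch.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins for the real networks (not taken from the branch).
policy = torch.nn.Linear(4, 3)        # maps observations to action logits
reward_model = torch.nn.Linear(4, 1)  # maps observations to predicted reward

obs = torch.randn(8, 4)

# Sample actions and keep their log-probabilities, which depend on the policy parameters.
dist = torch.distributions.Categorical(logits=policy(obs))
actions = dist.sample()
log_probs = dist.log_prob(actions)

# Reward predictions depend on the reward model parameters.
reward_preds = reward_model(obs).squeeze(-1)

# REINFORCE-style loss: rewards are treated as fixed weights, so they are detached.
policy_loss = -(log_probs * reward_preds.detach()).mean()
policy_loss.backward()

# The loss still requires grad, because log_probs carries the graph.
loss_requires_grad = policy_loss.requires_grad
# Gradients reach the policy, but not the reward model.
policy_has_grad = all(p.grad is not None for p in policy.parameters())
reward_model_untouched = all(p.grad is None for p in reward_model.parameters())

print(loss_requires_grad, policy_has_grad, reward_model_untouched)
```

In this sketch the gradient path runs through the log-probabilities, so the detached rewards do not prevent policy_loss.requires_grad from being True. If a loss is built only from the reward predictions, detaching them would leave it with no graph, which may be why the assert only makes sense in the REINFORCE case.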