jurgisp / pydreamer

PyTorch implementation of DreamerV2 model-based RL algorithm
MIT License

proposed fix for dynamics back-prop #14

Closed: bilkitty closed this 4 months ago

bilkitty commented 1 year ago

The reward predictions are not detached, so that policy_loss.requires_grad is True (see the diff in commit 7677f5c).

jurgisp commented 1 year ago

It is important that the reward predictions are detached.
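
For readers following along, here is a minimal sketch of what detaching does in a REINFORCE-style policy loss. It is not pydreamer's code: the names logits, reward_params, reward_pred, and advantage are hypothetical stand-ins, and the "reward model" is just a tensor of parameters. It only illustrates that a detached reward acts as a constant weight, so the policy loss still carries gradient through the log-probabilities while nothing flows back into the reward model.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins (assumptions, not pydreamer names):
# policy logits and the parameters of a toy reward model.
logits = torch.zeros(3, requires_grad=True)
reward_params = torch.tensor([1.0, 0.5, 2.0], requires_grad=True)

# Pretend these are reward predictions for three imagined steps.
reward_pred = reward_params * 1.0

# Sample three actions from a categorical policy over three choices.
dist = torch.distributions.Categorical(logits=logits)
actions = dist.sample((3,))

# REINFORCE treats the return/advantage as a fixed weight, so it is detached.
advantage = reward_pred.detach()
policy_loss = -(dist.log_prob(actions) * advantage).mean()

policy_loss.backward()

# The loss is still differentiable through log_prob ...
print("policy_loss.requires_grad:", policy_loss.requires_grad)
# ... the policy parameters receive a gradient ...
print("logits.grad is set:", logits.grad is not None)
# ... and the toy reward model receives none from the policy loss.
print("reward_params.grad is None:", reward_params.grad is None)
```

In this toy setup policy_loss.requires_grad comes from the log-probability term alone. If the loss were built only from detached quantities, with no log_prob term, it would have no gradient path at all, which is the kind of situation a requires_grad assert would flag.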

bilkitty commented 1 year ago

Thanks! That was a bad guess.

Going back to questioning the assert, I've updated the branch to enforce it only when using reinforce.