jurgisp / pydreamer

PyTorch implementation of DreamerV2 model-based RL algorithm
MIT License
209 stars, 48 forks

proposed fix for assert failure in dyn backprop #13

Closed (bilkitty closed 1 year ago)

bilkitty commented 1 year ago

The assert appears to be hard-coded for the case self.actor_grad == reinforce. It encodes (P || !C), i.e. C implies P: if gradients flow from the critic loss, then the actor params must be updated using grads from both policy losses. The assert does not support the case self.actor_grad == dynamics, where gradients flow from the dynamics estimate.
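To make the condition concrete, here is a minimal sketch of the (P || !C) logic in plain Python. It is not the code from the repository and not necessarily the fix proposed here. It reads P as "actor params received grads from both policy losses" and C as "gradients flowed from the critic loss", which is an interpretation of the comment above. The names `implication_holds`, `truth_table`, and `check_grads` are hypothetical, and skipping the check outside reinforce mode is an assumption made only for illustration.

```python
from itertools import product


def implication_holds(p: bool, c: bool) -> bool:
    """Evaluate (P or not C), which is logically 'C implies P'."""
    return p or not c


def truth_table() -> dict:
    """Map every (P, C) combination to the value of (P or not C)."""
    return {(p, c): implication_holds(p, c)
            for p, c in product([False, True], repeat=2)}


def check_grads(actor_grad: str, p: bool, c: bool) -> bool:
    """Hypothetical mode-aware check (an assumption, not the repo's code).

    The implication is enforced only in 'reinforce' mode; any other
    mode, such as 'dynamics', passes without being checked.
    """
    if actor_grad == "reinforce":
        return implication_holds(p, c)
    return True


if __name__ == "__main__":
    for (p, c), ok in truth_table().items():
        print(f"P={p!s:5} C={c!s:5} -> {ok}")
```

The only combination that fails is C true with P false, so a check of this form rejects any run where the critic loss contributes gradients but the actor params were not updated from both policy losses.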