jurgisp / pydreamer

PyTorch implementation of DreamerV2 model-based RL algorithm
MIT License

require_grad is False when actor_grad is dynamic #12

Open mrsamsami opened 1 year ago

mrsamsami commented 1 year ago

When running the code on DMC, actor_grad is dynamics, so loss_policy is simply -value_target. Since value_target does not depend on the actor's policy distribution, no gradient flows from loss_policy to the actor's parameters. The assertion then evaluates to assert (False and True) or not True, because loss_policy does not require gradients, so it fails. How can we fix it?
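For context, here is a minimal, self-contained sketch of the situation described above (hypothetical names such as `policy_distr`, `loss_policy`, and `value_target` follow the wording of the issue; the actual code in pydreamer may differ):

```python
import torch
import torch.distributions as D

# Toy actor parameters and a stand-in "value target" (hypothetical; not pydreamer's code).
logits = torch.zeros(4, requires_grad=True)      # actor parameters
policy_distr = D.Categorical(logits=logits)
actions = policy_distr.sample()

value_target = torch.randn(())                   # computed with no graph back to the actor

# actor_grad == 'reinforce': gradient reaches the actor via log_prob of sampled actions
loss_reinforce = -policy_distr.log_prob(actions) * value_target.detach()
print(loss_reinforce.requires_grad)              # True

# actor_grad == 'dynamics' as described in the issue: the target itself is the loss,
# but it carries no graph back to the actor
loss_dynamics = -value_target
print(loss_dynamics.requires_grad)               # False -> an assert on requires_grad fails
```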

bilkitty commented 1 year ago

I could be completely wrong, but is it possible that the assert is hard-coded for the case when actor_grad == "reinforce"?

artemZholus commented 1 year ago

Same problem here!

jurgisp commented 1 year ago

You are right, it is very likely that the assertion was written assuming actor_grad=reinforce. What if you simply remove it, does it work then?

To be honest, I did way less testing with actor_grad=dynamics. The functionality did work at one point and was tested with DMC, but something could have changed since then.
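If the assertion really is specific to the reinforce case, one possible relaxation (a sketch only, assuming the check is about requires_grad on the policy loss; the actual line in the repo may look different) would be:

```python
# Hypothetical relaxation: only require gradients on loss_policy for the reinforce
# estimator; with actor_grad='dynamics' the gradient is expected to arrive through
# the imagined rollout instead.
if actor_grad == 'reinforce':
    assert loss_policy.requires_grad
```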

bilkitty commented 1 year ago

Yes, ignoring it works. However, I'm still a little confused about how the dynamics back-prop works given the non-differentiable value target. Setting aside the entropy loss, can you clarify how, in the code, the actor's parameters are updated?

AAgha66 commented 1 month ago

If you use the reinforce policy gradient, then you don't back-prop through the dynamics anymore.
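To illustrate the distinction (a toy sketch with made-up tensors, not pydreamer's actual rollout code): for actor_grad=dynamics the gradient has to reach the actor through the imagined trajectory, so value_target must be computed without detaching the actions and latent states; with reinforce the target is detached and the gradient comes from the log-probability term instead.

```python
import torch

# Toy illustration of the two gradient paths (hypothetical names).
actor_param = torch.tensor(0.5, requires_grad=True)

# Pretend "imagined rollout": the next latent state depends on the action,
# which depends on the actor's parameters (reparameterized / straight-through action).
action = torch.tanh(actor_param)
imagined_state = 2.0 * action        # stands in for the world-model transition
value_target = imagined_state ** 2   # stands in for the return of imagined states

# actor_grad == 'dynamics': back-prop through the world model into the actor.
loss_dynamics = -value_target
loss_dynamics.backward()
print(actor_param.grad)              # non-zero: gradient flowed through the dynamics

# actor_grad == 'reinforce': the target is detached, so nothing flows through the
# dynamics; the actor is updated only via log-prob of sampled actions (omitted here).
print(value_target.detach().requires_grad)   # False
```

So if value_target is computed in a detached way (as in the assertion failure above), the dynamics path carries no gradient at all, which is exactly the symptom reported in this issue.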