pranz24 / pytorch-soft-actor-critic

PyTorch implementation of Soft Actor-Critic
MIT License

A question in the deterministic case #2

Closed: roosephu closed this issue 5 years ago

roosephu commented 5 years ago

https://github.com/pranz24/pytorch-soft-actor-critic/blob/master/sac.py#L87

Should we use new_action here, or self.policy(next_state_batch)?
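
To make the question concrete, here is a minimal, self-contained sketch of the target computation this line is about. The networks and variable names below are illustrative stand-ins, not the classes or signatures actually used in this repository:

```python
import torch
import torch.nn as nn

# Illustrative stand-ins, not the repository's actual policy/critic classes.
state_dim, action_dim, batch = 8, 2, 32
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                       nn.Linear(64, action_dim), nn.Tanh())
critic_target = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                              nn.Linear(64, 1))

state_batch = torch.randn(batch, state_dim)
next_state_batch = torch.randn(batch, state_dim)
reward_batch = torch.randn(batch, 1)
mask_batch = torch.ones(batch, 1)   # 0 where the episode terminated
gamma = 0.99

with torch.no_grad():
    new_action = policy(state_batch)        # action produced for the *current* states
    next_action = policy(next_state_batch)  # action produced for the *next* states

    # The question: should the bootstrap target below use next_action
    # (the policy evaluated at next_state_batch) instead of reusing new_action?
    q_next = critic_target(torch.cat([next_state_batch, next_action], dim=1))
    next_q_value = reward_batch + gamma * mask_batch * q_next
```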

pranz24 commented 5 years ago
  1. Correct; I also think we should use self.policy.evaluate(next_state_batch).
  2. I am still using a Gaussian policy rather than a deterministic policy plus fixed Gaussian noise, as given in the paper.
  3. Also, you will have to remove the entropy term from the policy loss, i.e., policy_loss = -(expected_new_q_value).mean() (the same as the DDPG policy loss). This means that we will no longer require the regularization loss. (A sketch of points 2 and 3 follows at the end of this comment.)

(Although I have not given your question a lot of thought, these 3 points seemed very clear to me when I read the paper again today. I am very busy at the moment (at least this week), so if you can give me a week's time, I might get back to you with a bit more information. Also, I have no idea why I made these mistakes -_- )
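
A minimal sketch of points 2 and 3 (the modules and the noise scale below are illustrative assumptions, not this repository's actual classes or hyperparameters):

```python
import torch
import torch.nn as nn

# Illustrative stand-ins for points 2 and 3 above, not the repository's classes.
state_dim, action_dim, batch = 8, 2, 32
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                       nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))

state_batch = torch.randn(batch, state_dim)

# Point 3: with a deterministic policy there is no log-probability term, so the
# entropy term drops out and the policy loss reduces to the DDPG-style objective.
new_action = policy(state_batch)
expected_new_q_value = critic(torch.cat([state_batch, new_action], dim=1))
policy_loss = -expected_new_q_value.mean()

# Point 2: exploration comes from fixed Gaussian noise added only when acting in
# the environment (sigma is an assumed value here, not taken from the paper).
sigma = 0.1
with torch.no_grad():
    exploratory_action = (policy(state_batch)
                          + sigma * torch.randn(batch, action_dim)).clamp(-1.0, 1.0)
```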

pranz24 commented 5 years ago

https://github.com/pranz24/pytorch-soft-actor-critic/blob/master/sac.py#L90 I've made some changes according to your query. Let me know if there is anything else that you think is wrong in the implementation.

roosephu commented 5 years ago

Nice! Your code is really helpful, thanks!