sandipan1 opened this issue 6 years ago
You're actually the second person to ask about this! The first person sent an email. I'll add a sub-section or a "you should know" box to the docs to go over this soon.
Thanks. Also, since this tutorial favors learn-by-doing rather than being purely theoretical, it would be nice if the explanations came with some diagrams of the neural network architectures, to give a quick overview of how to implement them. For example, SAC implements about 5 networks: value, value_target, gaussian_policy, and 2 Q_networks. It would be much easier to follow with a pictorial representation of the networks and how they relate (something like the rough sketch below is the kind of overview I mean).
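(Just to make the request concrete, here is a minimal PyTorch-style sketch of how I currently picture those five networks. The names, sizes, and `mlp` helper are my own guesses for illustration, not the repo's actual code.)

```python
# Illustrative sketch of the five networks in (continuous-action) SAC
# and how they relate; names are made up for clarity, not the repo's.
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256):
    # Simple two-hidden-layer MLP used for every network below.
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

obs_dim, act_dim = 8, 2  # example sizes

value      = mlp(obs_dim, 1)                 # V(s)
value_targ = mlp(obs_dim, 1)                 # slow-moving copy of `value`, used in Q targets
value_targ.load_state_dict(value.state_dict())

q1 = mlp(obs_dim + act_dim, 1)               # twin Q networks; taking min(Q1, Q2)
q2 = mlp(obs_dim + act_dim, 1)               # reduces overestimation in the targets

gaussian_policy = mlp(obs_dim, 2 * act_dim)  # outputs mean and log_std of a
                                             # squashed Gaussian over actions
```

If I understand correctly, `value_targ` is only ever updated by polyak averaging from `value`, and that kind of relationship between the networks is exactly what a figure would make obvious.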
Count me as the 3rd. For a discrete action space, the entropy can be computed directly from the distribution. The policy loss probably needs to maximize advantage * log_probability. What I'm confused about is: do we still need 2 Q networks and 1 value network?
Is it just an average over \pi(a|s) for all actions, since the policy is already parameterized? Something like the sketch below is what I have in mind.
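(A minimal sketch of what I mean, assuming a Categorical policy over `act_dim` actions. The loss form here is my own guess from reading the continuous-action code, not something from the docs.)

```python
# Rough sketch of a discrete-action SAC entropy and policy loss (my guess, not the docs').
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim, batch = 8, 4, 32
policy_net = nn.Linear(obs_dim, act_dim)   # logits of pi(.|s)
q1_net     = nn.Linear(obs_dim, act_dim)   # Q1(s, .) for every action at once
q2_net     = nn.Linear(obs_dim, act_dim)   # Q2(s, .)
alpha = 0.2                                # entropy temperature

obs = torch.randn(batch, obs_dim)
probs     = F.softmax(policy_net(obs), dim=-1)   # pi(a|s), exact, no sampling needed
log_probs = torch.log(probs + 1e-8)

# Exact entropy, computed directly from the distribution:
entropy = -(probs * log_probs).sum(dim=-1)

# Policy loss: expectation over all actions of (alpha * log_pi - min(Q1, Q2)),
# weighted by pi(a|s) instead of using the reparameterization trick.
q_min = torch.min(q1_net(obs), q2_net(obs))
policy_loss = (probs * (alpha * log_probs - q_min)).sum(dim=-1).mean()
```

If this is roughly right, the reparameterization trick and the sampled entropy term both drop out, since everything becomes an exact expectation over the `act_dim` actions.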
+1. I am just learning RL and looking to modify SAC for discrete action spaces. If you can elaborate on how to derive the equations, I can implement it and send a PR.
+1
The docs mention an alternate version of SAC that, with a slight change, can be used for discrete action spaces. Please elaborate with some more details.