sandipan1 opened this issue 6 years ago
You're actually the second person to ask about this! The first person sent an email. I'll add a sub-section or a "you should know" box to the docs to go over this soon.
Thanks. Also, since this tutorial favors learn-by-doing rather than being purely theoretical, it would be nice if the explanations came with some diagrams of the neural network architectures, to give a quick overview of how to implement them. For example, SAC implements about 5 networks: value, value_target, gaussian_policy, and 2 Q_networks. It would be much easier to follow with a pictorial representation of the networks and how they relate (something like the rough sketch below is the kind of overview I mean).
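(Just to make the request concrete, here is a minimal PyTorch-style sketch of how I currently picture those five networks. The names, sizes, and `mlp` helper are my own guesses for illustration, not the repo's actual code.)

```python
# Illustrative sketch of the five networks in (continuous-action) SAC
# and how they relate; names are made up for clarity, not the repo's.
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256):
    # Simple two-hidden-layer MLP used for every network below.
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

obs_dim, act_dim = 8, 2  # example sizes

value      = mlp(obs_dim, 1)                 # V(s)
value_targ = mlp(obs_dim, 1)                 # slow-moving copy of `value`, used in Q targets
value_targ.load_state_dict(value.state_dict())

q1 = mlp(obs_dim + act_dim, 1)               # twin Q networks; taking min(Q1, Q2)
q2 = mlp(obs_dim + act_dim, 1)               # reduces overestimation in the targets

gaussian_policy = mlp(obs_dim, 2 * act_dim)  # outputs mean and log_std of a
                                             # squashed Gaussian over actions
```

If I understand correctly, `value_targ` is only ever updated by polyak averaging from `value`, and that kind of relationship between the networks is exactly what a figure would make obvious.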
Count me as the 3rd. For a discrete action space, the entropy can be computed directly from the distribution. The policy loss probably needs to maximize advantage * log_probability. What I'm confused about is: do we still need 2 Q networks and 1 value network?
Is it just an average over \pi(a|s) for all actions, since the policy is already parameterized? Something like the sketch below is what I have in mind.
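(A minimal sketch of what I mean, assuming a Categorical policy over `act_dim` actions. The loss form here is my own guess from reading the continuous-action code, not something from the docs.)

```python
# Rough sketch of a discrete-action SAC entropy and policy loss (my guess, not the docs').
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim, batch = 8, 4, 32
policy_net = nn.Linear(obs_dim, act_dim)   # logits of pi(.|s)
q1_net     = nn.Linear(obs_dim, act_dim)   # Q1(s, .) for every action at once
q2_net     = nn.Linear(obs_dim, act_dim)   # Q2(s, .)
alpha = 0.2                                # entropy temperature

obs = torch.randn(batch, obs_dim)
probs     = F.softmax(policy_net(obs), dim=-1)   # pi(a|s), exact, no sampling needed
log_probs = torch.log(probs + 1e-8)

# Exact entropy, computed directly from the distribution:
entropy = -(probs * log_probs).sum(dim=-1)

# Policy loss: expectation over all actions of (alpha * log_pi - min(Q1, Q2)),
# weighted by pi(a|s) instead of using the reparameterization trick.
q_min = torch.min(q1_net(obs), q2_net(obs))
policy_loss = (probs * (alpha * log_probs - q_min)).sum(dim=-1).mean()
```

If this is roughly right, the reparameterization trick and the sampled entropy term both drop out, since everything becomes an exact expectation over the `act_dim` actions.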
+1. I am just learning RL and looking to modify SAC for discrete action spaces. If you can elaborate on how to derive the equations, I can implement it and send a PR.
+1
The docs mention an alternate version of SAC that, with a slight change, can be used for discrete action spaces. Please elaborate with some more details.