shariqiqbal2810 / MAAC

Code for "Actor-Attention-Critic for Multi-Agent Reinforcement Learning" (ICML 2019)
MIT License

About SAC implementation #22

Closed · yesiam-png closed this issue 4 years ago

yesiam-png commented 4 years ago

Hi, in your implementation SAC is used, but when updating the critic and computing the target Q, V is estimated from the Q-function rather than from a separate value network as in the original SAC paper. Could you explain this choice or point to some references? Thanks

shariqiqbal2810 commented 4 years ago

Hi,

Since we're using discrete action spaces, our Q-function outputs a value for each possible action. As such, we can marginalize Q over the policy in order to get V, instead of estimating V with a separate network.
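
For concreteness, here is a minimal sketch of what that marginalization can look like in PyTorch, using the soft (entropy-regularized) value V(s) = Σ_a π(a|s) [Q(s,a) − α log π(a|s)] from SAC. The function name `soft_value_from_q` and the temperature argument `alpha` are illustrative, not taken from the MAAC codebase:

```python
import torch
import torch.nn.functional as F

def soft_value_from_q(q_values, logits, alpha=0.01):
    """Estimate the soft state value V(s) by marginalizing Q over the
    discrete action distribution, rather than learning a separate
    value network.

    q_values: (batch, n_actions) tensor of Q(s, a) for every action
    logits:   (batch, n_actions) unnormalized policy logits
    alpha:    entropy temperature
    """
    log_probs = F.log_softmax(logits, dim=-1)  # log pi(a|s)
    probs = log_probs.exp()                    # pi(a|s)
    # V(s) = sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s))
    return (probs * (q_values - alpha * log_probs)).sum(dim=-1)
```

This only works because the action space is discrete: the Q-network can output one value per action, so the expectation over actions is an exact sum rather than something that has to be approximated by a learned V network or by sampling.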