shariqiqbal2810 / MAAC

Code for "Actor-Attention-Critic for Multi-Agent Reinforcement Learning" ICML 2019
MIT License

Critic function learning #34

Open yesiam-png opened 3 years ago

yesiam-png commented 3 years ago

Hi Shariq,

In both your implementation and the MAAC paper, the state-action Q-function is learned from the expected discounted return, e.g., Eqs. (2) and (7), rather than from the maximum of Q(s, a) with respect to the action a. Could you explain this choice or point to a reference?

Best,
Yesiam
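To make the distinction concrete, here is a minimal sketch of the two bootstrap targets being contrasted. It is not taken from the MAAC code base: the function names, the toy values, and the optional entropy coefficient `alpha` are assumptions made purely for illustration, using a single agent with a small discrete action set.

```python
import math


def max_target(reward, gamma, next_q):
    """Q-learning-style target: bootstrap from the greedy next action."""
    return reward + gamma * max(next_q)


def expected_target(reward, gamma, next_q, next_pi, alpha=0.0):
    """Actor-critic-style target: bootstrap from the expectation of Q
    under the current policy's next-action probabilities.

    alpha is a hypothetical entropy coefficient; alpha = 0 gives the
    plain expectation, alpha > 0 adds a -alpha * log(pi) bonus term.
    """
    expectation = sum(
        p * (q - alpha * math.log(p))
        for p, q in zip(next_pi, next_q)
        if p > 0.0  # skip zero-probability actions to avoid log(0)
    )
    return reward + gamma * expectation


# Toy values (assumed): three actions at the next state.
next_q = [1.0, 2.0, 4.0]      # Q(s', a') for each action
next_pi = [0.25, 0.25, 0.5]   # pi(a' | s')

y_max = max_target(1.0, 0.5, next_q)
y_exp = expected_target(1.0, 0.5, next_q, next_pi)
y_greedy = expected_target(1.0, 0.5, next_q, [0.0, 0.0, 1.0])
```

With `alpha = 0`, the expectation-based target never exceeds the max-based one, and the two coincide when the policy puts all of its probability on the greedy action, as `y_greedy` does here.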