Open yesiam-png opened 3 years ago
Hi Shariq, in your implementation and the MAAC paper, you learn the state-action Q-function from expected discounted returns, e.g., Eq. (2) and (7), instead of taking the maximum of Q(s, a) over actions a. Could you explain this choice or point me to a reference? Best, Yesiam