shariqiqbal2810 / MAAC

Code for "Actor-Attention-Critic for Multi-Agent Reinforcement Learning" ICML 2019
MIT License

State Action Encoding in Critic #5

Closed HassamSheikh closed 5 years ago

HassamSheikh commented 5 years ago

I was going through your code and I am having a difficult time understanding one part of the critic. At this line: https://github.com/shariqiqbal2810/MAAC/blob/1006cffb61e6043872a27956635e199b96b910b2/utils/critics.py#L148

you are using only a state encoding, along with the joint embedding of the other agents' state-action pairs, as input to the Q-function. If I recall correctly, Equation 5 takes an embedding of the current agent's state-action pair along with the joint embedding. Can you please explain what is going on here?

shariqiqbal2810 commented 5 years ago

This is explained in Section 3.2 of the paper, in the paragraph titled "Multi-Agent Advantage Function." Essentially, we want to output a Q-value for each action given the observation, so we don't take the current agent's action as input and instead have a separate output for each action.
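
A minimal sketch of that idea, for illustration only: it is not the code in `utils/critics.py`, and the names `linear`, `q_values_all_actions`, `advantage`, `state_enc`, and `other_agents_ctx` are hypothetical. It assumes a single linear output head and random weights. The head receives the agent's state encoding plus the attention-weighted embedding of the other agents, and returns one Q-value per action, so the Q-value of the action actually taken is read off by index and the advantage baseline is the policy-weighted average over all outputs.

```python
import random


def linear(weights, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def q_values_all_actions(state_enc, other_agents_ctx, weights, bias):
    """Return one Q-value per action of the current agent.

    The input is [state encoding, other-agents context]; the current
    agent's own action is deliberately NOT part of the input.
    """
    return linear(weights, bias, state_enc + other_agents_ctx)


def advantage(q_all, action, policy_probs):
    """Q-value of the taken action minus the policy-weighted baseline."""
    baseline = sum(p * q for p, q in zip(policy_probs, q_all))
    return q_all[action] - baseline


# Toy sizes and random values, assumed purely for illustration.
rng = random.Random(0)
n_actions, enc_dim, ctx_dim = 3, 4, 4
state_enc = [rng.uniform(-1, 1) for _ in range(enc_dim)]
other_agents_ctx = [rng.uniform(-1, 1) for _ in range(ctx_dim)]
weights = [[rng.uniform(-1, 1) for _ in range(enc_dim + ctx_dim)]
           for _ in range(n_actions)]
bias = [0.0] * n_actions

q_all = q_values_all_actions(state_enc, other_agents_ctx, weights, bias)
taken_action = 1
policy_probs = [0.2, 0.5, 0.3]
adv = advantage(q_all, taken_action, policy_probs)
print(len(q_all), adv)
```

Because every action's Q-value comes out of one forward pass, the baseline needs no extra critic evaluations, which is the practical reason for leaving the agent's own action out of the input.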