Closed HassamSheikh closed 5 years ago
This is explained in section 3.2 in the paragraph titled "Multi-Agent Advantage Function." Essentially, we want to output a Q-value for each action, given the observation, so we don't take the action as input and instead have a separate output for each action.
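To make the shape of this concrete, here is a minimal NumPy sketch of the idea. It is not the code from `utils/critics.py`: the sizes, the single linear layer `W`, `b`, and the random stand-in inputs are all assumptions made for illustration. The point is only that the critic input omits the agent's own action, the head emits one Q-value per action, and the Q-value for the action actually taken is picked out by index afterwards. The baseline at the end follows my reading of the "Multi-Agent Advantage Function" paragraph (expected Q under the agent's own policy), so treat it as a sketch rather than the exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
batch, state_dim, attn_dim, n_actions = 4, 8, 8, 5

# Random stand-ins for the two critic inputs discussed in this thread.
state_enc = rng.normal(size=(batch, state_dim))   # encoding of agent i's observation only (no action)
others_enc = rng.normal(size=(batch, attn_dim))   # joint embedding of the other agents' state-action pairs

# A single linear layer standing in for the critic head (assumption).
W = rng.normal(size=(state_dim + attn_dim, n_actions))
b = np.zeros(n_actions)

# One forward pass yields a Q-value for every action of agent i.
critic_in = np.concatenate([state_enc, others_enc], axis=1)
all_q = critic_in @ W + b                         # shape: (batch, n_actions)

# Select the Q-value of the action that was actually taken.
actions = rng.integers(0, n_actions, size=batch)
q_taken = np.take_along_axis(all_q, actions[:, None], axis=1)  # shape: (batch, 1)

# Because all Q-values are available, a baseline can be computed as the
# expectation of Q under the agent's policy, without extra forward passes.
logits = rng.normal(size=(batch, n_actions))      # stand-in policy logits
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
baseline = (probs * all_q).sum(axis=1, keepdims=True)
advantage = q_taken - baseline                    # shape: (batch, 1)
```

If the action were fed in as an input instead, computing that baseline would need one critic evaluation per possible action.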
I was going through your code and I am having a difficult time understanding one part of the critic. At https://github.com/shariqiqbal2810/MAAC/blob/1006cffb61e6043872a27956635e199b96b910b2/utils/critics.py#L148
you are using only a state encoding, together with the joint embedding of the other agents' state-action pairs, as input to the Q-function. If I recall correctly, equation 5 takes an embedding of the current agent's state-action pair alongside the joint embedding. Can you please explain what is going on here?