In this implementation agents' network is a recurrent network, and this should allow for agents to remember necessary previous actions and observations. But the multi-agent controller also appends to agents' inputs their previous action.
What is the reason for that, and is it really necessary?
If you are using the SMAC benchmark, there is a special flag for this called obs_last_action. See SMAC code for more details on all flags that specify the observation function.
Hello,
In this implementation agents' network is a recurrent network, and this should allow for agents to remember necessary previous actions and observations. But the multi-agent controller also appends to agents' inputs their previous action.
What is the reason for that, and is it really necessary?
Thanks!