TimeBreaker closed this issue 5 years ago.
The return is the total reward collected over an episode. return_mean is that return averaged across the different environments that were run at that time, and return_std is the standard deviation of the episode returns across the same environments.
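Here is a minimal sketch of that computation. It assumes that each environment reports a list of per-step rewards, that the return is their plain (undiscounted) sum, and that the standard deviation is the population one (ddof=0, the numpy.std default). The function names are illustrative and are not taken from pymarl.

```python
from statistics import mean, pstdev


def episode_return(rewards):
    """Return of one episode: the plain sum of its per-step rewards (assumed undiscounted)."""
    return sum(rewards)


def return_stats(rewards_per_env):
    """Mean and population std of episode returns across parallel environments.

    rewards_per_env: one list of per-step rewards for each environment.
    """
    returns = [episode_return(r) for r in rewards_per_env]
    return mean(returns), pstdev(returns)


# Hypothetical rewards from three environments run at the same time.
rewards_per_env = [
    [0.0, 1.0, 2.0, 7.0],  # return 10.0
    [1.0, 1.0, 4.0],       # return 6.0
    [0.5, 0.5, 3.0, 4.0],  # return 8.0
]
return_mean, return_std = return_stats(rewards_per_env)
print(return_mean, return_std)  # return_mean is 8.0
```

With these numbers, return_std is about 1.63. If your logger uses the sample standard deviation instead, swap pstdev for statistics.stdev.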
The agents are RNN agents by default; they are located here.
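Why a recurrent agent: in partially observable settings each agent sees only a local observation. A hidden state carried from step to step lets its Q-values depend on the observation history rather than on the current observation alone. Below is a minimal pure-Python sketch that assumes a plain tanh (Elman) recurrent cell for brevity. The class is hypothetical and is not pymarl's agent code, which may use a different cell (for example a GRU).

```python
import math
import random


class TinyRNNAgent:
    """Hypothetical recurrent Q-network: observation + hidden state -> one Q-value per action."""

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Small random weights; a fixed seed keeps the example deterministic.
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.w_in = init(hidden_dim, obs_dim)     # observation -> hidden
        self.w_h = init(hidden_dim, hidden_dim)   # previous hidden -> hidden
        self.w_out = init(n_actions, hidden_dim)  # hidden -> Q-values

    def init_hidden(self):
        """Zero hidden state, used at the start of every episode."""
        return [0.0] * self.hidden_dim

    def forward(self, obs, h):
        """One timestep of a tanh (Elman) cell: returns (q_values, new_hidden)."""
        new_h = [
            math.tanh(
                sum(w * o for w, o in zip(row_in, obs))
                + sum(w * hp for w, hp in zip(row_h, h))
            )
            for row_in, row_h in zip(self.w_in, self.w_h)
        ]
        q_values = [sum(w * hv for w, hv in zip(row, new_h)) for row in self.w_out]
        return q_values, new_h


agent = TinyRNNAgent(obs_dim=3, hidden_dim=4, n_actions=2)
h = agent.init_hidden()
obs = [1.0, 0.0, 0.5]
q_first, h = agent.forward(obs, h)
q_second, h = agent.forward(obs, h)  # same observation, but the hidden state now carries history
print(q_first, q_second)
```

The two calls see the same observation yet return different Q-values, because the hidden state differs. In training, the hidden state is reset at the start of each episode and carried forward step by step.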
All the training code for a Q-Learning agent is located here.
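The central quantity that Q-learning training regresses toward is the one-step TD target y = r + gamma * (1 - done) * max_a' Q_target(s', a'). Here is a minimal scalar sketch with made-up numbers. The function names are hypothetical, and real training code works on batches of tensors and, for QMIX, applies the target to the mixed Q_tot.

```python
def td_target(reward, done, next_q_values, gamma=0.99):
    """One-step Q-learning target: r + gamma * (1 - done) * max_a' Q_target(s', a')."""
    bootstrap = 0.0 if done else max(next_q_values)
    return reward + gamma * bootstrap


def td_loss(q_taken, target):
    """Squared TD error between the Q-value of the action taken and the target."""
    return (q_taken - target) ** 2


target = td_target(reward=1.0, done=False, next_q_values=[0.5, 2.0, 1.0], gamma=0.5)
loss = td_loss(q_taken=2.5, target=target)
print(target, loss)  # 2.0 0.25
```

On a terminal step (done=True) the target is just the reward, since there is no next state to bootstrap from.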
QMIX uses whatever agents you specify (default is an RNN agent).
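QMIX does not replace the agents. It adds a mixing network that takes each agent's chosen-action Q-value, whatever network produced it, together with the global state, and combines them into Q_tot. The mixing weights are kept non-negative so that Q_tot is monotonic in every agent's Q-value. Below is a minimal single-layer sketch that assumes a hypothetical linear "hypernetwork" from state to weights. The published QMIX mixer uses two layers, hypernetworks, and an ELU nonlinearity, so treat this as a simplification.

```python
def hyper_weights(state, params):
    """Hypothetical linear hypernetwork: one mixing weight per agent, computed from the state."""
    return [sum(p * s for p, s in zip(row, state)) for row in params]


def monotonic_mix(agent_qs, state, w_params, b_params):
    """Q_tot = sum_i |w_i(state)| * Q_i + b(state); abs() keeps dQ_tot/dQ_i >= 0."""
    weights = [abs(w) for w in hyper_weights(state, w_params)]
    bias = sum(p * s for p, s in zip(b_params, state))
    return sum(w * q for w, q in zip(weights, agent_qs)) + bias


# Two agents; their Q-values could come from RNN agents or any other agent network.
state = [1.0, 2.0]
w_params = [[0.5, -1.0], [1.0, 0.5]]  # one row per agent (made-up numbers)
b_params = [0.25, 0.0]
agent_qs = [1.0, 3.0]
q_tot = monotonic_mix(agent_qs, state, w_params, b_params)
print(q_tot)  # 7.75
```

Because the mixer only consumes Q-values, swapping the agent network (RNN or otherwise) does not change the mixing logic.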
Hi, thanks for this repo! I have been reading the source code of pymarl and I have a few questions.
In the program's output there are a few logged values, such as return_mean. I understand most of them, but I have trouble understanding return_mean and return_std. What does return mean? (I guess it may be related to the value function calculation.) And how are return, return_mean and return_std calculated?

The other question is: why do we use an RNN agent? When I search for the word rnn in this repo, I can't find the code showing how the RNN is used in training the agents. And when we use an algorithm like QMIX, does the system still use the RNN agent, or does it use a QMIX agent (overwriting the RNN agent)?

Thanks again for this repo!