openai / maddpg

Code for the MADDPG algorithm from the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments"
https://arxiv.org/pdf/1706.02275.pdf
MIT License

question about p_reg in p_train #46

Open yeshenpy opened 4 years ago

yeshenpy commented 4 years ago

I went through the code and found something I don't understand. I think of `p_reg` as a regularization term, and a regularization term should constrain the learned parameters (the network weights). But in the code, `p_reg = tf.reduce_mean(tf.square(act_pd.flatparam()))`, the value returned by `act_pd.flatparam()` is the network's output (the flattened action-distribution parameters), not the learned weights. How should this regularization be interpreted? This confuses me and I look forward to your advice. An example of the `act_pd.flatparam()` output is shown in the attached screenshot.
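For reference, here is a minimal TF 1.x sketch (using a stand-in MLP policy, not the repo's actual model-building code) that contrasts the quoted term, which squares the policy's output (the flattened distribution parameters), with an ordinary weight-decay term over the trainable variables. The placeholder shapes and layer sizes are assumptions for illustration only.

```python
import tensorflow as tf

# Hypothetical observation batch placeholder (shape chosen for illustration).
obs_ph = tf.placeholder(tf.float32, shape=[None, 8], name="observation")

# Stand-in policy network; the repo builds its own model instead.
hidden = tf.layers.dense(obs_ph, 64, activation=tf.nn.relu)
p_out = tf.layers.dense(hidden, 4)  # raw (flattened) distribution parameters

# The term asked about: mean of the squared *outputs* of the policy network.
# This penalizes large distribution parameters (e.g. logits), not the weights.
p_reg = tf.reduce_mean(tf.square(p_out))

# For comparison, weight decay on the *learned parameters* would look like this:
weight_decay = tf.add_n(
    [tf.nn.l2_loss(v) for v in tf.trainable_variables()]
)
```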