openai / maddpg

Code for the MADDPG algorithm from the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments"
https://arxiv.org/pdf/1706.02275.pdf
MIT License

Training EVERY step, not every 100 #67

Open eflopez1 opened 2 years ago

eflopez1 commented 2 years ago

Hello,

I wanted to verify something I found in your code. In the method MADDPGAgentTrainer.update() there is a comment next to the following line stating that an update is only allowed to occur every 100 steps:

if not t % 100 == 0:  # only update every 100 steps
    return

I could be misreading this, but doesn't this line mean that an update will occur every step but skip over steps when t_step%100==0?

Jelle-Plomp commented 2 years ago

t % 100 == 0 is true on every 100th step.

Since we have "if not t % 100 == 0: return", the return statement is executed on every step except each 100th one. So the rest of the update function is only evaluated on every 100th step, which means the update is performed once every 100 steps.
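To check this reasoning, here is a minimal standalone sketch that isolates the guard. The helper name should_update and its interval parameter are hypothetical and are not part of the maddpg code. Only the condition itself is taken from the snippet quoted above.

```python
def should_update(t, interval=100):
    """Return True when the guard lets the rest of update() run.

    Hypothetical helper for illustration; only the condition below
    mirrors the line quoted from MADDPGAgentTrainer.update().
    """
    # `not` binds more loosely than `==`, which binds more loosely than `%`,
    # so this is parsed as: not ((t % interval) == 0)
    if not t % interval == 0:
        return False  # early return: the update is skipped on this step
    return True       # t is a multiple of interval: the update proceeds


# Collect the steps in 1..500 on which the update would actually run.
update_steps = [t for t in range(1, 501) if should_update(t)]
print(update_steps)  # [100, 200, 300, 400, 500]
```

Note that t = 0 also passes the guard, since 0 % 100 == 0.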