hijkzzz / pymarl2

Fine-tuned MARL algorithms on SMAC (100% win rates on most scenarios)
https://iclr-blogposts.github.io/2023/blog/2023/riit/
Apache License 2.0
632 stars, 124 forks

Question about policy iterations #24

Closed. wubmu closed this issue 2 years ago

wubmu commented 2 years ago

Hello, in your article I saw S = E × P × I, where S is the total number of samples, E is the number of samples in each episode, P is the number of rollout processes, and I is the number of policy iterations. Does "policy iterations" here refer to target_update_interval, or to how often a training step is run?

hijkzzz commented 2 years ago

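Hello, it refers to how often a training step is run. In short, what matters is the ratio between the number of sampling steps and the number of training updates.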
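As a rough illustration of the relation S = E × P × I discussed above, the following sketch computes the total sample count and the number of samples collected per policy iteration (S / I = E × P). The function names and the numeric values are hypothetical and chosen only for the example; they are not taken from the article or from the pymarl2 configuration.

```python
def total_samples(samples_per_episode: int,
                  rollout_processes: int,
                  policy_iterations: int) -> int:
    """Return S = E * P * I, using the symbols from the question above."""
    return samples_per_episode * rollout_processes * policy_iterations


def samples_per_iteration(samples_per_episode: int,
                          rollout_processes: int) -> int:
    """Return S / I = E * P: samples gathered for each policy iteration."""
    return samples_per_episode * rollout_processes


# Illustrative numbers only (assumptions, not values from the article).
E, P, I = 100, 8, 1000
S = total_samples(E, P, I)
ratio = samples_per_iteration(E, P)
print(f"S = {S}, samples per policy iteration = {ratio}")
```

With a fixed sample budget S, raising P lowers I, so the policy is updated less often per sample collected. That is the sampling-to-training ratio the reply refers to.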
