starry-sky6688 / MARL-Algorithms

Implementations of IQL, QMIX, VDN, COMA, QTRAN, MAVEN, CommNet, DyMA-CL, and G2ANet on SMAC, the decentralised micromanagement scenario of StarCraft II
1.46k stars, 283 forks

Questions about evaluation interval #121

Closed: ShengjieSun419 closed this issue 3 days ago

ShengjieSun419 commented 4 days ago

First of all, thank you for your excellent code.

I have a question about how QMIX is evaluated. Because episodes in the SC2 environment do not have a fixed length, the amount by which time_steps grows after each sampled episode is unpredictable. As the figure shows, the number of steps is likely to differ every time. [image]

This means an evaluation is performed as soon as time_steps has grown by more than args.evaluate_cycle since the last one. However, the value of time_steps at each evaluation is not necessarily an integer multiple of args.evaluate_cycle, as the figure shows. [image]

So when plotting, does using integer multiples of evaluate_cycle on the horizontal axis introduce a bias? [image]

starry-sky6688 commented 3 days ago

After each evaluation, evaluate_steps is incremented by one. Because time_steps is generally not an exact multiple of args.evaluate_cycle, the check in this line uses // and > rather than testing for an exact multiple.
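For readers following along, here is a minimal sketch of how a check of this form behaves when episode lengths vary. It is not the repository's code: the function name, the random episode lengths, and the assumption that evaluate_steps starts at -1 are all illustrative. It only shows that each evaluation lands slightly after a multiple of evaluate_cycle, not exactly on it.

```python
import random


def simulate_evaluation_points(n_steps, evaluate_cycle, max_episode_len, seed=0):
    """Return the time_steps values at which an evaluation would be triggered.

    Hypothetical stand-in for the training loop discussed in this thread.
    """
    rng = random.Random(seed)
    time_steps = 0
    evaluate_steps = -1  # assumption: counter starts below zero so step 0 is evaluated
    eval_points = []
    while time_steps < n_steps:
        # Integer division plus a strict '>' fires once per evaluate_cycle crossed,
        # even though time_steps rarely hits an exact multiple.
        if time_steps // evaluate_cycle > evaluate_steps:
            eval_points.append(time_steps)
            evaluate_steps += 1
        # One sampled episode of unpredictable length (random stand-in for SMAC).
        time_steps += rng.randint(1, max_episode_len)
    return eval_points


if __name__ == "__main__":
    for k, t in enumerate(simulate_evaluation_points(50000, 5000, 120)):
        print(f"evaluation {k}: time_steps={t}, offset from k*cycle={t - k * 5000}")
```

As long as a single episode is shorter than evaluate_cycle, the k-th evaluation in this sketch falls within one episode length after k * evaluate_cycle.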

ShengjieSun419 commented 3 days ago

Hey, thanks for your reply. Actually, my question is about the learning curve. The plotting code here spaces the points on the horizontal axis exactly args.evaluate_cycle apart. Strictly speaking, that is not correct, right? The actual interval between evaluations is not exactly args.evaluate_cycle. In extreme cases, the interval between two evaluations could even exceed 2*args.evaluate_cycle.

ShengjieSun419 commented 2 days ago

Is there a way to avoid this problem and evaluate strictly at intervals of args.evaluate_cycle steps? This is actually important for fair comparisons when plotting. Looking forward to your reply.
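One possible workaround, sketched here under assumptions and not taken from the repository: since time_steps only advances in whole episodes, evaluating at exact multiples does not seem possible without interrupting an episode. Instead, the actual time_steps value could be recorded at every evaluation and used as the x coordinate. To compare or average several runs on a shared axis, the recorded curve can then be linearly interpolated onto a regular grid. The function name and the sample numbers below are hypothetical.

```python
from bisect import bisect_right


def interpolate_to_grid(eval_steps, eval_values, grid):
    """Linearly interpolate a recorded curve onto the x positions in grid.

    eval_steps must be strictly increasing. Grid points outside the recorded
    range are clamped to the first or last recorded value.
    """
    result = []
    for x in grid:
        if x <= eval_steps[0]:
            result.append(eval_values[0])
        elif x >= eval_steps[-1]:
            result.append(eval_values[-1])
        else:
            # Find i such that eval_steps[i - 1] <= x < eval_steps[i].
            i = bisect_right(eval_steps, x)
            x0, x1 = eval_steps[i - 1], eval_steps[i]
            y0, y1 = eval_values[i - 1], eval_values[i]
            result.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return result


if __name__ == "__main__":
    # Made-up example: evaluations recorded slightly after each multiple of 5000.
    recorded_steps = [0, 5040, 10010, 15075]
    recorded_win_rate = [0.0, 0.2, 0.5, 0.6]
    grid = [0, 5000, 10000, 15000]
    print(interpolate_to_grid(recorded_steps, recorded_win_rate, grid))
```

Plotting directly against the recorded steps avoids any resampling. The interpolation step is only needed when several runs must share identical x positions.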