Closed ShengjieSun419 closed 3 days ago
After each evaluation, `evaluate_steps` will be increased by one, and because `time_steps` is not a multiple of `args.evaluate_cycle`, we use `//` and `>` in this line.
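To make the effect of the `//` and `>` check concrete, here is a minimal sketch of the trigger. It assumes the counters start at `time_steps = 0` and `evaluate_steps = -1`, and that `time_steps` grows by one episode length per loop iteration. The function name `evaluation_points` and the `episode_lengths` argument are hypothetical, introduced only for illustration, and this is not the repository's actual training loop.

```python
def evaluation_points(episode_lengths, evaluate_cycle):
    """Return the time_steps values at which an evaluation would fire.

    Assumption: one episode is sampled per loop iteration and its length
    is added to time_steps after the evaluation check.
    """
    time_steps, evaluate_steps = 0, -1
    points = []
    for length in episode_lengths:
        # Fire once time_steps has crossed into a new evaluate_cycle bucket.
        if time_steps // evaluate_cycle > evaluate_steps:
            points.append(time_steps)
            evaluate_steps += 1
        # Episode lengths vary, so time_steps rarely lands on a multiple.
        time_steps += length
    return points


# Eight episodes of 70 steps each, with an evaluation cycle of 100 steps.
print(evaluation_points([70] * 8, 100))  # [0, 140, 210, 350, 420]
```

In this toy run the evaluations fire at 0, 140, 210, 350 and 420 steps, so the gaps between evaluations alternate between 140 and 70 steps rather than being a constant 100.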
Hey, thanks for your reply.
Actually, my problem is the learning curve. The code here specifies that the horizontal axis interval is `args.evaluate_cycle` when plotting. Strictly speaking, this is not correct, right? Because the evaluation interval is not strictly equal to `args.evaluate_cycle`. Moreover, in extreme cases, it may occur that the interval between two evaluations is greater than `2*args.evaluate_cycle`.
Is there a way to avoid this problem and evaluate strictly at intervals of `args.evaluate_cycle` steps? This is actually important for fair comparisons when plotting. Looking forward to your reply.
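One possible workaround, sketched under assumptions rather than taken from the repository: record the actual `time_steps` value at every evaluation and either plot against those recorded values directly, or linearly interpolate the curve onto exact multiples of `args.evaluate_cycle` so that different runs share the same horizontal axis. The helper name `resample_curve` and its arguments are hypothetical. Interpolated win rates are an approximation of what a strictly periodic evaluation would have measured, not a replacement for it.

```python
from bisect import bisect_right


def resample_curve(eval_steps, values, evaluate_cycle):
    """Linearly interpolate (eval_steps, values) onto multiples of evaluate_cycle.

    Assumption: eval_steps holds the recorded time_steps of each evaluation
    in strictly increasing order, and values holds the matching metric.
    """
    if not eval_steps or len(eval_steps) != len(values):
        raise ValueError("eval_steps and values must be non-empty and equal in length")
    grid, resampled = [], []
    # First multiple of evaluate_cycle at or after the first recorded point.
    step = -(-eval_steps[0] // evaluate_cycle) * evaluate_cycle
    while step <= eval_steps[-1]:
        # Index of the last recorded evaluation at or before this grid step.
        i = bisect_right(eval_steps, step) - 1
        if eval_steps[i] == step or i == len(eval_steps) - 1:
            y = values[i]
        else:
            x0, x1 = eval_steps[i], eval_steps[i + 1]
            y = values[i] + (values[i + 1] - values[i]) * (step - x0) / (x1 - x0)
        grid.append(step)
        resampled.append(y)
        step += evaluate_cycle
    return grid, resampled


# Recorded evaluation steps and made-up win rates, for illustration only.
grid, win_rates = resample_curve([0, 140, 210, 350, 420],
                                 [0.0, 0.14, 0.21, 0.35, 0.42], 100)
print(grid)  # [0, 100, 200, 300, 400]
```

Plotting against the recorded steps keeps the raw measurements intact; resampling is mainly useful when curves from several runs need to be averaged on a common grid.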
First of all, thank you for your excellent code.
Regarding the evaluation of QMIX, I have a question. Since the episode length in the SC2 environment is not fixed, the increase of `time_steps` after each sampling is unpredictable. As shown in the figure, the number of `steps` each time is likely to be different. This means that an evaluation is performed as soon as the change in `time_steps` exceeds `args.evaluate_cycle`. However, the value of `time_steps` at each evaluation is not necessarily an integer multiple of `args.evaluate_cycle`, as shown in the figure. So when plotting, does using integer multiples of `evaluate_cycle` on the horizontal axis introduce bias?