Is it possible to release the evaluation scores of the baseline agents?

First, thanks for this awesome codebase; it has helped me a lot :) I have three questions:

1. Based on #147 and the white paper, the results in the baseline folder are training returns rather than evaluation returns. Would it be possible to also release the evaluation returns, if they are available? I'm asking because I'm running some ensemble methods that behave very differently during training and evaluation. The white paper shows that, for the agents in the repo, the choice between evaluation and training returns does not matter much on the 3 games tested, but I'm not sure whether this also holds for the other 57 games. Even if it does, I would prefer to compare evaluation returns for an apples-to-apples comparison.
2. Why do the baseline results contain only 199 iterations (indexed from 0 to 198), given that we always run 200 iterations?
3. In the MICo paper, did you report training returns or evaluation returns?

Thanks!

google / dopamine