openai / coinrun

Code for the paper "Quantifying Transfer in Reinforcement Learning"
https://blog.openai.com/quantifying-generalization-in-reinforcement-learning/
MIT License

High variance in mean_score during test #40

Closed by KaiyangZhou 3 years ago

KaiyangZhou commented 4 years ago

Thanks for this code.

I found that the variance in mean_score is quite high when I run the test code multiple times on the same trained model with the same parameters (num_eval=20 and rep=5). For example, I got mean_score=3.8, 5.2, and 4.6 over three runs, and sometimes mean_score>6.0 for the same model. Is this normal?

In addition, what values for num_eval and rep would you suggest in order to obtain a fair result for comparison between methods?

kcobbe commented 3 years ago

Using those values (rep=5, num_eval=20), you're only evaluating on 5*20=100 levels, so it's not very surprising that there'd be high variance. If you evaluate on 1k levels, the variance will be substantially lower. 5k levels should give very low variance.
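
As a rough back-of-the-envelope check, the standard error of the mean shrinks as 1/sqrt(number of levels). The sketch below is not part of the coinrun code. It assumes each level independently yields either 0 or a fixed coin reward (taken here as 10) and that the agent solves about half the levels, which is only a guess based on the mean_score values reported above. `standard_error`, `success_rate`, and `coin_reward` are hypothetical names used for illustration.

```python
import math


def standard_error(num_levels, success_rate=0.5, coin_reward=10.0):
    """Approximate standard error of mean_score over num_levels levels.

    Assumption: each level is an independent Bernoulli trial that pays
    coin_reward with probability success_rate and 0 otherwise.
    """
    # Standard deviation of a single level's score under that assumption.
    per_level_std = coin_reward * math.sqrt(success_rate * (1.0 - success_rate))
    # The standard error of the mean falls with the square root of the sample size.
    return per_level_std / math.sqrt(num_levels)


if __name__ == "__main__":
    for n in (100, 1000, 5000):
        print(f"{n} levels: standard error ~ {standard_error(n):.3f}")
```

Under these assumptions the standard error is about 0.5 at 100 levels, about 0.16 at 1k levels, and about 0.07 at 5k levels. A run-to-run spread like 3.8 to 5.2 at 100 levels is roughly what that first figure would predict.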