How many distinguished environments do you use in evaluation?

openai / coinrun

Code for the paper "Quantifying Transfer in Reinforcement Learning"

https://blog.openai.com/quantifying-generalization-in-reinforcement-learning/

MIT License


How many distinguished environments do you use in evaluation? #42

Closed pengzhenghao closed 4 years ago

pengzhenghao commented 4 years ago

Hi there! Thanks for this interesting environment. I am wondering how many environments you use during evaluation, because I can't find it in either the paper or the code. It seems the number of environments (num_eval) is set to 20, but I am a little confused.

Could you please clarify how many distinct environments are used during evaluation? Thanks!

kcobbe commented 4 years ago

For final evaluations, we report average returns across 10k episodes, where each episode uniformly samples a level from the appropriate distribution (train or test). In practice, this is probably more episodes than necessary (I believe the std dev is already quite low after 1k episodes). Note that when evaluating training performance, different episodes may use the same level, since the training sets often have fewer than 10k levels.
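
For readers who want to see the procedure concretely, here is a minimal sketch of the evaluation loop described above. It is not the repository's code: `evaluate`, `run_episode`, `toy_episode`, and the `num_levels` parameter are hypothetical names introduced for illustration, and the toy rollout stands in for a real agent and environment.

```python
import random
import statistics


def evaluate(run_episode, num_levels, num_episodes=10_000, seed=0):
    """Average return over episodes, each on a uniformly sampled level.

    run_episode: hypothetical callable mapping a level index to a return.
    num_levels:  size of the level set (train or test); an assumption here.
    """
    rng = random.Random(seed)
    returns = []
    levels_seen = set()
    for _ in range(num_episodes):
        # Uniform sampling with replacement, so levels can repeat
        # whenever num_levels < num_episodes.
        level = rng.randrange(num_levels)
        levels_seen.add(level)
        returns.append(run_episode(level))
    mean = statistics.fmean(returns)
    # Standard error of the mean shrinks as 1 / sqrt(num_episodes).
    std_err = statistics.stdev(returns) / len(returns) ** 0.5
    return mean, std_err, len(levels_seen)


def toy_episode(level):
    # Stand-in for a real rollout: a fixed return that depends on the level.
    return 10.0 if level % 3 else 0.0


mean, std_err, distinct = evaluate(toy_episode, num_levels=500)
print(f"mean return {mean:.2f} +/- {std_err:.2f} over {distinct} distinct levels")
```

With a level set smaller than the episode count, the number of distinct levels visited is capped by `num_levels`, which is the repetition mentioned above for training-set evaluation. Comparing the standard error at 1k and 10k episodes is one way to check how many episodes are actually needed.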