openai / procgen

Procgen Benchmark: Procedurally-Generated Game-Like Gym-Environments
https://openai.com/blog/procgen-benchmark/
MIT License
991 stars 207 forks

Suggestion: Include the possible environment score ranges in the main descriptions #79

Open nhansendev opened 2 years ago

nhansendev commented 2 years ago

I found the table with the range of possible scores in the appendix of the paper, and thought it could be a useful reference to include in a more visible location, such as the main GitHub page alongside the environment descriptions:

> **C. Normalization Constants**
>
> Rmin is computed by training a policy with masked out observations. This demonstrates what score is trivially achievable in each environment. Rmax is computed in several different ways. For CoinRun, Dodgeball, Miner, Jumper, Leaper, Maze, BigFish, Heist, Plunder, Ninja, and Bossfight, the maximal theoretical and practical reward is trivial to compute.
>
> For CaveFlyer, Chaser, and Climber, we empirically determine Rmax by generating many levels and computing the average max achievable reward.
>
> For StarPilot and FruitBot, the max practical reward is not obvious, even though it is easy to establish a theoretical bound. We choose to define Rmax in these environments as the score PPO achieves after 8 billion timesteps when trained at an 8x larger batch size than our default hyperparameters. On observing these policies, we find them very close to optimal.
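For anyone using the table, a minimal sketch of how the constants are typically applied: the paper normalizes a raw return R to (R - Rmin) / (Rmax - Rmin), so 0 corresponds to the trivially achievable score and 1 to the max. The constants below are placeholder values for illustration, not the actual numbers from the appendix table:

```python
# Min-max normalization of Procgen returns, as described in the paper's
# Appendix C. The (Rmin, Rmax) values here are HYPOTHETICAL placeholders;
# the real per-environment constants come from the appendix table.
NORMALIZATION_CONSTANTS = {
    "coinrun": (5.0, 10.0),   # example values only
    "bigfish": (1.0, 40.0),   # example values only
}

def normalized_return(env_name: str, raw_return: float) -> float:
    """Map a raw episode return to [0, 1] using the env's (Rmin, Rmax)."""
    r_min, r_max = NORMALIZATION_CONSTANTS[env_name]
    return (raw_return - r_min) / (r_max - r_min)
```

With the table available on the main page, readers could plug the real constants into something like this directly.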

What do you think?