Closed by ahtsan 4 years ago
I agree -- perhaps the best place for those for now is in a README in tests/benchmarks? We can link to that file from a new section of CONTRIBUTING, describing how we test new algorithms/primitives.
Absolutely
I guess we need 4 benchmark sets:
Observations/Actions | Discrete | Continuous |
---|---|---|
Pixel | Atari1M | ??? |
State | ??? | MuJoCo1M |
One open question is whether we should use the same set of pixel envs for on-policy and off-policy algorithms. Some of us tried running PPO against Atari environments and it doesn't work well. It probably needs much longer training time and/or hyperparameter tuning.
Maybe wrapping a set of easier discrete action space environments (CartPole / Acrobot / MountainCar / LunarLander) would be more efficient for on-policy algorithms? And we keep Atari1M for off-policy algorithms like DQN?
I don't think it's worth trying on-policy + Atari -- see my response to Linda's PR for alternatives (including your suggestion)
Yes, so we should have:
Observations/Actions | Discrete | Continuous |
---|---|---|
On-policy & Pixel | A | \ |
Off-policy & Pixel | Atari1M | \ |
State | D | MuJoCo1M |
A: [
'MemorizeDigits-v0',
'CubeCrash-v0',
'CarRacing-v0',
'Acrobot-v1'^,
'MountainCar-v0'^,
'CartPole-v1'^,
'LunarLander-v2'^
]
^ These are state-based by default, so they use the wrappers PixelObservationWrapper and FrameStack (n=4)
D: [
'LunarLander-v2',
'CartPole-v1',
'Assault-ramDeterministic-v4',
'Breakout-ramDeterministic-v4',
'ChopperCommand-ramDeterministic-v4',
'Tutankham-ramDeterministic-v4'
]
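For reference, here is a rough sketch of how the table and the two lists above could be written down as plain Python data, so a benchmark script can look up which set applies. The names `BENCHMARK_SETS`, `PIXEL_WRAPPED`, `select_benchmark` and `needs_pixel_wrappers` are made up for illustration and are not part of garage. I am also assuming that `\` in the table means "no set proposed", and that the State row applies to both on-policy and off-policy algorithms. The contents of Atari1M and MuJoCo1M are not listed in this thread, so they appear only by name.

```python
# Sketch only: all names below are hypothetical, not garage APIs.

# Envs marked with ^ in set A: state-based by default, so they need
# PixelObservationWrapper and FrameStack (n=4) to act as pixel envs.
PIXEL_WRAPPED = {
    'Acrobot-v1',
    'MountainCar-v0',
    'CartPole-v1',
    'LunarLander-v2',
}

BENCHMARK_SETS = {
    'A': [
        'MemorizeDigits-v0',
        'CubeCrash-v0',
        'CarRacing-v0',
        'Acrobot-v1',
        'MountainCar-v0',
        'CartPole-v1',
        'LunarLander-v2',
    ],
    'D': [
        'LunarLander-v2',
        'CartPole-v1',
        'Assault-ramDeterministic-v4',
        'Breakout-ramDeterministic-v4',
        'ChopperCommand-ramDeterministic-v4',
        'Tutankham-ramDeterministic-v4',
    ],
}

# (policy type, observation type, action type) -> benchmark set name.
# Pixel + continuous has no entry (assumed meaning of "\" in the table).
TABLE = {
    ('on-policy', 'pixel', 'discrete'): 'A',
    ('off-policy', 'pixel', 'discrete'): 'Atari1M',
    ('on-policy', 'state', 'discrete'): 'D',
    ('off-policy', 'state', 'discrete'): 'D',
    ('on-policy', 'state', 'continuous'): 'MuJoCo1M',
    ('off-policy', 'state', 'continuous'): 'MuJoCo1M',
}


def select_benchmark(policy_type, observation, action):
    """Return the benchmark set name, or None if no set was proposed."""
    return TABLE.get((policy_type, observation, action))


def needs_pixel_wrappers(env_id):
    """True if the env needs the pixel wrappers when used in set A."""
    return env_id in PIXEL_WRAPPED
```

For example, `select_benchmark('on-policy', 'pixel', 'discrete')` gives `'A'`, and `needs_pixel_wrappers('CartPole-v1')` is true while `needs_pixel_wrappers('CarRacing-v0')` is not, since CarRacing already produces pixel observations.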
Closed by #1271
Please merge the README in https://gist.github.com/nish21/760cbdafcbb2838f7707e1edea6a1709 into master, so we have a single source of reference.
Also please add the set of benchmarking environments for discrete action space (both pixel/non-pixel envs) to the README. You can refer to https://github.com/rlworkgroup/garage/pull/906 for the set of environments suggested by @ryanjulian