rlworkgroup / garage

A toolkit for reproducible reinforcement learning research.

Add benchmarking environments for discrete action space to README #912

Closed ahtsan closed 4 years ago

ahtsan commented 4 years ago

Please merge the README in https://gist.github.com/nish21/760cbdafcbb2838f7707e1edea6a1709 into master, so we have a single source of reference.

Also, please add the set of benchmarking environments for discrete action spaces (both pixel and non-pixel envs) to the README. You can refer to https://github.com/rlworkgroup/garage/pull/906 for the set of environments suggested by @ryanjulian.

ryanjulian commented 4 years ago

I agree -- perhaps the best place for those for now is in a README in tests/benchmarks? We can link to that file from a new section of CONTRIBUTING, describing how we test new algorithms/primitives.

ahtsan commented 4 years ago

> I agree -- perhaps the best place for those for now is in a README in tests/benchmarks? We can link to that file from a new section of CONTRIBUTING, describing how we test new algorithms/primitives.

Absolutely

ryanjulian commented 4 years ago

I guess we need 4 benchmark sets:

| Observations/Actions | Discrete | Continuous |
|----------------------|----------|------------|
| Pixel                | Atari1M  | ???        |
| State                | ???      | MuJoCo1M   |
ahtsan commented 4 years ago

One open question is whether we should use the same set of pixel envs for both on-policy and off-policy algorithms. Some of us tried running PPO against Atari environments and it doesn't work well -- it probably needs much longer training time and/or hyperparameter tuning.

Maybe wrapping a set of easier discrete-action-space environments (CartPole / Acrobot / MountainCar / LunarLander) for pixel observations is more practical for on-policy, and we keep Atari1M for off-policy algorithms like DQN?

ryanjulian commented 4 years ago

I don't think it's worth it to try on-policy + Atari -- see my response to Linda's PR for alternatives (including your suggestion).

ahtsan commented 4 years ago

> I don't think it's worth it to try on-policy + Atari -- see my response to Linda's PR for alternatives (including your suggestion).

Yes, so we should have

| Observations/Actions | Discrete | Continuous |
|----------------------|----------|------------|
| On-policy & Pixel    | A        | \          |
| Off-policy & Pixel   | Atari1M  | \          |
| State                | D        | MuJoCo1M   |
```
A: [
    'MemorizeDigits-v0',
    'CubeCrash-v0',
    'CarRacing-v0',
    'Acrobot-v1',      # ^
    'MountainCar-v0',  # ^
    'CartPole-v1',     # ^
    'LunarLander-v2'   # ^
]
```

^ Using the wrappers PixelObservationWrapper and FrameStack (n=4) -- see the sketch after the lists below.

```
D: [
    'LunarLander-v2',
    'CartPole-v1',
    'Assault-ramDeterministic-v4',
    'Breakout-ramDeterministic-v4',
    'ChopperCommand-ramDeterministic-v4',
    'Tutankham-ramDeterministic-v4'
]
```
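
For the envs marked with ^, here is a minimal sketch of the wrapping idea, assuming gym's built-in PixelObservationWrapper and FrameStack. The `make_pixel_env` and `ExtractPixels` helpers are hypothetical (not part of garage or gym), and the reset-before-wrap step is an assumption about how some gym versions build the pixel observation space:

```python
# Sketch only -- make_pixel_env and ExtractPixels are hypothetical helpers.
import gym
from gym import ObservationWrapper
from gym.wrappers import FrameStack
from gym.wrappers.pixel_observation import PixelObservationWrapper


class ExtractPixels(ObservationWrapper):
    """Unpack the 'pixels' entry from PixelObservationWrapper's dict obs."""

    def __init__(self, env):
        super().__init__(env)
        self.observation_space = env.observation_space.spaces['pixels']

    def observation(self, observation):
        return observation['pixels']


def make_pixel_env(env_id, num_stack=4):
    """Turn a state-based classic-control env into a stacked pixel env."""
    env = gym.make(env_id)
    # Some gym versions need a reset before wrapping so that render()
    # can already produce a frame when the wrapper builds its obs space.
    env.reset()
    # Replace the state observation with rendered RGB frames.
    env = PixelObservationWrapper(env, pixels_only=True)
    # Pull the frame out of the dict observation before stacking.
    env = ExtractPixels(env)
    # Stack the last num_stack frames (n=4 as suggested above).
    env = FrameStack(env, num_stack)
    return env


env = make_pixel_env('CartPole-v1')
```

The extra ExtractPixels step is only there because PixelObservationWrapper typically returns dict observations keyed by 'pixels', while FrameStack expects an array-like observation.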
ahtsan commented 4 years ago

Closed by #1271.