rlworkgroup / garage

A toolkit for reproducible reinforcement learning research.

Add benchmarking environments for discrete action space to README #912

Closed ahtsan closed 4 years ago

ahtsan commented 4 years ago

Please merge the README in https://gist.github.com/nish21/760cbdafcbb2838f7707e1edea6a1709 into master, so we have a single source of reference.

Also, please add the set of benchmarking environments for discrete action spaces (both pixel and non-pixel envs) to the README. You can refer to https://github.com/rlworkgroup/garage/pull/906 for the set of environments suggested by @ryanjulian.

ryanjulian commented 4 years ago

I agree -- perhaps the best place for those for now is in a README in tests/benchmarks? We can link to that file from a new section of CONTRIBUTING, describing how we test new algorithms/primitives.

ahtsan commented 4 years ago

> I agree -- perhaps the best place for those for now is in a README in tests/benchmarks? We can link to that file from a new section of CONTRIBUTING, describing how we test new algorithms/primitives.

Absolutely

ryanjulian commented 4 years ago

I guess we need 4 benchmark sets:

| Observations/Actions | Discrete | Continuous |
|----------------------|----------|------------|
| Pixel                | Atari1M  | ???        |
| State                | ???      | MuJoCo1M   |
ahtsan commented 4 years ago

One open question is whether we should use the same set of pixel envs for both on-policy and off-policy algorithms. Some of us tried running PPO against Atari environments and it doesn't work well -- it probably needs much longer training time and/or hyperparameter tuning.

Maybe wrapping a set of easier discrete-action-space environments (CartPole / Acrobot / MountainCar / LunarLander) for pixel observations is more practical for on-policy, and we keep Atari1M for off-policy algorithms like DQN?

ryanjulian commented 4 years ago

I don't think it's worth it to try on-policy + Atari -- see my response to Linda's PR for alternatives (including your suggestion).

ahtsan commented 4 years ago

> I don't think it's worth it to try on-policy + Atari -- see my response to Linda's PR for alternatives (including your suggestion).

Yes, so we should have

| Observations/Actions | Discrete | Continuous |
|----------------------|----------|------------|
| On-policy & Pixel    | A        | \          |
| Off-policy & Pixel   | Atari1M  | \          |
| State                | D        | MuJoCo1M   |
```
A: [
    'MemorizeDigits-v0',
    'CubeCrash-v0',
    'CarRacing-v0',
    'Acrobot-v1',      # ^
    'MountainCar-v0',  # ^
    'CartPole-v1',     # ^
    'LunarLander-v2'   # ^
]
```

^ Using the wrappers PixelObservationWrapper and FrameStack (n=4) -- see the sketch after the lists below.

```
D: [
    'LunarLander-v2',
    'CartPole-v1',
    'Assault-ramDeterministic-v4',
    'Breakout-ramDeterministic-v4',
    'ChopperCommand-ramDeterministic-v4',
    'Tutankham-ramDeterministic-v4'
]
```
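
For the envs marked with ^, here is a minimal sketch of the wrapping idea, assuming gym's built-in PixelObservationWrapper and FrameStack. The `make_pixel_env` and `ExtractPixels` helpers are hypothetical (not part of garage or gym), and the reset-before-wrap step is an assumption about how some gym versions build the pixel observation space:

```python
# Sketch only -- make_pixel_env and ExtractPixels are hypothetical helpers.
import gym
from gym import ObservationWrapper
from gym.wrappers import FrameStack
from gym.wrappers.pixel_observation import PixelObservationWrapper


class ExtractPixels(ObservationWrapper):
    """Unpack the 'pixels' entry from PixelObservationWrapper's dict obs."""

    def __init__(self, env):
        super().__init__(env)
        self.observation_space = env.observation_space.spaces['pixels']

    def observation(self, observation):
        return observation['pixels']


def make_pixel_env(env_id, num_stack=4):
    """Turn a state-based classic-control env into a stacked pixel env."""
    env = gym.make(env_id)
    # Some gym versions need a reset before wrapping so that render()
    # can already produce a frame when the wrapper builds its obs space.
    env.reset()
    # Replace the state observation with rendered RGB frames.
    env = PixelObservationWrapper(env, pixels_only=True)
    # Pull the frame out of the dict observation before stacking.
    env = ExtractPixels(env)
    # Stack the last num_stack frames (n=4 as suggested above).
    env = FrameStack(env, num_stack)
    return env


env = make_pixel_env('CartPole-v1')
```

The extra ExtractPixels step is only there because PixelObservationWrapper typically returns dict observations keyed by 'pixels', while FrameStack expects an array-like observation.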
ahtsan commented 4 years ago

Closed by #1271.