A2C hyperparameters for MuJoCo

hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms

http://stable-baselines.readthedocs.io/

MIT License


A2C hyperparameters for MuJoCo #249

Closed rasoolfa closed 4 years ago

rasoolfa commented 5 years ago

Hi,

I was wondering whether you have A2C hyperparameters for MuJoCo that reproduce results close to those in the PPO paper [PPO paper, Figure 3, and results for A2C], or any A2C hyperparameters that work for MuJoCo?

Thanks.

araffin commented 5 years ago

Hello,

Please wait a bit or use the gail-test branch (see PR #206), which will be merged into master soon. In the master branch, there is a tricky bug in A2C with continuous actions, but fortunately it is easy to fix (see https://github.com/hill-a/stable-baselines/pull/206/commits/689afd16f5b07d2fead1fa5e8474a8efa2826a64 for the fix).

For the hyperparameters, I would recommend taking a look at the rl baselines zoo on the add-trpo branch. It has hyperparameters for the Pybullet envs, which are similar to the Mujoco ones and a bit harder. From what I remember, the default hyperparameters were working quite well for A2C.

EDIT: it seems that A2C needs some hyperparameter tuning for Mujoco (I'm currently running some experiments).

EDIT: the branch is now merged with master ;)

rasoolfa commented 5 years ago

Hi,

Thanks. It would be very helpful if you could share the A2C hyperparameters once you have them. It seems A2C needs different hyperparameters for Mujoco than for Atari. Thanks again for your help.

araffin commented 5 years ago

Hey, here are the best hyperparameters found so far (using the add-trpo branch of the rl baselines zoo) with stable-baselines v2.5.0 (please upgrade ;)):

HalfCheetahBulletEnv-v0:
  normalize: true
  n_envs: 8
  n_timesteps: !!float 2e6
  policy: 'MlpPolicy'
  ent_coef: 0.0
  n_steps: 32
  vf_coef: 0.5
  lr_schedule: 'linear'
  gamma: 0.99
  learning_rate: 0.0013
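
To make the numbers above easier to read, here is a minimal Python sketch (not part of the zoo) that separates the training-script settings from the algorithm settings and works out the implied rollout size and update count. The helper names `split_config` and `linear_lr`, and the grouping in `SCRIPT_KEYS`, are assumptions made for illustration. The linear schedule is also an assumption: it is taken here to decay the learning rate from its initial value to zero over the training budget.

```python
# Hyperparameters copied from the config above (HalfCheetahBulletEnv-v0).
ZOO_CONFIG = {
    "normalize": True,
    "n_envs": 8,
    "n_timesteps": 2e6,
    "policy": "MlpPolicy",
    "ent_coef": 0.0,
    "n_steps": 32,
    "vf_coef": 0.5,
    "lr_schedule": "linear",
    "gamma": 0.99,
    "learning_rate": 0.0013,
}

# Assumption: these keys configure the training script (env wrappers, budget,
# policy class); every other key is treated as an A2C keyword argument.
SCRIPT_KEYS = ("normalize", "n_envs", "n_timesteps", "policy")


def split_config(config):
    """Return (script_settings, algorithm_kwargs) as two separate dicts."""
    script = {key: config[key] for key in SCRIPT_KEYS}
    algo = {key: value for key, value in config.items() if key not in SCRIPT_KEYS}
    return script, algo


def linear_lr(initial_lr, timestep, total_timesteps):
    """Assumed linear schedule: initial_lr at step 0, decaying to 0 at the end."""
    progress = min(max(timestep / total_timesteps, 0.0), 1.0)
    return initial_lr * (1.0 - progress)


script_cfg, a2c_kwargs = split_config(ZOO_CONFIG)
total_timesteps = int(script_cfg["n_timesteps"])

# Each update consumes n_steps transitions from each of the n_envs environments.
batch_size = script_cfg["n_envs"] * a2c_kwargs["n_steps"]
n_updates = total_timesteps // batch_size

print("A2C kwargs:", a2c_kwargs)
print("samples per update:", batch_size)
print("full updates in budget:", n_updates)
print("lr halfway through:", linear_lr(a2c_kwargs["learning_rate"],
                                       total_timesteps // 2, total_timesteps))
```

With these values each update uses 8 x 32 = 256 transitions, so the 2e6-step budget allows 7812 full updates, far more samples per update than a 5-step rollout on the same 8 environments would give.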

rasoolfa commented 5 years ago

Thanks a lot. I really appreciate the update.

araffin commented 5 years ago

You're welcome. Btw, as I don't have a Mujoco licence, I would be interested in your results ;)

araffin commented 4 years ago

I just published a paper with optimized parameters for A2C on pybullet environments:

the paper: Generalized State-Dependent Exploration for Deep Reinforcement Learning in Robotics
you can find the parameters here (it uses SB3): https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/hyperparams/a2c.yml#L126