qgallouedec / panda-gym

Set of robotic environments based on PyBullet physics engine and gymnasium.

Hyperparameters for PickAndPlace #65

Closed shukla-yash closed 1 year ago

shukla-yash commented 1 year ago

Hi,

I am unable to learn a policy for the PandaPickAndPlace task using RL Zoo. I am trying to reproduce the results shared in the experimental results section of the panda-gym paper. Here are my hyperparameters for the SAC, DDPG, and TQC algorithms:

PandaPush-v2: &her-defaults
  env_wrapper: sb3_contrib.common.wrappers.TimeFeatureWrapper
  n_timesteps: !!float 1e6
  policy: 'MultiInputPolicy'
  buffer_size: 1000000
  batch_size: 2048
  gamma: 0.95
  learning_rate: !!float 1e-3
  tau: 0.05
  replay_buffer_class: HerReplayBuffer
  replay_buffer_kwargs: "dict(
    online_sampling=True,
    goal_selection_strategy='future',
    n_sampled_goal=4,
  )"
  policy_kwargs: "dict(net_arch=[512, 512, 512], n_critics=2)"

PandaPickAndPlace-v2:
  <<: *her-defaults
  learning_rate: !!float 2e-4

Could you please share the hyperparameters that you used for your experiments?
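
In case it helps, here is roughly what this configuration corresponds to when written directly against Stable-Baselines3, as a minimal sketch of the config above. It assumes panda-gym 2.x (which registers the -v2 environments) and SB3 / sb3-contrib 1.x, where online_sampling is still a valid HerReplayBuffer argument; it is a sketch, not the exact code path RL Zoo runs:

import gym
import panda_gym  # noqa: F401  # registers the Panda*-v2 environments
from sb3_contrib import TQC
from sb3_contrib.common.wrappers import TimeFeatureWrapper
from stable_baselines3 import HerReplayBuffer

# Same wrapper as the env_wrapper entry in the config above
env = TimeFeatureWrapper(gym.make("PandaPickAndPlace-v2"))

model = TQC(
    "MultiInputPolicy",
    env,
    buffer_size=1_000_000,
    batch_size=2048,
    gamma=0.95,
    learning_rate=2e-4,  # PickAndPlace override; the shared defaults use 1e-3
    tau=0.05,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(
        online_sampling=True,
        goal_selection_strategy="future",
        n_sampled_goal=4,
    ),
    policy_kwargs=dict(net_arch=[512, 512, 512], n_critics=2),
    verbose=1,
)
model.learn(total_timesteps=1_000_000)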

qgallouedec commented 1 year ago

The results presented in the paper were obtained with baselines. To reproduce them strictly, you can use https://github.com/qgallouedec/drl-grasping. However, that code is old and no longer maintained, so I strongly advise you to use rl-baselines3-zoo instead. The results are part of openrlbenchmark and are very easy to reproduce; see https://wandb.ai/openrlbenchmark/sb3.
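
For example, a sketch of a zoo run (the exact entry point and environment id depend on the rl-baselines3-zoo and panda-gym versions you have installed):

# from a source checkout of rl-baselines3-zoo; the pip package also exposes "python -m rl_zoo3.train"
python train.py --algo tqc --env PandaPickAndPlace-v3  # use the env id listed for your panda-gym version in hyperparams/tqc.yml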