qgallouedec / panda-gym

Set of robotic environments based on PyBullet physics engine and gymnasium.

Hyperparameters for PickAndPlace #65

Closed shukla-yash closed 1 year ago

shukla-yash commented 1 year ago

Hi,

I am unable to learn a policy for the PandaPickAndPlace task using RL Zoo. I am trying to reproduce the results shared in the experimental results section of the panda-gym paper. Here are my hyperparameters for the SAC, DDPG, and TQC algorithms:

PandaPush-v2: &her-defaults
  env_wrapper: sb3_contrib.common.wrappers.TimeFeatureWrapper
  n_timesteps: !!float 1e6
  policy: 'MultiInputPolicy'
  buffer_size: 1000000
  batch_size: 2048
  gamma: 0.95
  learning_rate: !!float 1e-3
  tau: 0.05
  replay_buffer_class: HerReplayBuffer
  replay_buffer_kwargs: "dict(
    online_sampling=True,
    goal_selection_strategy='future',
    n_sampled_goal=4,
  )"
  policy_kwargs: "dict(net_arch=[512, 512, 512], n_critics=2)"

PandaPickAndPlace-v2:
  <<: *her-defaults
  learning_rate: !!float 2e-4

Could you please share the hyperparameters that you used for your experiments?
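
In case it helps, here is roughly what this configuration corresponds to when written directly against Stable-Baselines3, as a minimal sketch of the config above. It assumes panda-gym 2.x (which registers the -v2 environments) and SB3 / sb3-contrib 1.x, where online_sampling is still a valid HerReplayBuffer argument; it is a sketch, not the exact code path RL Zoo runs:

import gym
import panda_gym  # noqa: F401  # registers the Panda*-v2 environments
from sb3_contrib import TQC
from sb3_contrib.common.wrappers import TimeFeatureWrapper
from stable_baselines3 import HerReplayBuffer

# Same wrapper as the env_wrapper entry in the config above
env = TimeFeatureWrapper(gym.make("PandaPickAndPlace-v2"))

model = TQC(
    "MultiInputPolicy",
    env,
    buffer_size=1_000_000,
    batch_size=2048,
    gamma=0.95,
    learning_rate=2e-4,  # PickAndPlace override; the shared defaults use 1e-3
    tau=0.05,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(
        online_sampling=True,
        goal_selection_strategy="future",
        n_sampled_goal=4,
    ),
    policy_kwargs=dict(net_arch=[512, 512, 512], n_critics=2),
    verbose=1,
)
model.learn(total_timesteps=1_000_000)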

qgallouedec commented 1 year ago

The results presented in the paper were obtained with baselines. To reproduce them strictly, you can use https://github.com/qgallouedec/drl-grasping. However, that code is old and no longer maintained, so I strongly advise you to use rl-baselines3-zoo instead. The results are part of openrlbenchmark and are very easy to reproduce; see https://wandb.ai/openrlbenchmark/sb3.
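
For example, a sketch of a zoo run (the exact entry point and environment id depend on the rl-baselines3-zoo and panda-gym versions you have installed):

# from a source checkout of rl-baselines3-zoo; the pip package also exposes "python -m rl_zoo3.train"
python train.py --algo tqc --env PandaPickAndPlace-v3  # use the env id listed for your panda-gym version in hyperparams/tqc.yml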