Hi,
What do you mean by "does not work"? Is an error raised during code execution? Or do you mean that the results you obtain do not match the curve in the paper?
The code I used for the paper is provided in this openai/baselines fork. It will allow you to reproduce the results of the paper exactly.
Nevertheless, since OpenAI has stopped maintaining its repo, I strongly advise you to use actively maintained RL code such as stable-baselines3, even if you will probably not be able to reproduce the results of the paper exactly.
If you still want to use openai/baselines to reproduce my results exactly, please note that I used the v0 version of panda-gym (and not the v2 version I released in the meantime). I don't think the changes between these two versions will change the curves much, but I can't guarantee it.
Thanks for your reply. By "does not work", I meant the learning curves did not match (no error in execution). I trained for almost 3x10^6 timesteps, but the success rate for PandaPush-v2 was stuck at 0.15 (the learning curves in the paper converge to a success rate of ~1).
Thanks for your suggestions, I will try them in the meantime. Did you use sparse rewards for the curves?
I did.
You can also check the baselines results on the rl-baselines3-zoo repo. For Push, convergence occurs well before 1e6 timesteps.
Can you please post a snippet for PandaPickAndPlace-v2 that learns using DDPG from SB3, to reproduce the results in the paper? I realize it might not be exactly equivalent to the results from the paper, but anything that learns should work for me.
I've tried this, but it does not work:
```python
import gym
import panda_gym
from stable_baselines3 import DDPG, HerReplayBuffer
from stable_baselines3.common.env_util import make_vec_env

env = gym.make("PandaPickAndPlace-v2")
env = make_vec_env(lambda: env, n_envs=4)
model = DDPG(policy="MultiInputPolicy", env=env, replay_buffer_class=HerReplayBuffer, verbose=1, batch_size=2048, buffer_size=1000000)
model.learn(total_timesteps=4000000)
```
Thanks!
You can use rl-baselines3-zoo to train PandaPush-v2. You just need to paste these hyperparameters into hyperparams/ddpg.yml:
```yaml
PandaPush-v2:
  env_wrapper: sb3_contrib.common.wrappers.TimeFeatureWrapper
  n_timesteps: !!float 1e6
  policy: 'MultiInputPolicy'
  buffer_size: 1000000
  batch_size: 2048
  gamma: 0.95
  learning_rate: !!float 1e-3
  noise_type: 'normal'
  noise_std: 0.1
  replay_buffer_class: HerReplayBuffer
  replay_buffer_kwargs: "dict(
    online_sampling=True,
    goal_selection_strategy='future',
    n_sampled_goal=4,
  )"
  policy_kwargs: "dict(net_arch=[512, 512, 512], n_critics=2)"
```
Then run:
```
python train.py --algo ddpg --env PandaPush-v2
```
Here is the result you will get:
It should also converge with PandaPickAndPlace-v2. Feel free to open a PR in the zoo like this one to share your results.
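If you prefer a standalone script over the zoo, here is a minimal sketch that mirrors the hyperparameters above (assuming panda-gym v2 together with SB3 1.x and sb3-contrib; the zoo's `noise_type: 'normal'` corresponds to SB3's NormalActionNoise, and `online_sampling` is an SB3 1.x argument). This is not the exact zoo training pipeline, just an illustration:

```python
# Minimal standalone sketch mirroring the ddpg.yml entry above.
# Assumes panda-gym v2 with stable-baselines3 1.x and sb3-contrib installed.
import gym
import numpy as np
import panda_gym  # noqa: F401  (import registers the Panda*-v2 environments)
from sb3_contrib.common.wrappers import TimeFeatureWrapper
from stable_baselines3 import DDPG, HerReplayBuffer
from stable_baselines3.common.noise import NormalActionNoise

# Same wrapper as the env_wrapper entry in the yaml config
env = TimeFeatureWrapper(gym.make("PandaPush-v2"))
n_actions = env.action_space.shape[0]

model = DDPG(
    policy="MultiInputPolicy",
    env=env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(
        online_sampling=True,  # SB3 1.x argument; removed in SB3 2.x
        goal_selection_strategy="future",
        n_sampled_goal=4,
    ),
    buffer_size=1_000_000,
    batch_size=2048,
    gamma=0.95,
    learning_rate=1e-3,
    # noise_type: 'normal' with noise_std: 0.1
    action_noise=NormalActionNoise(
        mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions)
    ),
    policy_kwargs=dict(net_arch=[512, 512, 512], n_critics=2),
    verbose=1,
)
model.learn(total_timesteps=1_000_000)
```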
Hi,
I am trying to recreate your results from the paper 'panda-gym: Open-source goal-conditioned environments for robotic learning', and the code given in train_push.py does not seem to work with the default parameters. Can you point me to the RL code you used to get those results? Also, are the learning curves in the paper from the sparse reward setting or the dense one? Thanks!