qgallouedec / panda-gym

Set of robotic environments based on PyBullet physics engine and gymnasium.
MIT License

Benchmarking results and hyperparameters #53

Closed zanellar closed 1 year ago

zanellar commented 1 year ago

Hi, can you provide some benchmarking results with the corresponding algorithms and hyperparameters for the 4 tasks? I've tried SAC, PPO, and DDPG but couldn't train an agent that reaches good results (I'm focusing on PandaPickAndPlace and PandaPush).

qgallouedec commented 1 year ago

Hi, since the reward is sparse, it is normal that you don't get good results without a structured exploration strategy (like HER). I think the only task that can be learned without any exploration strategy is PandaReach.

That said, note that HER can't work with PPO, because PPO is an on-policy algorithm.

You can find good hyperparameters on RL baselines3 Zoo, but only for the TQC algorithm for the moment. By the way, we are currently building an open benchmark for many algorithms and environments, including panda-gym, see openrlbenchmark.

I plan to extend this benchmark on panda-gym to algorithms other than TQC, but it's not my priority for the moment. If you find good hyperparameters for SAC and DDPG, please share them.
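As an illustration of the HER setup mentioned above, here is a minimal sketch using stable-baselines3's SAC with `HerReplayBuffer` on PandaPickAndPlace-v3; the timestep budget and buffer settings below are illustrative defaults, not tuned hyperparameters.

```python
import gymnasium as gym
import panda_gym  # registers the Panda environments
from stable_baselines3 import SAC, HerReplayBuffer

env = gym.make("PandaPickAndPlace-v3")

model = SAC(
    "MultiInputPolicy",  # dict observation: observation / achieved_goal / desired_goal
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(
        n_sampled_goal=4,                  # relabel 4 virtual goals per real transition
        goal_selection_strategy="future",  # sample goals from later states of the same episode
    ),
    verbose=1,
)
model.learn(total_timesteps=1_000_000)  # illustrative budget, not a tuned value
```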

jonasreiher commented 1 year ago

I got the SAC implementation from stable-baselines3 working on PandaPush-v3 with both sparse and dense reward. I used 10 environments in parallel with max_episode_steps=100. SAC settings: sb3 defaults plus learning_starts=100000, gradient_steps=-1, and 3000000 training steps (nothing fancy, just letting it run for a while).

Will add these to RL Baselines3 Zoo after more hyperparameter tuning.
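A minimal sketch of that setup with stable-baselines3 (this is not the exact training script from the comment above, and it assumes `make_vec_env` forwards `max_episode_steps` to `gymnasium.make` via `env_kwargs`):

```python
import panda_gym  # registers the Panda environments
from stable_baselines3 import SAC
from stable_baselines3.common.env_util import make_vec_env

# 10 parallel environments with a 100-step episode limit
# (assumes env_kwargs are forwarded to gymnasium.make, which accepts max_episode_steps)
env = make_vec_env("PandaPush-v3", n_envs=10, env_kwargs={"max_episode_steps": 100})

model = SAC(
    "MultiInputPolicy",   # dict observation space
    env,
    learning_starts=100_000,
    gradient_steps=-1,    # as many gradient steps as environment steps collected per rollout
    verbose=1,
)
model.learn(total_timesteps=3_000_000)
```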

qgallouedec commented 1 year ago

This is great news! Do you use SAC with HER? Could you prepare for the integration into openrlbenchmark by tracking your experiments with wandb? The instructions are here: https://github.com/openrlbenchmark/openrlbenchmark/issues/7. For now, track them in a personal project and we'll move them to openrlbenchmark afterwards.
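For reference, a hedged sketch of tracking an SB3 run with wandb (the linked issue describes the actual openrlbenchmark procedure; the project name below is a placeholder):

```python
import gymnasium as gym
import panda_gym
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3 import SAC

# Track in a personal project for now, as suggested above
run = wandb.init(project="my-panda-experiments", sync_tensorboard=True)

env = gym.make("PandaPush-v3")
model = SAC("MultiInputPolicy", env, tensorboard_log=f"runs/{run.id}", verbose=1)
model.learn(total_timesteps=1_000_000, callback=WandbCallback())
run.finish()
```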

jonasreiher commented 1 year ago

For now, I just used the regular experience replay buffer, no HER; I might have to try that. Thanks for the openrlbenchmark hint, I will look into it! Though my current research focuses on vision-based RL, so I'm not really using the default observation/state representation.

Bargez908 commented 1 year ago

> You can find good hyperparameters on RL baselines3 Zoo, but only for the TQC algorithm for the moment. By the way, we are currently building an open benchmark for many algorithms and environments, including panda-gym, see openrlbenchmark.

Did you solve the PandaPush problem with the action space defined as 3 values (the end-effector position) or using the 7 joint actions? Regarding the joint-based action space, is it position, velocity, or torque control?

qgallouedec commented 1 year ago

> Did you solve the PandaPush problem with the action space defined as 3 values (the end-effector position) or using the 7 joint actions?

We used PandaPush, where the observation and control are related to the end-effector.

> Regarding the joint-based action space, is it position, velocity, or torque control?

The action is a target displacement. First, the raw action is scaled by 0.05. The result is added to the current joint positions to obtain the target joint angles. Then, PyBullet uses a PD controller to compute the torque applied to each joint. Thus, we can think of the action as a virtual force applied to the joints.

Related: https://github.com/qgallouedec/panda-gym/issues/37#issuecomment-1284237047
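For concreteness, a hypothetical sketch of that joint control scheme in raw PyBullet (function and variable names are made up for illustration; this is not panda-gym's actual implementation):

```python
import numpy as np
import pybullet as p

def apply_joint_action(client_id, body_id, joint_indices, action, scale=0.05):
    """Illustrative sketch: interpret the raw action as a scaled displacement of
    the current joint angles, then let PyBullet's built-in PD position controller
    compute the torques needed to reach the resulting target angles."""
    current_angles = np.array(
        [p.getJointState(body_id, j, physicsClientId=client_id)[0] for j in joint_indices]
    )
    target_angles = current_angles + scale * np.asarray(action)
    p.setJointMotorControlArray(
        bodyUniqueId=body_id,
        jointIndices=joint_indices,
        controlMode=p.POSITION_CONTROL,
        targetPositions=target_angles,
        physicsClientId=client_id,
    )
```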

qgallouedec commented 1 year ago

I now consider this question solved, as openrlbenchmark has reached its first version. You can see all results and hyperparameters here: https://wandb.ai/openrlbenchmark/sb3

zichunxx commented 7 months ago

@qgallouedec Hi! It seems that all PickAndPlace tasks are completed successfully with TQC in https://wandb.ai/openrlbenchmark/sb3, but there are no recommended hyperparameters for DDPG and SAC.