haosulab / ManiSkill

SAPIEN Manipulation Skill Framework, a GPU parallelized robotics simulator and benchmark
https://maniskill.ai/
Apache License 2.0

Dense Rews PullCube & LiftPegUpright #403

Closed Xander-Hinrichsen closed 3 months ago

Xander-Hinrichsen commented 3 months ago

Added dense rewards for the PullCube and LiftPegUpright environments.

The PullCube dense reward mirrors the PushCube one; the difference is that the reaching reward targets the opposite side of the cube.
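To illustrate the mirroring, here is a minimal sketch, not the code in this PR. It assumes the reaching target sits just outside one face of the cube along the x axis, and that the reaching term uses a `1 - tanh(scale * distance)` shaping. The names `reach_target` and `reaching_reward`, and the parameters `side`, `margin`, and `scale`, are hypothetical and chosen only for illustration.

```python
import math

def reach_target(cube_pos, half_size, side, margin=0.005):
    """Point just outside one x-face of the cube (assumed layout).

    side=-1 picks the -x face, side=+1 picks the +x face. Switching the
    sign is the only change needed to mirror the reaching target.
    """
    x, y, z = cube_pos
    return (x + side * (half_size + margin), y, z)

def reaching_reward(tcp_pos, target, scale=5.0):
    """Assumed shaping: 1 when the TCP is at the target, decaying toward 0."""
    return 1.0 - math.tanh(scale * math.dist(tcp_pos, target))

cube = (0.0, 0.0, 0.02)
one_side = reach_target(cube, half_size=0.02, side=-1)
other_side = reach_target(cube, half_size=0.02, side=+1)
```

With the cube at the origin in x, `one_side` and `other_side` are reflections of each other across the cube center, and the reaching term is unchanged apart from its target.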

The LiftPegUpright dense reward is the sum of a rotation reward (implemented as the cosine similarity between the unit vector from the peg's center of mass toward the end of the peg and its goal orientation; see the implementation comments for details), a center-of-mass distance reward, and a reaching/gripping reward.
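The three-term structure can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation in this PR: the `1 - tanh(scale * distance)` shaping of the two distance terms, the `scale` value, the equal weighting, and all function and parameter names are hypothetical. Only the cosine-similarity rotation term and the sum of three terms come from the description above.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def lift_peg_upright_reward_sketch(peg_com, peg_end, goal_axis, goal_com,
                                   tcp_pos, scale=5.0):
    # Rotation term: direction from the peg's center of mass toward its end,
    # compared against the goal direction (for example +z for "upright").
    peg_axis = [e - c for e, c in zip(peg_end, peg_com)]
    rotation_reward = cosine_similarity(peg_axis, goal_axis)

    # Center-of-mass distance term (assumed tanh shaping).
    com_reward = 1.0 - math.tanh(scale * math.dist(peg_com, goal_com))

    # Reaching term: gripper TCP toward the peg (assumed tanh shaping;
    # the gripping part of the real reward is not modeled here).
    reach_reward = 1.0 - math.tanh(scale * math.dist(tcp_pos, peg_com))

    return rotation_reward + com_reward + reach_reward

# Peg already upright at the goal with the TCP at its center of mass.
upright = lift_peg_upright_reward_sketch(
    peg_com=(0.0, 0.0, 0.1), peg_end=(0.0, 0.0, 0.2),
    goal_axis=(0.0, 0.0, 1.0), goal_com=(0.0, 0.0, 0.1),
    tcp_pos=(0.0, 0.0, 0.1))

# Same positions, but the peg lies along x, so the rotation term is zero.
lying = lift_peg_upright_reward_sketch(
    peg_com=(0.0, 0.0, 0.1), peg_end=(0.1, 0.0, 0.1),
    goal_axis=(0.0, 0.0, 1.0), goal_com=(0.0, 0.0, 0.1),
    tcp_pos=(0.0, 0.0, 0.1))
```

In this sketch each term is at most 1, so the total is at most 3; the cosine term ranges over [-1, 1], which penalizes a peg pointing away from the goal direction.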

Bash commands to reproduce the PPO experiments are below.

PullCube:

    for i in {1..3}; do
      python ppo.py --env_id="PullCube-v1" --exp-name="pullcubeseed${i}" --num_envs=2048 \
        --update_epochs=8 --num_minibatches=32 --total_timesteps=4_000_000 --eval_freq=10 --num-steps=20 \
        --seed=${i} --gamma=0.8
    done

LiftPegUpright:

    for i in {1..3}; do
      python ppo.py --env_id="LiftPegUpright-v1" --exp-name="liftpegseed${i}" --num_envs=2048 \
        --update_epochs=8 --num_minibatches=32 --total_timesteps=9_000_000 --eval_freq=10 --num-steps=20 \
        --seed=${i} --gamma=0.9
    done

image image

Comparison between PullCube and PushCube:

PushCube was run with its default settings, but with more timesteps so that training continues past convergence.

PushCube:

    for i in {1..3}; do
      python ppo.py --env_id="PushCube-v1" --exp-name="pushcubeseed${i}" --num_envs=2048 \
        --update_epochs=8 --num_minibatches=32 --total_timesteps=4_000_000 --eval_freq=10 --num-steps=20 \
        --seed=${i} --gamma=0.8
    done

image

https://github.com/haosulab/ManiSkill/assets/115660089/46561965-20f6-4cf0-b0af-0bd54bcf9391

https://github.com/haosulab/ManiSkill/assets/115660089/49538553-2686-46d6-829a-d34d764eadb8