Dense Rewards added for PullCube & LiftPegUpright Envs
The PullCube dense reward mirrors PushCube's; the only difference is that the reaching reward targets the opposite side of the cube.
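A minimal sketch of that reaching term, assuming a tanh-shaped distance reward like ManiSkill's other dense rewards; the offset direction, magnitude, and names (pull_cube_reaching_reward, cube_half_size) are illustrative assumptions, not the actual implementation:

import torch

def pull_cube_reaching_reward(tcp_pos: torch.Tensor, cube_pos: torch.Tensor,
                              cube_half_size: float = 0.02) -> torch.Tensor:
    # Place the target point just past the cube on the side opposite the pull
    # direction (assumed +x here), mirroring PushCube's reaching term but flipped.
    offset = torch.tensor([cube_half_size + 0.005, 0.0, 0.0], device=cube_pos.device)
    target = cube_pos + offset
    tcp_to_target_dist = torch.linalg.norm(target - tcp_pos, dim=-1)
    # Smooth, bounded shaping term in [0, 1]
    return 1.0 - torch.tanh(5.0 * tcp_to_target_dist)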
The LiftPegUpright dense reward is the sum of a rotation reward (implemented as the cosine similarity between the unit vector pointing from the peg's center of mass toward the end of the peg and its goal orientation; see the implementation comments for details), a center-of-mass distance reward, and a reaching/gripping reward.
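A minimal sketch of the rotation term, assuming the cosine similarity is rescaled to [0, 1] before being summed with the other terms; the names and the exact rescaling are illustrative assumptions:

import torch

def lift_peg_rotation_reward(peg_axis_world: torch.Tensor,
                             goal_axis: torch.Tensor) -> torch.Tensor:
    # Unit vector from the peg's center of mass toward one end of the peg, in world frame
    peg_dir = peg_axis_world / torch.linalg.norm(peg_axis_world, dim=-1, keepdim=True)
    # Goal (upright) direction, also normalized
    goal_dir = goal_axis / torch.linalg.norm(goal_axis, dim=-1, keepdim=True)
    cos_sim = (peg_dir * goal_dir).sum(dim=-1)  # cosine similarity in [-1, 1]
    return (cos_sim + 1.0) / 2.0  # rescale to [0, 1] so it adds cleanly to the other terms

The total dense reward is then the sum: rotation reward + center-of-mass distance reward + reaching/gripping reward.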
Bash commands to re-create the experiments with PPO:
PullCube:
for i in {1..3}; do python ppo.py --env_id="PullCube-v1" --exp-name="pullcubeseed${i}" --num_envs=2048 \
--update_epochs=8 --num_minibatches=32 --total_timesteps=4_000_000 --eval_freq=10 --num-steps=20 \
--seed=${i} --gamma=0.8; done
LiftPegUpright:
for i in {1..3}; do python ppo.py --env_id="LiftPegUpright-v1" --exp-name="liftpegseed${i}" --num_envs=2048 \
--update_epochs=8 --num_minibatches=32 --total_timesteps=9_000_000 --eval_freq=10 --num-steps=20 \
--seed=${i} --gamma=0.9; done
Comparison between PullCube and PushCube:
PushCube was run with the default settings, with extra timesteps so training runs well past convergence.
pushcube:
for i in {1..3}; do python ppo.py --env_id="PushCube-v1" --exp-name="pushcubeseed${i}" --num_envs=2048 \
--update_epochs=8 --num_minibatches=32 --total_timesteps=4_000_000 --eval_freq=10 --num-steps=20 \
--seed=${i} --gamma=0.8; done
https://github.com/haosulab/ManiSkill/assets/115660089/46561965-20f6-4cf0-b0af-0bd54bcf9391
https://github.com/haosulab/ManiSkill/assets/115660089/49538553-2686-46d6-829a-d34d764eadb8