Dense Rewards added for PullCube & LiftPegUpright Envs
The PullCube dense reward mirrors PushCube's; the only difference is that the reaching reward targets the opposite side of the cube.
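A minimal sketch of that reaching term, assuming a tanh-shaped distance reward like ManiSkill's other dense rewards; the offset direction, magnitude, and names (pull_cube_reaching_reward, cube_half_size) are illustrative assumptions, not the actual implementation:

import torch

def pull_cube_reaching_reward(tcp_pos: torch.Tensor, cube_pos: torch.Tensor,
                              cube_half_size: float = 0.02) -> torch.Tensor:
    # Place the target point just past the cube on the side opposite the pull
    # direction (assumed +x here), mirroring PushCube's reaching term but flipped.
    offset = torch.tensor([cube_half_size + 0.005, 0.0, 0.0], device=cube_pos.device)
    target = cube_pos + offset
    tcp_to_target_dist = torch.linalg.norm(target - tcp_pos, dim=-1)
    # Smooth, bounded shaping term in [0, 1]
    return 1.0 - torch.tanh(5.0 * tcp_to_target_dist)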
The LiftPegUpright dense reward is the sum of a rotation reward (implemented as the cosine similarity between the unit vector pointing from the peg's center of mass toward the end of the peg and its goal orientation; see the implementation comments for details), a center-of-mass distance reward, and a reaching/gripping reward.
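A minimal sketch of the rotation term, assuming the cosine similarity is rescaled to [0, 1] before being summed with the other terms; the names and the exact rescaling are illustrative assumptions:

import torch

def lift_peg_rotation_reward(peg_axis_world: torch.Tensor,
                             goal_axis: torch.Tensor) -> torch.Tensor:
    # Unit vector from the peg's center of mass toward one end of the peg, in world frame
    peg_dir = peg_axis_world / torch.linalg.norm(peg_axis_world, dim=-1, keepdim=True)
    # Goal (upright) direction, also normalized
    goal_dir = goal_axis / torch.linalg.norm(goal_axis, dim=-1, keepdim=True)
    cos_sim = (peg_dir * goal_dir).sum(dim=-1)  # cosine similarity in [-1, 1]
    return (cos_sim + 1.0) / 2.0  # rescale to [0, 1] so it adds cleanly to the other terms

The total dense reward is then the sum: rotation reward + center-of-mass distance reward + reaching/gripping reward.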
Bash commands to re-create the experiments with PPO:
PullCube:
for i in {1..3}; do python ppo.py --env_id="PullCube-v1" --exp-name="pullcubeseed${i}" --num_envs=2048 \
--update_epochs=8 --num_minibatches=32 --total_timesteps=4_000_000 --eval_freq=10 --num-steps=20 \
--seed=${i} --gamma=0.8; done
LiftPegUpright:
for i in {1..3}; do python ppo.py --env_id="LiftPegUpright-v1" --exp-name="liftpegseed${i}" --num_envs=2048 \
--update_epochs=8 --num_minibatches=32 --total_timesteps=9_000_000 --eval_freq=10 --num-steps=20 \
--seed=${i} --gamma=0.9; done
Comparison between PullCube and PushCube:
PushCube was run with the default settings, with extra timesteps so training runs well past convergence.
pushcube:
for i in {1..3}; do python ppo.py --env_id="PushCube-v1" --exp-name="pushcubeseed${i}" --num_envs=2048 \
--update_epochs=8 --num_minibatches=32 --total_timesteps=4_000_000 --eval_freq=10 --num-steps=20 \
--seed=${i} --gamma=0.8; done
https://github.com/haosulab/ManiSkill/assets/115660089/46561965-20f6-4cf0-b0af-0bd54bcf9391
https://github.com/haosulab/ManiSkill/assets/115660089/49538553-2686-46d6-829a-d34d764eadb8