vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

Implement Gymnasium-compliant PPO script #320

Closed dtch1997 closed 1 year ago

dtch1997 commented 1 year ago

Description

Types of changes

Checklist:

If you are adding new algorithm variants or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.

vercel[bot] commented 1 year ago

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Updated |
| --- | --- | --- | --- |
| cleanrl | ✅ Ready (Inspect) | Visit Preview | Dec 12, 2022 at 8:55PM (UTC) |

vwxyzjn commented 1 year ago

CI passed. @dtch1997 would you mind running the first round of benchmarks? Don't worry about capturing videos yet because of upstream issues.

export WANDB_ENTITY=openrlbenchmark
poetry install --with mujoco
OMP_NUM_THREADS=1 xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids HalfCheetah-v4 Walker2d-v4 Hopper-v4 InvertedPendulum-v4 Humanoid-v4 Pusher-v4 \
    --command "poetry run python cleanrl/gymnasium_support/ppo_continuous_action.py --cuda False --track --capture-video" \
    --num-seeds 3 \
    --workers 1
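
As a rough sketch of what the benchmark invocation above amounts to, the snippet below fans a base command out into one run per environment and seed. The helper `build_commands` and its `start_seed` default are hypothetical, and it assumes the utility simply appends `--env-id` and `--seed` to the given command; the real `cleanrl_utils.benchmark` may differ in details.

```python
def build_commands(base_command, env_ids, num_seeds, start_seed=1):
    """Return one shell command per (seed, env_id) pair.

    Hypothetical helper: assumes the training script accepts
    --env-id and --seed, and that seeds start at `start_seed`.
    """
    commands = []
    for seed in range(num_seeds):
        for env_id in env_ids:
            commands.append(
                f"{base_command} --env-id {env_id} --seed {start_seed + seed}"
            )
    return commands


env_ids = [
    "HalfCheetah-v4", "Walker2d-v4", "Hopper-v4",
    "InvertedPendulum-v4", "Humanoid-v4", "Pusher-v4",
]
base = (
    "poetry run python cleanrl/gymnasium_support/ppo_continuous_action.py "
    "--cuda False --track --capture-video"
)
commands = build_commands(base, env_ids, num_seeds=3)

# 6 environments x 3 seeds
print(len(commands))
print(commands[0])
```

With `--workers 1`, these runs would presumably execute one at a time.
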
dtch1997 commented 1 year ago

Benchmark in progress: https://wandb.ai/openrlbenchmark/cleanrl?workspace=user-dtch1997

vwxyzjn commented 1 year ago

Great, thank you!

vwxyzjn commented 1 year ago

Executing the following command in https://github.com/vwxyzjn/ppo-atari-metrics

python rlops.py --wandb-project-name cleanrl \
    --wandb-entity openrlbenchmark \
    --filters 'ppo_continuous_action?tag=rlops-pilot' 'ppo_continuous_action?tag=pr-320'   \
    --env-ids HalfCheetah-v4 Walker2d-v4 Hopper-v4 InvertedPendulum-v4 Humanoid-v4 Pusher-v4 \
    --output-filename compare.png --scan-history

generates

| | ppo_continuous_action ({'tag': ['rlops-pilot']}) | ppo_continuous_action ({'tag': ['pr-320']}) |
| --- | --- | --- |
| HalfCheetah-v4 | 1795.55 ± 819.96 | 2241.90 ± 1150.61 |
| Walker2d-v4 | 2983.19 ± 757.43 | 3577.82 ± 315.46 |
| Hopper-v4 | 2279.97 ± 450.53 | 2111.14 ± 335.94 |
| InvertedPendulum-v4 | 890.99 ± 48.93 | 950.98 ± 36.39 |
| Humanoid-v4 | 671.07 ± 83.75 | 728.82 ± 62.35 |
| Pusher-v4 | -51.27 ± 9.02 | -49.51 ± 3.96 |

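
One quick way to read the table above is to check, per environment, whether the mean ± std ranges of the two tags overlap. The sketch below does that with the numbers copied from the table; the helper `intervals_overlap` is hypothetical, and the overlap check is only a crude heuristic over a few seeds, not a statistical test.

```python
# (rlops-pilot mean, rlops-pilot std, pr-320 mean, pr-320 std),
# copied from the comparison table.
results = {
    "HalfCheetah-v4": (1795.55, 819.96, 2241.90, 1150.61),
    "Walker2d-v4": (2983.19, 757.43, 3577.82, 315.46),
    "Hopper-v4": (2279.97, 450.53, 2111.14, 335.94),
    "InvertedPendulum-v4": (890.99, 48.93, 950.98, 36.39),
    "Humanoid-v4": (671.07, 83.75, 728.82, 62.35),
    "Pusher-v4": (-51.27, 9.02, -49.51, 3.96),
}


def intervals_overlap(mean_a, std_a, mean_b, std_b):
    """True if [mean - std, mean + std] of the two runs intersect."""
    return mean_a - std_a <= mean_b + std_b and mean_b - std_b <= mean_a + std_a


for env_id, (m0, s0, m1, s1) in results.items():
    overlap = intervals_overlap(m0, s0, m1, s1)
    print(f"{env_id}: delta={m1 - m0:+.2f}, ranges overlap={overlap}")
```
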
vwxyzjn commented 1 year ago

Thank you @dtch1997, would you be interested in helping run some dm_control experiments? Please pull the latest code and run

export WANDB_ENTITY=openrlbenchmark
poetry install --with dm_control,mujoco
OMP_NUM_THREADS=1 xvfb-run -a poetry run python -m cleanrl_utils.benchmark \
    --env-ids dm_control/acrobot-swingup-v0 dm_control/acrobot-swingup_sparse-v0 dm_control/ball_in_cup-catch-v0 dm_control/cartpole-balance-v0 dm_control/cartpole-balance_sparse-v0 dm_control/cartpole-swingup-v0 dm_control/cartpole-swingup_sparse-v0 dm_control/cartpole-two_poles-v0 dm_control/cartpole-three_poles-v0 dm_control/cheetah-run-v0 dm_control/dog-stand-v0 dm_control/dog-walk-v0 dm_control/dog-trot-v0 dm_control/dog-run-v0 dm_control/dog-fetch-v0 dm_control/finger-spin-v0 dm_control/finger-turn_easy-v0 dm_control/finger-turn_hard-v0 dm_control/fish-upright-v0 dm_control/fish-swim-v0 dm_control/hopper-stand-v0 dm_control/hopper-hop-v0 dm_control/humanoid-stand-v0 dm_control/humanoid-walk-v0 dm_control/humanoid-run-v0 dm_control/humanoid-run_pure_state-v0 dm_control/humanoid_CMU-stand-v0 dm_control/humanoid_CMU-run-v0 dm_control/lqr-lqr_2_1-v0 dm_control/lqr-lqr_6_2-v0 dm_control/manipulator-bring_ball-v0 dm_control/manipulator-bring_peg-v0 dm_control/manipulator-insert_ball-v0 dm_control/manipulator-insert_peg-v0 dm_control/pendulum-swingup-v0 dm_control/point_mass-easy-v0 dm_control/point_mass-hard-v0 dm_control/quadruped-walk-v0 dm_control/quadruped-run-v0 dm_control/quadruped-escape-v0 dm_control/quadruped-fetch-v0 dm_control/reacher-easy-v0 dm_control/reacher-hard-v0 dm_control/stacker-stack_2-v0 dm_control/stacker-stack_4-v0 dm_control/swimmer-swimmer6-v0 dm_control/swimmer-swimmer15-v0 dm_control/walker-stand-v0 dm_control/walker-walk-v0 dm_control/walker-run-v0 \
    --command "poetry run python cleanrl/gymnasium_support/ppo_continuous_action.py --cuda False --track" \
    --num-seeds 3 \
    --workers 9
nidhishs commented 1 year ago

Hey @dtch1997, I tried running the ppo_continuous_action.py file with --num_envs=4. However, done = terminated or truncated no longer works because terminated and truncated are NumPy arrays. I believe numpy.logical_or should fix it.
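
A minimal illustration of the problem described above, using made-up flag arrays for four sub-environments (the values are purely illustrative): Python's `or` needs a single truth value and raises on multi-element arrays, while `numpy.logical_or` combines the flags element-wise.

```python
import numpy as np

# Illustrative per-environment flags, as a vectorized env with 4
# sub-environments might return them.
terminated = np.array([False, True, False, False])
truncated = np.array([False, False, True, False])

try:
    # `or` asks for the truth value of the whole array, which is ambiguous.
    done = terminated or truncated
except ValueError as err:
    print(f"`or` fails on arrays: {err}")

# Element-wise OR: an episode is done if it terminated or was truncated.
done = np.logical_or(terminated, truncated)
print(done)
```
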

dtch1997 commented 1 year ago

@nidhishs The num_envs issue should be fixed now. @vwxyzjn to get the code snippet to run, I had to slightly modify the pyproject.toml to enable automatic installation of the right torch version for the installed CUDA driver. Taken from here: https://github.com/python-poetry/poetry/issues/4231#issuecomment-1182766775

dtch1997 commented 1 year ago

Benchmark ongoing: https://wandb.ai/openrlbenchmark/cleanrl/runs/2tigs6f1

vwxyzjn commented 1 year ago

@dtch1997 thanks a lot! Would you mind helping run the dm_control experiments? (https://github.com/vwxyzjn/cleanrl/pull/320#issuecomment-1322280088)

dtch1997 commented 1 year ago

@vwxyzjn I ran those last week, did the results show up here? https://wandb.ai/openrlbenchmark/cleanrl?workspace=user-dtch1997

Happy to re-run if it failed somehow

vwxyzjn commented 1 year ago

Oh, I noticed the re-run only has the gym environments. The dm_control envs have an env_id like dm_control/acrobot-swingup-v0 or dm_control/acrobot-swingup_sparse-v0.

vwxyzjn commented 1 year ago

> @nidhishs The num_envs issue should be fixed now. @vwxyzjn to get the code snippet to run, I had to slightly modify the pyproject.toml to enable automatic installation of the right torch version for the installed CUDA driver. Taken from here: python-poetry/poetry#4231 (comment)

Also, a quick note on this: could you try installing the latest torch version to see if the issue persists? The latest torch should resolve these issues automatically (since torch==1.13 uses CUDA 11.3+, I think).

dtch1997 commented 1 year ago

@vwxyzjn benchmarks complete. Also, the most recent version of torch fixed the CUDA issue.

vwxyzjn commented 1 year ago

We were so close to a perfect solution, but torch==1.13.0 breaks installation on Windows and Linux. It looks like this is being fixed in torch==1.13.1 (https://github.com/pytorch/pytorch/issues/88049), but let's not block this PR on it. I will downgrade to torch==1.12.1; the CUDA issues are already pointed out in the docs.

vwxyzjn commented 1 year ago

Hey, some quick updates: I rethought this a bit and think we can just pull the trigger on the main ppo_continuous_action.py, since the gymnasium version also supports the v2 environments, giving us good backward compatibility. I needed to manually implement the wandb video upload, though.
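
For readers wondering what a manual video upload could look like, here is a minimal sketch, not the code from this PR: it scans a folder for `.mp4` files that have not been handled yet and passes each one to an uploader callback. The helper `upload_new_videos`, the folder layout, and the callback are all assumptions made for illustration.

```python
import os


def upload_new_videos(video_dir, seen, upload):
    """Call `upload(path)` once for every .mp4 in `video_dir` not yet in `seen`.

    Hypothetical helper: `seen` is a set of filenames already handled,
    and `upload` is any callable that takes a file path.
    """
    if not os.path.isdir(video_dir):
        return []
    uploaded = []
    for filename in sorted(os.listdir(video_dir)):
        if filename.endswith(".mp4") and filename not in seen:
            path = os.path.join(video_dir, filename)
            upload(path)
            seen.add(filename)
            uploaded.append(path)
    return uploaded


# Example: collect paths instead of uploading them anywhere.
collected = []
upload_new_videos("videos", set(), collected.append)
print(collected)
```

In a tracked run, the callback might be something like `lambda p: wandb.log({"videos": wandb.Video(p)})`, called periodically from the training loop; that wiring is an assumption, not taken from the PR.
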

I just compared against the existing experiments, and there is no performance regression.

vwxyzjn commented 1 year ago

Docs preview looks like this. Once CI passes, I think we will be ready to merge the PR.

https://user-images.githubusercontent.com/5555347/206934822-00f78eb9-a6f7-4e81-8783-bf34b5d013c5.mp4

vwxyzjn commented 1 year ago

CI passed, but I had to mark the Ubuntu install with `continue-on-error: true # MUJOCO_GL=osmesa results in free(): invalid pointer` because of https://github.com/deepmind/mujoco/issues/644

vwxyzjn commented 1 year ago

@dosssman not right now with wandb. Pending https://github.com/wandb/wandb/issues/4510.