Closed dtch1997 closed 1 year ago
CI passed. @dtch1997 would you mind running the first round of benchmark? Don't worry about capturing videos yet because of upstream issues.
export WANDB_ENTITY=openrlbenchmark
poetry install --with mujoco
OMP_NUM_THREADS=1 xvfb-run -a python -m cleanrl_utils.benchmark \
--env-ids HalfCheetah-v4 Walker2d-v4 Hopper-v4 InvertedPendulum-v4 Humanoid-v4 Pusher-v4 \
--command "poetry run python cleanrl/gymnasium_support/ppo_continuous_action.py --cuda False --track --capture-video" \
--num-seeds 3 \
--workers 1
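For a sense of how many runs this launches, here is a rough sketch of how a benchmark command like the one above could expand into one run per environment and seed. It is not the actual `cleanrl_utils.benchmark` implementation: the helper name `expand_commands` is hypothetical, and the `--env-id` / `--seed` flags and the starting seed of 1 are assumptions about what the training script accepts.

```python
from itertools import product


def expand_commands(command, env_ids, num_seeds, start_seed=1):
    """Build one shell command per (environment, seed) pair.

    Hypothetical helper: the --env-id and --seed flag names and the
    default start_seed are assumptions, not confirmed by this thread.
    """
    seeds = range(start_seed, start_seed + num_seeds)
    return [
        f"{command} --env-id {env_id} --seed {seed}"
        for env_id, seed in product(env_ids, seeds)
    ]


env_ids = [
    "HalfCheetah-v4", "Walker2d-v4", "Hopper-v4",
    "InvertedPendulum-v4", "Humanoid-v4", "Pusher-v4",
]
base = "poetry run python cleanrl/gymnasium_support/ppo_continuous_action.py --cuda False --track"
commands = expand_commands(base, env_ids, num_seeds=3)
print(len(commands))  # 6 environments x 3 seeds
```

With `--workers 1` these runs would execute one at a time, so six environments with three seeds each means 18 sequential training runs.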
Benchmark in progress: https://wandb.ai/openrlbenchmark/cleanrl?workspace=user-dtch1997
Great thank you!
Executing the following command in https://github.com/vwxyzjn/ppo-atari-metrics
python rlops.py --wandb-project-name cleanrl \
--wandb-entity openrlbenchmark \
--filters 'ppo_continuous_action?tag=rlops-pilot' 'ppo_continuous_action?tag=pr-320' \
--env-ids HalfCheetah-v4 Walker2d-v4 Hopper-v4 InvertedPendulum-v4 Humanoid-v4 Pusher-v4 \
--output-filename compare.png --scan-history
generates
Environment | ppo_continuous_action ({'tag': ['rlops-pilot']}) | ppo_continuous_action ({'tag': ['pr-320']}) |
---|---|---|
HalfCheetah-v4 | 1795.55 ± 819.96 | 2241.90 ± 1150.61 |
Walker2d-v4 | 2983.19 ± 757.43 | 3577.82 ± 315.46 |
Hopper-v4 | 2279.97 ± 450.53 | 2111.14 ± 335.94 |
InvertedPendulum-v4 | 890.99 ± 48.93 | 950.98 ± 36.39 |
Humanoid-v4 | 671.07 ± 83.75 | 728.82 ± 62.35 |
Pusher-v4 | -51.27 ± 9.02 | -49.51 ± 3.96 |
Thank you @dtch1997, would you be interested in helping run some dm_control experiments? Please pull the latest code and run
export WANDB_ENTITY=openrlbenchmark
poetry install --with dm_control,mujoco
OMP_NUM_THREADS=1 xvfb-run -a poetry run python -m cleanrl_utils.benchmark \
--env-ids dm_control/acrobot-swingup-v0 dm_control/acrobot-swingup_sparse-v0 dm_control/ball_in_cup-catch-v0 dm_control/cartpole-balance-v0 dm_control/cartpole-balance_sparse-v0 dm_control/cartpole-swingup-v0 dm_control/cartpole-swingup_sparse-v0 dm_control/cartpole-two_poles-v0 dm_control/cartpole-three_poles-v0 dm_control/cheetah-run-v0 dm_control/dog-stand-v0 dm_control/dog-walk-v0 dm_control/dog-trot-v0 dm_control/dog-run-v0 dm_control/dog-fetch-v0 dm_control/finger-spin-v0 dm_control/finger-turn_easy-v0 dm_control/finger-turn_hard-v0 dm_control/fish-upright-v0 dm_control/fish-swim-v0 dm_control/hopper-stand-v0 dm_control/hopper-hop-v0 dm_control/humanoid-stand-v0 dm_control/humanoid-walk-v0 dm_control/humanoid-run-v0 dm_control/humanoid-run_pure_state-v0 dm_control/humanoid_CMU-stand-v0 dm_control/humanoid_CMU-run-v0 dm_control/lqr-lqr_2_1-v0 dm_control/lqr-lqr_6_2-v0 dm_control/manipulator-bring_ball-v0 dm_control/manipulator-bring_peg-v0 dm_control/manipulator-insert_ball-v0 dm_control/manipulator-insert_peg-v0 dm_control/pendulum-swingup-v0 dm_control/point_mass-easy-v0 dm_control/point_mass-hard-v0 dm_control/quadruped-walk-v0 dm_control/quadruped-run-v0 dm_control/quadruped-escape-v0 dm_control/quadruped-fetch-v0 dm_control/reacher-easy-v0 dm_control/reacher-hard-v0 dm_control/stacker-stack_2-v0 dm_control/stacker-stack_4-v0 dm_control/swimmer-swimmer6-v0 dm_control/swimmer-swimmer15-v0 dm_control/walker-stand-v0 dm_control/walker-walk-v0 dm_control/walker-run-v0 \
--command "poetry run python cleanrl/gymnasium_support/ppo_continuous_action.py --cuda False --track" \
--num-seeds 3 \
--workers 9
Hey @dtch1997, I tried running the `ppo_continuous_action.py` file with `--num_envs=4`; however, `done = terminated or truncated` no longer works because `terminated` and `truncated` are NumPy arrays. I believe `numpy.logical_or` should fix it.
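A minimal sketch of the problem described above, using hand-written arrays in place of what a vectorized environment with four sub-environments would return (the values are made up for illustration):

```python
import numpy as np

# Illustrative per-environment flags for num_envs=4 (made-up values).
terminated = np.array([False, True, False, False])
truncated = np.array([False, False, True, False])

# Python's `or` asks for the truth value of the whole array, which is
# ambiguous when the array holds more than one element.
try:
    done = terminated or truncated
except ValueError as err:
    ambiguous_error = str(err)

# The element-wise alternative combines the flags per environment.
done = np.logical_or(terminated, truncated)
print(done.tolist())
```

With a single environment the arrays have one element, so `or` happens to work, which is probably why the issue only surfaces once `--num_envs` is greater than 1.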
@nidhishs The `num_envs` issue should be fixed now.
@vwxyzjn to get the code snippet to run, I had to slightly modify the pyproject.toml to enable automatic installation of the right torch version for the installed CUDA driver. Taken from here: https://github.com/python-poetry/poetry/issues/4231#issuecomment-1182766775
Benchmark ongoing: https://wandb.ai/openrlbenchmark/cleanrl/runs/2tigs6f1
@dtch1997 thanks a lot! Would you mind helping run the dmcontrol experiments? (https://github.com/vwxyzjn/cleanrl/pull/320#issuecomment-1322280088)
@vwxyzjn I ran those last week, did the results show up here? https://wandb.ai/openrlbenchmark/cleanrl?workspace=user-dtch1997
Happy to re-run if it failed somehow
Oh, I noticed the re-run only has the gym environments. The dm_control envs have `env_id` values like dm_control/acrobot-swingup-v0 and dm_control/acrobot-swingup_sparse-v0.
> @nidhishs The `num_envs` issue should be fixed now. @vwxyzjn to get the code snippet to run, I had to slightly modify the pyproject.toml to enable automatic installation of the right torch version for the installed CUDA driver. Taken from here: python-poetry/poetry#4231 (comment)
Also a quick note on this: could you try installing the latest `torch` version to see if the issue persists? The latest `torch` should resolve these issues automatically (I think CUDA 11.3+ is used as of `torch==1.13`).
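One way to check which torch build ended up installed, and which CUDA version it was built against, is a small script like the sketch below. The helper name `describe_torch` is hypothetical; it relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`, and it falls back gracefully when torch is not installed.

```python
import importlib.util


def describe_torch():
    """Report the installed torch build (hypothetical helper)."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        # None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        # True only if a usable GPU and driver are visible at runtime.
        "cuda_available": torch.cuda.is_available(),
    }


info = describe_torch()
print(info)
```

If `cuda_build` is set but `cuda_available` is false, the wheel was built with CUDA support but cannot see a usable GPU or driver at runtime.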
@vwxyzjn benchmarks complete. Also, the most recent version of torch fixed the cuda issue
We were so close to a perfect solution, but torch==1.13.0 breaks installation on windows and linux. Looks like it's getting fixed in torch==1.13.1 (https://github.com/pytorch/pytorch/issues/88049), but let's not block this PR. I will downgrade to torch==1.12.1, and the CUDA issues are already pointed out in the docs
Hey, some quick updates: I thought about it again and think we can just pull the trigger on the main `ppo_continuous_action.py`, since the gymnasium version also supports the v2 environments, giving us good backward compatibility. I needed to manually implement the wandb video upload, though.
Just compared with the existing experiments, there is no performance regression.
Docs preview looks like this. Once CI passes, I think we will be ready to merge the PR.
https://user-images.githubusercontent.com/5555347/206934822-00f78eb9-a6f7-4e81-8783-bf34b5d013c5.mp4
CI passed, but I had to mark the ubuntu install with `continue-on-error: true`, since `MUJOCO_GL=osmesa` results in `free(): invalid pointer` because of https://github.com/deepmind/mujoco/issues/644
@dosssman not right now with wandb. Pending https://github.com/wandb/wandb/issues/4510.
Description
Types of changes
Checklist:
- `pre-commit run --all-files` passes (required).
- `mkdocs serve`.

If you are adding new algorithm variants or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.

- `--capture-video` flag toggled on (required).
- `mkdocs serve`.