vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

Stop adding action bias twice in DDPG jax #298

Closed by joaogui1 1 year ago

joaogui1 commented 1 year ago

Description

Fixes the jax part of #297
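The diff itself is not reproduced in this thread. As a minimal sketch of the kind of bug the title describes, assume the actor output is already rescaled as `tanh_out * action_scale + action_bias`, and that exploration noise was then drawn with its mean set to `action_bias` instead of zero. The function names and the `[0, 4]` action range below are hypothetical and chosen only for illustration; this is not the code from `ddpg_continuous_action_jax.py`.

```python
import random


def explore_buggy(policy_action, action_bias, action_scale, noise_std, rng):
    # Hypothetical buggy variant: the noise is centred on action_bias, so the
    # bias is added a second time (policy_action already contains it).
    return policy_action + rng.gauss(action_bias, action_scale * noise_std)


def explore_fixed(policy_action, action_bias, action_scale, noise_std, rng):
    # Zero-mean noise: only its spread depends on the action range.
    # action_bias is unused here; it stays in the signature for symmetry.
    return policy_action + rng.gauss(0.0, action_scale * noise_std)


# Illustrative asymmetric action range [0, 4] (an assumption, not a benchmarked env).
low, high = 0.0, 4.0
action_scale = (high - low) / 2.0  # 2.0
action_bias = (high + low) / 2.0   # 2.0

# Actor output when the tanh activation is 0, already rescaled into the action range.
policy_action = 0.0 * action_scale + action_bias

rng = random.Random(0)
n = 10_000
mean_buggy = sum(
    explore_buggy(policy_action, action_bias, action_scale, 0.1, rng) for _ in range(n)
) / n
mean_fixed = sum(
    explore_fixed(policy_action, action_bias, action_scale, 0.1, rng) for _ in range(n)
) / n

print(f"buggy mean action: {mean_buggy:.3f}")
print(f"fixed mean action: {mean_fixed:.3f}")
```

With these numbers the buggy variant averages close to 4.0 (the upper bound) and the fixed variant close to 2.0 (the intended action). When the action bounds are symmetric around zero, `action_bias` is 0 and the two variants coincide.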

Checklist:

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.

vwxyzjn commented 1 year ago

No regression in performance.

https://wandb.ai/openrlbenchmark/cleanrl-cache/reports/Regression-Report-ddpg_continuous_action_jax-v1-0-0b2-9-g4605546-latest---VmlldzoyODg1MDM5

vwxyzjn commented 1 year ago

No regression in performance — most of the performance differences can be explained by stochasticity.

Environment     CleanRL's ddpg_continuous_action_jax (pr-298?costa-huang)   CleanRL's ddpg_continuous_action_jax (rlops-pilot?costa-huang)
Hopper-v2       1275.28 ± 209.60                                            1213.19 ± 202.90
Walker2d-v2     1083.15 ± 567.65                                            1293.76 ± 8.10
HalfCheetah-v2  9592.25 ± 135.10                                            9884.51 ± 209.49

Environment     CleanRL's ddpg_continuous_action_jax (pr-298?joaogui1)      CleanRL's ddpg_continuous_action_jax (rlops-pilot?joaogui1)
Hopper-v2       1145.05 ± 41.95                                             1590.63 ± 378.02
Walker2d-v2     1303.82 ± 448.41                                            1355.26 ± 337.38
HalfCheetah-v2  9125.06 ± 1477.58                                           9687.09 ± 1177.37
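
One rough way to read "explained by stochasticity" from the numbers above is to express each gap between the PR and baseline means in units of the combined spread. The sketch below assumes the `±` values are standard deviations across runs (the thread does not say so) and treats the two sets of runs as independent. It is a quick heuristic, not a significance test, and `gap_in_stds` is a hypothetical helper, not part of `cleanrl_utils`.

```python
import math

# (mean, spread) pairs copied from the tables above: (PR run, baseline run).
results = {
    "costa-huang": {
        "Hopper-v2": ((1275.28, 209.60), (1213.19, 202.90)),
        "Walker2d-v2": ((1083.15, 567.65), (1293.76, 8.10)),
        "HalfCheetah-v2": ((9592.25, 135.10), (9884.51, 209.49)),
    },
    "joaogui1": {
        "Hopper-v2": ((1145.05, 41.95), (1590.63, 378.02)),
        "Walker2d-v2": ((1303.82, 448.41), (1355.26, 337.38)),
        "HalfCheetah-v2": ((9125.06, 1477.58), (9687.09, 1177.37)),
    },
}


def gap_in_stds(pr, baseline):
    """Absolute difference of means divided by the combined spread."""
    (mean_pr, std_pr), (mean_base, std_base) = pr, baseline
    return abs(mean_pr - mean_base) / math.sqrt(std_pr**2 + std_base**2)


gaps = {
    (user, env): gap_in_stds(pr, baseline)
    for user, envs in results.items()
    for env, (pr, baseline) in envs.items()
}

for (user, env), gap in gaps.items():
    print(f"{user:12s} {env:15s} gap = {gap:.2f} combined std")
```

Under these assumptions every gap is below two combined standard deviations, which is consistent with the "no regression" reading.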

wandb report 1 and wandb report 2 were generated by:

python -m cleanrl_utils.rlops --exp-name ddpg_continuous_action_jax \
    --wandb-project-name cleanrl \
    --wandb-entity openrlbenchmark \
    --tags 'pr-298?costa-huang' 'rlops-pilot?costa-huang' \
    --env-ids Hopper-v2 Walker2d-v2 HalfCheetah-v2 \
    --output-filename rlops_static/compare1.png \
    --scan-history

python -m cleanrl_utils.rlops --exp-name ddpg_continuous_action_jax \
    --wandb-project-name cleanrl \
    --wandb-entity openrlbenchmark \
    --tags 'pr-298?joaogui1' 'rlops-pilot?joaogui1' \
    --env-ids Hopper-v2 Walker2d-v2 HalfCheetah-v2 \
    --output-filename rlops_static/compare2.png \
    --scan-history