Stop adding action bias twice in DDPG jax

joaogui1 commented 1 year ago

Description

Fixes the jax part of #297
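
The diff itself is not reproduced here, so the following is only a plain-Python sketch of the kind of bug the title describes, not the actual cleanrl code. It assumes the actor already maps its output into the action range with a scale and a bias, and that the exploration step then adds the bias a second time. All names (`actor`, `explore_buggy`, `explore_fixed`) and the numeric range are hypothetical.

```python
import math

def actor(x, action_scale, action_bias):
    # Hypothetical stand-in for the actor's last step: squash a pre-activation
    # with tanh, then scale and shift it into the action range (bias applied once).
    return math.tanh(x) * action_scale + action_bias

def clip(value, low, high):
    return min(max(value, low), high)

def explore_buggy(x, noise, action_scale, action_bias, low, high):
    # Bug pattern: the actor output already contains action_bias,
    # so adding it here shifts the action a second time.
    return clip(actor(x, action_scale, action_bias) + action_bias + noise, low, high)

def explore_fixed(x, noise, action_scale, action_bias, low, high):
    # Fix pattern: only zero-mean noise is added on top of the actor output.
    return clip(actor(x, action_scale, action_bias) + noise, low, high)

# Assumed asymmetric action range [0, 10], giving scale = 5 and bias = 5.
low, high = 0.0, 10.0
action_scale = (high - low) / 2.0
action_bias = (high + low) / 2.0

buggy = explore_buggy(0.0, 0.0, action_scale, action_bias, low, high)
fixed = explore_fixed(0.0, 0.0, action_scale, action_bias, low, high)
```

With a symmetric range such as [-1, 1] the bias is zero and both versions agree, which may be why a bug of this kind would not show up on many benchmark environments.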

Types of changes

[x] Bug fix
[ ] New feature
[ ] New algorithm
[ ] Documentation

Checklist:

[x] I've read the CONTRIBUTION guide (required).
[x] I have ensured pre-commit run --all-files passes (required).
[ ] I have updated the documentation and previewed the changes via mkdocs serve.
[x] I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.

[x] I have contacted vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
[x] I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
[x] I have added additional documentation and previewed the changes via mkdocs serve.
- [x] I have explained note-worthy implementation details.
- [x] I have explained the logged metrics.
- [x] I have added links to the original paper and related papers (if applicable).
- [x] I have added links to the PR related to the algorithm.
- [x] I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
- [x] I have added the learning curves (in PNG format with width=500 and height=300).
- [x] I have added links to the tracked experiments.
- [x] I have updated the overview sections at the docs and the repo
[x] I have updated the tests accordingly (if applicable).

vercel[bot] commented 1 year ago

The latest updates on your projects.

Name	Status	Preview	Updated
cleanrl	✅ Ready (Inspect)	Visit Preview	Nov 4, 2022 at 0:15AM (UTC)

vwxyzjn commented 1 year ago

No regression in performance.

https://wandb.ai/openrlbenchmark/cleanrl-cache/reports/Regression-Report-ddpg_continuous_action_jax-v1-0-0b2-9-g4605546-latest---VmlldzoyODg1MDM5

vwxyzjn commented 1 year ago

No regression in performance; most of the performance differences can be explained by stochasticity.

                CleanRL's ddpg_continuous_action_jax (pr-298?costa-huang)  CleanRL's ddpg_continuous_action_jax (rlops-pilot?costa-huang)
Hopper-v2       1275.28 ± 209.60                                           1213.19 ± 202.90
Walker2d-v2     1083.15 ± 567.65                                           1293.76 ± 8.10
HalfCheetah-v2  9592.25 ± 135.10                                           9884.51 ± 209.49

                CleanRL's ddpg_continuous_action_jax (pr-298?joaogui1)  CleanRL's ddpg_continuous_action_jax (rlops-pilot?joaogui1)
Hopper-v2       1145.05 ± 41.95                                         1590.63 ± 378.02
Walker2d-v2     1303.82 ± 448.41                                        1355.26 ± 337.38
HalfCheetah-v2  9125.06 ± 1477.58                                       9687.09 ± 1177.37

wandb report 1 and wandb report 2 are generated by:

python -m cleanrl_utils.rlops --exp-name ddpg_continuous_action_jax \
    --wandb-project-name cleanrl \
    --wandb-entity openrlbenchmark \
    --tags 'pr-298?costa-huang' 'rlops-pilot?costa-huang' \
    --env-ids Hopper-v2 Walker2d-v2 HalfCheetah-v2 \
    --output-filename rlops_static/compare1.png \
    --scan-history

python -m cleanrl_utils.rlops --exp-name ddpg_continuous_action_jax \
    --wandb-project-name cleanrl \
    --wandb-entity openrlbenchmark \
    --tags 'pr-298?joaogui1' 'rlops-pilot?joaogui1' \
    --env-ids Hopper-v2 Walker2d-v2 HalfCheetah-v2 \
    --output-filename rlops_static/compare2.png \
    --scan-history
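
One rough way to sanity-check the "explained by stochasticity" reading of the two comparison tables is to ask whether each pair of mean ± spread intervals overlaps. The sketch below copies the numbers from the tables and assumes the ± value is a spread across runs (the thread does not say exactly what it is). This is an informal heuristic, not a statistical test, and not necessarily how the conclusion above was reached.

```python
# (mean, spread) pairs copied from the comparison tables: (PR run, baseline run).
results = {
    ("costa-huang", "Hopper-v2"): ((1275.28, 209.60), (1213.19, 202.90)),
    ("costa-huang", "Walker2d-v2"): ((1083.15, 567.65), (1293.76, 8.10)),
    ("costa-huang", "HalfCheetah-v2"): ((9592.25, 135.10), (9884.51, 209.49)),
    ("joaogui1", "Hopper-v2"): ((1145.05, 41.95), (1590.63, 378.02)),
    ("joaogui1", "Walker2d-v2"): ((1303.82, 448.41), (1355.26, 337.38)),
    ("joaogui1", "HalfCheetah-v2"): ((9125.06, 1477.58), (9687.09, 1177.37)),
}

def intervals_overlap(a, b):
    # [m1 - s1, m1 + s1] and [m2 - s2, m2 + s2] intersect exactly when
    # the gap between the means is no larger than the sum of the spreads.
    (m1, s1), (m2, s2) = a, b
    return abs(m1 - m2) <= s1 + s2

overlap = {key: intervals_overlap(pr, base) for key, (pr, base) in results.items()}
n_overlap = sum(overlap.values())

for (user, env_id), ok in overlap.items():
    print(f"{user:12s} {env_id:15s} intervals overlap: {ok}")
```

Under this heuristic, five of the six pairs overlap; the exception is Hopper-v2 in the joaogui1 runs, which fits the wording "most of the performance differences".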
