fix: ddpg action bias

sdpkjc commented 1 year ago

Description

Fixes the first part of https://github.com/vwxyzjn/cleanrl/issues/297
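
The diff itself is not shown in this thread. As a minimal sketch of what an action bias usually means for a tanh-squashed deterministic actor, assume an action space with bounds `low` and `high` that need not be symmetric around zero. The raw output in [-1, 1] is then multiplied by a scale of (high - low) / 2 and shifted by a bias of (high + low) / 2. The function name `rescale_action` and the example bounds are hypothetical and not taken from the PR.

```python
import math

def rescale_action(raw, low, high):
    """Map a pre-activation value into [low, high] via tanh, scale and bias."""
    scale = (high - low) / 2.0  # half-width of the action range
    bias = (high + low) / 2.0   # midpoint of the range; 0 for symmetric bounds
    return math.tanh(raw) * scale + bias

# Hypothetical asymmetric bounds [0, 4]: midpoint is 2, half-width is 2.
mid = rescale_action(0.0, 0.0, 4.0)      # tanh(0) = 0, so this is the midpoint
upper = rescale_action(50.0, 0.0, 4.0)   # tanh saturates towards 1
lower = rescale_action(-50.0, 0.0, 4.0)  # tanh saturates towards -1
```

For these bounds, scaling the tanh output by `high` alone would give a range of [-4, 4], which is the kind of mismatch a bias term corrects.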

Types of changes

[x] Bug fix
[ ] New feature
[ ] New algorithm
[ ] Documentation

Checklist:

[x] I've read the CONTRIBUTION guide (required).
[x] I have ensured pre-commit run --all-files passes (required).
[x] I have updated the documentation and previewed the changes via mkdocs serve.
[x] I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.

[x] I have contacted vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
[x] I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
[x] I have added additional documentation and previewed the changes via mkdocs serve.
- [x] I have explained note-worthy implementation details.
- [x] I have explained the logged metrics.
- [x] I have added links to the original paper and related papers (if applicable).
- [x] I have added links to the PR related to the algorithm.
- [x] I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
- [x] I have added the learning curves (in PNG format with width=500 and height=300).
- [x] I have added links to the tracked experiments.
- [x] I have updated the overview sections at the docs and the repo
[x] I have updated the tests accordingly (if applicable).

vercel[bot] commented 1 year ago

The latest updates on your projects:

Name	Status	Preview	Updated
cleanrl	✅ Ready (Inspect)	Visit Preview	Nov 3, 2022 at 9:12PM (UTC)

vwxyzjn commented 1 year ago

Thanks for the PR. Running some benchmark experiments now.

vwxyzjn commented 1 year ago

Using the following snippet from #307

python rlops.py --exp-name ddpg_continuous_action \
    --wandb-project-name cleanrl \
    --wandb-entity openrlbenchmark \
    --tags pr-299 rlops-pilot \
    --env-ids Hopper-v2 Walker2d-v2 HalfCheetah-v2 \
    --output-filename compare.png \
    --report

we generate the following image

Discussion

- The matplotlib plot subsamples the wandb runs, which sometimes seems to produce slightly inaccurate curves.
- This PR improves performance in HalfCheetah-v2.
- Runs are slightly faster, probably because I am now using --worker 1 instead of --worker 3.
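
As a rough illustration of the subsampling point, assume the plot is built from an evenly spaced subsample of each run's logged points. The series below is synthetic and the helper name `subsample` is hypothetical. Under that assumption, a short-lived spike can disappear entirely from the subsampled curve:

```python
def subsample(values, n_samples):
    """Pick n_samples evenly spaced points from values (first point always kept)."""
    step = len(values) / n_samples
    return [values[int(i * step)] for i in range(n_samples)]

# Synthetic learning curve: flat at 100 with one brief spike at index 505.
full_curve = [100.0] * 1000
full_curve[505] = 900.0

sampled_curve = subsample(full_curve, 100)  # keeps indices 0, 10, 20, ...

full_max = max(full_curve)        # the full history still contains the spike
sampled_max = max(sampled_curve)  # the subsample skips index 505
```

Reading the full history instead of a sample avoids this, at the cost of fetching more data.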

What remains is to update the documentation and optionally run more experiments in more envs.

vwxyzjn commented 1 year ago

The experiments are done and the docs are updated. Running the following command from #307 generated the figure and table below:

python -m cleanrl_utils.rlops --exp-name ddpg_continuous_action \
    --wandb-project-name cleanrl \
    --wandb-entity openrlbenchmark \
    --tags 'pr-299' 'rlops-pilot' \
    --env-ids HalfCheetah-v2 Walker2d-v2 Hopper-v2 InvertedPendulum-v2 Humanoid-v2 Pusher-v2 \
    --output-filename compare.png \
    --scan-history \
    --metric-last-n-average-window 100 \
    --report

                    CleanRL's ddpg_continuous_action (pr-299) CleanRL's ddpg_continuous_action (rlops-pilot)
HalfCheetah-v2                              10210.57 ± 196.22                              9205.65 ± 1093.88
Walker2d-v2                                  1661.14 ± 250.01                               1447.09 ± 260.24
Hopper-v2                                    1007.44 ± 148.29                               1126.37 ± 278.02
InvertedPendulum-v2                            684.61 ± 94.41                                 544.77 ± 50.98
Humanoid-v2                                    910.61 ± 97.58                                 849.05 ± 40.64
Pusher-v2                                       -39.39 ± 9.54                                  -32.52 ± 2.03
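
Table entries of this form could be computed along the following lines. This is a sketch under assumptions, not the rlops implementation. Each run is summarised by the mean of its last `window` logged returns (mirroring `--metric-last-n-average-window 100`), and each cell reports the mean ± standard deviation across runs. The function names and toy data are hypothetical. The thread does not say whether the population or sample standard deviation is used, so the population form is assumed here.

```python
import statistics

def last_n_average(returns, window):
    """Mean of the final `window` entries of one run's return series."""
    return statistics.fmean(returns[-window:])

def summarize(runs, window):
    """Mean and population std of the per-run last-n averages."""
    per_run = [last_n_average(r, window) for r in runs]
    return statistics.fmean(per_run), statistics.pstdev(per_run)

# Toy data: three seeds, each with five logged returns.
runs = [
    [0.0, 10.0, 20.0, 30.0, 40.0],
    [0.0, 10.0, 20.0, 40.0, 60.0],
    [0.0, 10.0, 20.0, 50.0, 80.0],
]
mean, std = summarize(runs, window=2)
cell = f"{mean:.2f} ± {std:.2f}"  # formatted like one table entry
```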

vwxyzjn commented 1 year ago

Thanks @sdpkjc for this PR and for raising the issue.

vwxyzjn / cleanrl

fix: ddpg action bias #299
