vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

fix: ddpg action bias #299

Closed: sdpkjc closed this pull request 1 year ago

sdpkjc commented 1 year ago

Description

Fixes the first part of https://github.com/vwxyzjn/cleanrl/issues/297
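For context on what an "action bias" is: a common convention for continuous-control actors is to squash the network output with tanh into [-1, 1] and then map it onto the environment's action box with a scale of (high - low) / 2 and a bias of (high + low) / 2. The sketch below illustrates only that convention. It is not the diff from this PR, and `rescale_action` and its `use_bias` flag are hypothetical names. It shows why omitting the bias matters for a box that is not centered on zero.

```python
import math
from typing import List, Sequence


def rescale_action(pre_activation: Sequence[float],
                   low: Sequence[float],
                   high: Sequence[float],
                   use_bias: bool = True) -> List[float]:
    """Squash each pre-activation with tanh, then map [-1, 1] onto [low, high].

    Hypothetical helper illustrating the scale/bias convention; not CleanRL code.
    """
    actions = []
    for z, lo, hi in zip(pre_activation, low, high):
        scale = (hi - lo) / 2.0                       # half-width of the allowed range
        bias = (hi + lo) / 2.0 if use_bias else 0.0   # center of the allowed range
        actions.append(math.tanh(z) * scale + bias)
    return actions


if __name__ == "__main__":
    # Hypothetical 2-D action box: dim 0 is [0, 2] (off-center), dim 1 is [-2, 2] (centered).
    low, high = [0.0, -2.0], [2.0, 2.0]
    print(rescale_action([0.0, 0.0], low, high))                  # [1.0, 0.0]
    print(rescale_action([0.0, 0.0], low, high, use_bias=False))  # [0.0, 0.0]
```

With the bias, the first dimension can reach its whole range [0, 2]. Without it, that dimension is confined to [-1, 1], so half of the valid actions are unreachable and negative outputs fall outside the box. For a symmetric box the bias is zero and the two variants coincide.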

Types of changes

Checklist:

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.

vercel[bot] commented 1 year ago

The latest updates on your projects. Learn more about Vercel for Git.

Name     Status    Updated
cleanrl  ✅ Ready  Nov 3, 2022 at 9:12 PM (UTC)
vwxyzjn commented 1 year ago

Thanks for the PR. Running some benchmark experiments now.

vwxyzjn commented 1 year ago

Using the following snippet from #307

python rlops.py --exp-name ddpg_continuous_action \
    --wandb-project-name cleanrl \
    --wandb-entity openrlbenchmark \
    --tags pr-299 rlops-pilot \
    --env-ids Hopper-v2 Walker2d-v2 HalfCheetah-v2 \
    --output-filename compare.png \
    --report

which generated the following images:

[Images: four benchmark comparison plots]

Discussion

What remains is to update the documentation and, optionally, to run experiments in more environments.

vwxyzjn commented 1 year ago

The experiments are done and the docs are updated. The following command from #307 generated the figure and table below:

python -m cleanrl_utils.rlops --exp-name ddpg_continuous_action \
    --wandb-project-name cleanrl \
    --wandb-entity openrlbenchmark \
    --tags 'pr-299' 'rlops-pilot' \
    --env-ids HalfCheetah-v2 Walker2d-v2 Hopper-v2 InvertedPendulum-v2 Humanoid-v2 Pusher-v2 \
    --output-filename compare.png \
    --scan-history \
    --metric-last-n-average-window 100 \
    --report
Environment          CleanRL's ddpg_continuous_action (pr-299)   CleanRL's ddpg_continuous_action (rlops-pilot)
HalfCheetah-v2       10210.57 ± 196.22                           9205.65 ± 1093.88
Walker2d-v2          1661.14 ± 250.01                            1447.09 ± 260.24
Hopper-v2            1007.44 ± 148.29                            1126.37 ± 278.02
InvertedPendulum-v2  684.61 ± 94.41                              544.77 ± 50.98
Humanoid-v2          910.61 ± 97.58                              849.05 ± 40.64
Pusher-v2            -39.39 ± 9.54                               -32.52 ± 2.03
[Figure: benchmark comparison plot]
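The table reports one "mean ± spread" figure per environment and variant. One plausible reading, which the thread does not confirm, is that `--metric-last-n-average-window 100` averages each run's last 100 episodic returns and that "±" is a standard deviation across seeds. Under those assumptions the aggregation could be sketched as below. `last_n_average`, `summarize`, and the toy data are hypothetical. The choice of population standard deviation is also an assumption, since rlops may use a different spread statistic.

```python
import statistics
from typing import Dict, List, Tuple


def last_n_average(returns: List[float], n: int) -> float:
    """Average of the final n episodic returns of one run (all of them if the run is shorter)."""
    if not returns:
        raise ValueError("run has no recorded episodic returns")
    tail = returns[-n:]
    return sum(tail) / len(tail)


def summarize(runs: Dict[str, List[float]], n: int = 100) -> Tuple[float, float]:
    """Mean and population standard deviation, across seeds, of each seed's last-n average."""
    per_seed = [last_n_average(r, n) for r in runs.values()]
    return statistics.mean(per_seed), statistics.pstdev(per_seed)


if __name__ == "__main__":
    # Toy data (hypothetical, not from the benchmark): two seeds, window of 2 episodes.
    runs = {"seed-1": [1.0, 2.0, 3.0], "seed-2": [3.0, 4.0, 5.0]}
    mean, std = summarize(runs, n=2)
    print(f"{mean:.2f} ± {std:.2f}")  # 3.50 ± 1.00
```

Averaging over a trailing window first smooths out episode-to-episode noise within a run, so the reported spread mostly reflects variation between seeds.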
vwxyzjn commented 1 year ago

Thanks @sdpkjc for this PR and raising the issue.