vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev
Other
5.26k stars 602 forks source link

TD3: fixed dimension of clipped_noise for target actions, added noise … #281

Closed dosssman closed 1 year ago

dosssman commented 1 year ago

Description

Closes #279.

Types of changes

Checklist:

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.

vercel[bot] commented 1 year ago

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated
cleanrl ✅ Ready (Inspect) Visit Preview Oct 19, 2022 at 6:46PM (UTC)
dosssman commented 1 year ago

@vwxyzjn td3 continuous JAX variant will probably be affected too though.

vwxyzjn commented 1 year ago

@joaogui1 could you check if TD3 (JAX) will be affected?

joaogui1 commented 1 year ago

@vwxyzjn check whether the jax implementation affected by this PR or if it will also need a fix like this PR?

vwxyzjn commented 1 year ago

@joaogui1, the latter :)

joaogui1 commented 1 year ago

Got it, it will need to be fix, creating the PR this moment

vwxyzjn commented 1 year ago

@dosssman thank you for the PR! Would you mind running some benchmark experiments to see if this change has a significant impact on the performance? If not, we don't even have to update the docs, since the main purpose of re-doing benchmark is to ensure no regression in performance.

https://github.com/vwxyzjn/cleanrl/blob/f0bbf49ff2f0786882dc8ac0eded8b483a36b6ca/benchmark/td3.sh#L3-L7

dosssman commented 1 year ago

There is a performance regression on the Walker2d env, but the others are only marginally affected:

Report here

vwxyzjn commented 1 year ago

Hmm interesting. Thanks for running the expriments. Although one experiment failed... Would you mind re-running it?

image
vwxyzjn commented 1 year ago

Given that this is a performance-impacting change, I am re-running the benchmark now.

vwxyzjn commented 1 year ago

No material change to the performance (there is a minor regression in Walker2d-v2. I am going to update the docs and merge the PR. The regression report is at https://wandb.ai/openrlbenchmark/cleanrl-cache/reports/-281-CleanRL-s-TD3-TD3-Trgt-Action-Noise-Sample-Check--VmlldzoyODE5MTg0

image image image image image image