TD3: fixed dimension of clipped_noise for target actions, added noise …

dosssman commented 1 year ago

Description

Closes #279.

td3_continuous_action.py: noise sampled to compute the Q network update target matches action dimensions in the buffer
td3_continuous_action.py: aforementioned noise is also scaled to match the scaling range of the actions.

Types of changes

[x] Bug fix
[ ] New feature
[ ] New algorithm
[ ] Documentation

Checklist:

[x] I've read the CONTRIBUTION guide (required).
[x] I have ensured pre-commit run --all-files passes (required).
~~[ ] I have updated the documentation and previewed the changes via mkdocs serve.~~
[x] I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.

[x] I have contacted vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
[x] I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
[x] I have added additional documentation and previewed the changes via mkdocs serve.
- [x] I have explained note-worthy implementation details.
- [x] I have explained the logged metrics.
- [x] I have added links to the original paper and related papers (if applicable).
- [x] I have added links to the PR related to the algorithm.
- [x] I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
- [x] I have added the learning curves (in PNG format with width=500 and height=300).
- [x] I have added links to the tracked experiments.
- [x] I have updated the overview sections at the docs and the repo
[x] I have updated the tests accordingly (if applicable).

vercel[bot] commented 1 year ago

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Updated
cleanrl	✅ Ready (Inspect)	Visit Preview	Oct 19, 2022 at 6:46PM (UTC)

dosssman commented 1 year ago

@vwxyzjn td3 continuous JAX variant will probably be affected too though.