vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

Fixes actor_loss shape for SAC continuous #383

Closed · dosssman closed 1 year ago

dosssman commented 1 year ago

Description

Address issues pointed out in #379
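For context, the shape mismatch reported in #379 is the standard NumPy/PyTorch broadcasting pitfall: subtracting a `(B,)` tensor from a `(B, 1)` tensor silently broadcasts to `(B, B)`, so the `.mean()` averages over B² cross terms instead of B per-sample terms. A minimal NumPy sketch (hypothetical shapes for illustration, not the exact CleanRL code; NumPy follows the same broadcasting rule as PyTorch):

```python
import numpy as np

batch = 4
alpha = 0.2
log_pi = np.random.randn(batch, 1)   # e.g. log-prob summed over action dims with keepdim -> (B, 1)
min_qf_pi = np.random.randn(batch)   # e.g. a squeezed Q-value -> (B,)

# Buggy: (B, 1) - (B,) broadcasts to (B, B); the mean runs over B*B cross terms
buggy_loss_terms = alpha * log_pi - min_qf_pi
print(buggy_loss_terms.shape)  # (4, 4)

# Fixed: align shapes first so the loss is a mean over B per-sample terms
fixed_loss_terms = alpha * log_pi.reshape(-1) - min_qf_pi
print(fixed_loss_terms.shape)  # (4,)
```

In expectation the broadcast mean and the per-sample mean coincide, which is why the bug mostly affects gradient variance rather than producing an obviously wrong loss value.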

Types of changes

Checklist:

If you need to run benchmark experiments for a performance-impacting change:

dosssman commented 1 year ago

Didn't manage to get the rlops working yet, so the regression report was done manually:

https://api.wandb.ai/links/openrlbenchmark/on2vqz6u

timoklein commented 1 year ago

https://api.wandb.ai/links/openrlbenchmark/on2vqz6u

So the version that has the bug fixed is actually worse? That's odd.

dosssman commented 1 year ago

This has happened a few times before. It's probably due to stochasticity in the sampling process, or to differences in environment / hardware. I might add more runs to ascertain it, if you feel that is necessary. The performance regression seems limited to Walker2d; the rest perform very close to the rl-pilot baseline.

timoklein commented 1 year ago

Might add more runs to ascertain it, if you feel like it is necessary.

I can run a couple of experiments if you like, but not before May 18th. To me, though, it looks OK.

Probably due to stochasticity

I agree. We'd probably need 50+ runs to properly verify anything anyway, and that's a little excessive :D

dosssman commented 1 year ago

All good on my side too.

dosssman commented 1 year ago

Fixes #379