vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

Fixes actor_loss shape for SAC continuous #383

Closed · dosssman closed 1 year ago

dosssman commented 1 year ago

Description

Address issues pointed out in #379
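For context, the shape mismatch reported in #379 is the standard NumPy/PyTorch broadcasting pitfall: subtracting a `(B,)` tensor from a `(B, 1)` tensor silently broadcasts to `(B, B)`, so the `.mean()` averages over B² cross terms instead of B per-sample terms. A minimal NumPy sketch (hypothetical shapes for illustration, not the exact CleanRL code; NumPy follows the same broadcasting rule as PyTorch):

```python
import numpy as np

batch = 4
alpha = 0.2
log_pi = np.random.randn(batch, 1)   # e.g. log-prob summed over action dims with keepdim -> (B, 1)
min_qf_pi = np.random.randn(batch)   # e.g. a squeezed Q-value -> (B,)

# Buggy: (B, 1) - (B,) broadcasts to (B, B); the mean runs over B*B cross terms
buggy_loss_terms = alpha * log_pi - min_qf_pi
print(buggy_loss_terms.shape)  # (4, 4)

# Fixed: align shapes first so the loss is a mean over B per-sample terms
fixed_loss_terms = alpha * log_pi.reshape(-1) - min_qf_pi
print(fixed_loss_terms.shape)  # (4,)
```

In expectation the broadcast mean and the per-sample mean coincide, which is why the bug mostly affects gradient variance rather than producing an obviously wrong loss value.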

Types of changes

Checklist:

If you need to run benchmark experiments for a performance-impacting change:

dosssman commented 1 year ago

Didn't manage to get the rlops working yet, so the regression report was done manually:

https://api.wandb.ai/links/openrlbenchmark/on2vqz6u

timoklein commented 1 year ago

https://api.wandb.ai/links/openrlbenchmark/on2vqz6u

So the version that has the bug fixed is actually worse? That's odd.

dosssman commented 1 year ago

This has happened a few times before. It's probably due to stochasticity in the sampling process, or to differences in environment / hardware. I might add more runs to ascertain it, if you feel that is necessary. The performance regression seems limited to Walker2d; the rest perform very close to the rl-pilot baseline.

timoklein commented 1 year ago

Might add more runs to ascertain it, if you feel like it is necessary.

I can run a couple of experiments if you like, but not before May 18th. To me, though, it looks OK.

Probably due to stochasticity

I agree. We'd probably need 50+ runs to properly verify anything anyway, and that's a little excessive :D

dosssman commented 1 year ago

All good on my side too.

dosssman commented 1 year ago

Fixes #379