vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

Is it possible for SAC to support gymnasium too, like TD3 and PPO? #421

Closed qiuruiyu closed 11 months ago

qiuruiyu commented 11 months ago

I'd like to ask for a gymnasium version of SAC in a later release of cleanrl. Is that possible?

sdpkjc commented 11 months ago

Hey @qiuruiyu, thanks for your suggestion! You're in luck: @pseudo-rnd-thoughts has been working on a gymnasium version of SAC in #378, and it's nearly ready. There are just a few finishing touches left, which I'm planning to wrap up soon. Stay tuned for this update; we think you're going to love it!
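For readers wondering what a gymnasium port changes in practice, here is a minimal sketch of the interface difference that matters most for an off-policy method like SAC: `reset` returns `(obs, info)`, and `step` returns separate `terminated` and `truncated` flags. It is not the code from #378. `ToyEnv`, `Transition`, `collect`, and `td_target` are hypothetical names, `ToyEnv` only mimics the gymnasium signatures so the snippet runs without the library installed, and the target shown is a plain TD target that omits SAC's entropy term.

```python
import random
from typing import List, NamedTuple


class Transition(NamedTuple):
    obs: float
    action: float
    reward: float
    next_obs: float
    terminated: bool  # only true termination should stop bootstrapping


class ToyEnv:
    """Hypothetical stand-in that mimics the gymnasium reset/step signatures."""

    def __init__(self, max_steps: int = 5, goal: float = 3.0):
        self.max_steps = max_steps
        self.goal = goal
        self.pos = 0.0
        self.t = 0

    def reset(self, seed=None):
        # gymnasium-style reset returns (observation, info).
        self.pos, self.t = 0.0, 0
        return self.pos, {}

    def step(self, action: float):
        # gymnasium-style step returns (obs, reward, terminated, truncated, info).
        self.pos += action
        self.t += 1
        terminated = self.pos >= self.goal  # the task itself ended
        truncated = (not terminated) and self.t >= self.max_steps  # time limit hit
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated, {}


def collect(env: ToyEnv, n_steps: int, seed: int = 0) -> List[Transition]:
    """Roll out random actions and store transitions, resetting on either flag."""
    rng = random.Random(seed)
    buffer: List[Transition] = []
    obs, _ = env.reset(seed=seed)
    for _ in range(n_steps):
        action = rng.choice([0.0, 1.0])
        next_obs, reward, terminated, truncated, _ = env.step(action)
        # Store `terminated`, not `terminated or truncated`, as the done signal.
        buffer.append(Transition(obs, action, reward, next_obs, terminated))
        obs = next_obs
        if terminated or truncated:
            obs, _ = env.reset()
    return buffer


def td_target(tr: Transition, next_value: float, gamma: float = 0.99) -> float:
    # Bootstrap from the next state unless the episode truly terminated.
    return tr.reward + gamma * (1.0 - float(tr.terminated)) * next_value
```

The design point is that a time-limit truncation still bootstraps from the next state's value, while a real termination does not. The older gym API folded both cases into a single `done` flag.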

pseudo-rnd-thoughts commented 11 months ago

Thanks @sdpkjc. I remember running SAC and finding that it was really slow compared to DDPG or TD3. I believe Costa mentioned that this wasn't an issue, but I would confirm whether that is still true.

sdpkjc commented 11 months ago

@pseudo-rnd-thoughts Thank you for your hard work on this! I also found SAC to be considerably slower than DDPG or TD3. I believe this is because SAC's neural network updates are more complex and demand more GPU resources. Indeed, it is slow 😂. Maybe a JAX version could speed it up.

sdpkjc commented 11 months ago

Closed by #378.