
vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)

http://docs.cleanrl.dev



SAC Implementation Details #304

Open: araffin opened this issue 1 year ago

araffin commented 1 year ago

While implementing SAC in JAX (#300), I read https://github.com/vwxyzjn/cleanrl/blob/master/docs/rl-algorithms/sac.md and the code, and noticed that several tweaks have been made compared to the original SAC implementation. I was wondering why, and whether the impact of each tweak has been tested (apart from the higher learning rate for the qf, which does yield slightly better results).

For reference, my benchmark results are here: https://wandb.ai/openrlbenchmark/cleanrl/reports/SAC-jax---VmlldzoyODM4MjU0

The only tweak I kept was the higher learning rate for the qf; everything else is the same as in the original SAC implementation.
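As a rough illustration of the one tweak kept in that benchmark (a higher learning rate for the qf than for the actor), the sketch below gives each network its own optimizer. It assumes a minimal hand-written Adam over plain Python floats so it runs without JAX; a real JAX port would use an existing optimizer library instead. The names `POLICY_LR` and `Q_LR` and the values 3e-4 and 1e-3 are illustrative assumptions, not values quoted from the linked code.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Adam:
    """Minimal Adam optimizer over a flat list of floats (illustrative only)."""
    lr: float
    beta1: float = 0.9
    beta2: float = 0.999
    eps: float = 1e-8
    t: int = 0
    m: list = field(default_factory=list)
    v: list = field(default_factory=list)

    def step(self, params, grads):
        # Lazily initialise the first and second moment estimates.
        if not self.m:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            # Bias-corrected moment estimates.
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            new_params.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_params


# Hypothetical values: one shared rate for all networks would be the
# "original SAC" setting; a larger Q-function rate is the tweak discussed above.
POLICY_LR = 3e-4
Q_LR = 1e-3

# Separate optimizers, so each network keeps its own learning rate and state.
actor_opt = Adam(lr=POLICY_LR)
critic_opt = Adam(lr=Q_LR)

actor_params = [0.5, -0.2]
critic_params = [0.5, -0.2]
grads = [1.0, -1.0]  # same toy gradient for both, to isolate the lr effect

new_actor = actor_opt.step(actor_params, grads)
new_critic = critic_opt.step(critic_params, grads)
```

On the first Adam step the update magnitude is roughly the learning rate itself, so with the same gradient the critic parameters move about three times further than the actor parameters.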

Related to https://github.com/vwxyzjn/cleanrl/pull/21

CC @dosssman

List of implementation details

Should the replay buffer store the scaled or the unscaled actions?
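To make the question concrete: a tanh-squashed Gaussian policy, as in SAC, produces actions in [-1, 1] that are then rescaled to the environment's bounds [low, high] before being passed to the environment. The replay buffer can store either form, as long as the Q-network is trained on actions in the same space in which freshly sampled policy actions are fed to it during the update. The sketch below assumes a simple affine rescaling; the helper names `scale_action` and `unscale_action`, the bounds, and the sample values are hypothetical.

```python
import math


def scale_action(squashed, low, high):
    """Map a tanh-squashed action in [-1, 1] to the env range [low, high]."""
    return [lo + 0.5 * (a + 1.0) * (hi - lo) for a, lo, hi in zip(squashed, low, high)]


def unscale_action(action, low, high):
    """Inverse map: an env-range action back to [-1, 1]."""
    return [2.0 * (a - lo) / (hi - lo) - 1.0 for a, lo, hi in zip(action, low, high)]


# Hypothetical bounds for a 2-D continuous action space.
low, high = [-2.0, 0.0], [2.0, 1.0]

pre_tanh = [0.3, -1.2]                           # raw Gaussian sample (assumed values)
squashed = [math.tanh(u) for u in pre_tanh]      # lies in [-1, 1]
env_action = scale_action(squashed, low, high)   # what the environment receives

# Option A: store env_action, so the Q-network sees env-scale actions.
# Option B: store squashed, so the Q-network sees [-1, 1] actions.
# The mapping is invertible, so either form can be recovered from the other.
recovered = unscale_action(env_action, low, high)
```

Since the two helpers are exact inverses, neither choice loses information; the practical difference is the range of the Q-network's action inputs (for example [-2, 2] versus [-1, 1] in the first dimension here).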

Checklist

[ ] I have installed dependencies via poetry install (see CleanRL's installation guideline).
[x] I have checked that there is no similar issue in the repo (required)