Closed: timoklein closed this issue 10 months ago
Hi @timoklein, thanks for your interest in submitting a contribution! SAC discrete indeed sounds like an interesting addition to CleanRL. I just glanced at the paper and would recommend prototyping a sac_atari.py to work with Atari games, as done in the paper.
I was a bit surprised to see the algorithm perform poorly on Pong. Do you have any insight on this? Maybe it's an implementation-details issue... CC @dosssman, who was the main contributor to CleanRL's SAC implementation.
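For concreteness, here is a minimal sketch of what the model side of such a sac_atari.py prototype might look like: the standard Nature-CNN encoder, an actor head producing logits over the discrete actions, and critics that output one Q-value per action (so expectations over actions cost a single forward pass). All class and function names here are illustrative assumptions, not CleanRL code.

```python
# Rough, illustrative skeleton for a sac_atari.py prototype.
import torch
import torch.nn as nn


def nature_cnn(n_channels=4):
    # Encoder from Mnih et al. (2015); expects 84x84 stacked frames.
    return nn.Sequential(
        nn.Conv2d(n_channels, 32, 8, stride=4), nn.ReLU(),
        nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
        nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
        nn.Flatten(),
        nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
    )


class Actor(nn.Module):
    def __init__(self, n_actions):
        super().__init__()
        self.encoder = nature_cnn()
        self.logits = nn.Linear(512, n_actions)  # categorical policy head

    def forward(self, obs):
        return self.logits(self.encoder(obs / 255.0))


class Critic(nn.Module):
    def __init__(self, n_actions):
        super().__init__()
        self.encoder = nature_cnn()
        self.q = nn.Linear(512, n_actions)  # Q(s, .) for every action at once

    def forward(self, obs):
        return self.q(self.encoder(obs / 255.0))
```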
> I just glanced at the paper and would recommend prototyping a sac_atari.py to work with Atari games, as done in the paper.
Getting to work on it!
> I was a bit surprised to see the algorithm perform poorly on Pong. Do you have any insight on this? Maybe it's an implementation-details issue... CC @dosssman, who was the main contributor to CleanRL's SAC implementation.
For reference, here are the reported results in the paper: [results table not reproduced here]
In my opinion, the poor results on Pong are due to the evaluation scheme. Evaluating at 100k time steps on Atari is a very tough setting for "standard" model-free RL (some newer methods like CURL or DrQ may perform better there). Rainbow also doesn't improve over a random agent in this setting. We should therefore focus the evaluation on games where meaningful improvements over a random baseline can be made, e.g. Seaquest, James Bond, or Road Runner.
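If we adopt that protocol, it can be pinned down in a small configuration block. A hedged sketch, where the 100k-step budget follows the paper's benchmark and the env ids are an assumption about gym's ALE naming:

```python
# Hypothetical evaluation configuration for the 100k-step benchmark,
# restricted to games where a model-free agent can realistically beat
# a random baseline at that budget.
TOTAL_TIMESTEPS = 100_000
EVAL_ENV_IDS = [
    "SeaquestNoFrameskip-v4",
    "JamesbondNoFrameskip-v4",
    "RoadRunnerNoFrameskip-v4",
]
```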
Closed by #270
Hey there!
I've used this repo's SAC code as a starting point for an implementation of SAC-discrete (paper) for a project of mine. If you're interested, I'd be willing to contribute it to CleanRL.
The differences from SAC for continuous action spaces aren't too big, and since I can start from a working implementation, this shouldn't take long.
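Concretely, the changes amount to replacing sampled estimates with exact expectations over the categorical policy. A minimal sketch of the two affected computations, following the SAC-discrete paper (all function names and signatures here are illustrative, not my actual code):

```python
# The two places where SAC-discrete differs from continuous SAC
# (Christodoulou, 2019). Illustrative names, not CleanRL's code.
import torch
import torch.nn.functional as F


def policy_probs(actor_logits):
    # A discrete actor outputs one logit per action; we can keep the full
    # distribution instead of sampling, so expectations become exact.
    probs = F.softmax(actor_logits, dim=-1)
    log_probs = F.log_softmax(actor_logits, dim=-1)
    return probs, log_probs


def soft_q_target(next_logits, q1_next, q2_next, rewards, dones, gamma, alpha):
    # Continuous SAC estimates E_a[Q - alpha * log pi] with one sampled
    # action; with a discrete action space the expectation is a plain
    # probability-weighted sum over all actions.
    probs, log_probs = policy_probs(next_logits)
    min_q = torch.min(q1_next, q2_next)  # (batch, n_actions)
    v_next = (probs * (min_q - alpha * log_probs)).sum(dim=-1)
    return rewards + gamma * (1.0 - dones) * v_next


def actor_loss(logits, q1, q2, alpha):
    # The same exact-expectation trick for the policy objective: no
    # reparameterization is needed because we never differentiate
    # through a sample.
    probs, log_probs = policy_probs(logits)
    min_q = torch.min(q1, q2).detach()
    return (probs * (alpha * log_probs - min_q)).sum(dim=-1).mean()
```

The temperature update changes in the same way: the sampled log-probability is replaced by the exact entropy of the categorical policy.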
What do you think?
Checklist:
- `poetry install` (see CleanRL's installation guideline)