vwxyzjn / cleanrl

High-quality single-file implementations of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

SAC discrete #266

Closed: timoklein closed this issue 10 months ago

timoklein commented 2 years ago

Hey there!

I've used this repo's SAC code as a starting point for an implementation of SAC-discrete (paper) for a project of mine. If you're interested, I'd be willing to contribute it to CleanRL.

The differences from SAC for continuous action spaces aren't too big, and I can start from a working implementation, so this shouldn't take too long.
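For context, the core change in SAC-discrete is that the actor outputs a categorical distribution over actions and the critics output Q-values for all actions, so the soft value and policy objectives become exact expectations over the action set instead of relying on the reparameterization trick. A minimal sketch of the actor loss (assuming PyTorch and hypothetical tensor names; not CleanRL's actual code):

```python
import torch
import torch.nn.functional as F

def discrete_sac_actor_loss(logits, q1, q2, alpha):
    """Sketch of the SAC-discrete actor objective.

    logits: (B, n_actions) raw actor outputs
    q1, q2: (B, n_actions) per-action Q-values from the two critics
    alpha:  entropy temperature
    """
    log_probs = F.log_softmax(logits, dim=-1)   # log pi(a|s)
    probs = log_probs.exp()                     # pi(a|s)
    min_q = torch.min(q1, q2)                   # clipped double-Q
    # Exact expectation over actions -- no sampling or reparameterization:
    policy_loss = (probs * (alpha * log_probs - min_q)).sum(dim=-1).mean()
    entropy = -(probs * log_probs).sum(dim=-1)  # per-state policy entropy
    return policy_loss, entropy

# Usage with random tensors (batch of 4 states, 6 discrete actions):
B, n_actions = 4, 6
logits = torch.randn(B, n_actions)
q1, q2 = torch.randn(B, n_actions), torch.randn(B, n_actions)
loss, ent = discrete_sac_actor_loss(logits, q1, q2, alpha=0.2)
```

The same expectation trick applies to the critic target, which averages the next-state Q-values under the policy rather than evaluating a sampled action.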

What do you think?


vwxyzjn commented 2 years ago

Hi @timoklein, thanks for your interest in contributing! SAC-discrete indeed sounds like an interesting addition to CleanRL. I just glanced at the paper and would recommend prototyping a sac_atari.py to work with Atari games, as done in the paper.

I was a bit surprised to see the algorithm performs poorly on Pong. Do you have any insight on this? Maybe this is some implementation details stuff... CC @dosssman, who was the main contributor to CleanRL's SAC implementation.

timoklein commented 2 years ago

> I just glanced at the paper and would recommend prototyping a sac_atari.py to work with Atari games, as done in the paper.

Getting to work on it!

> I was a bit surprised to see the algorithm performs poorly on Pong. Do you have any insight on this? Maybe this is some implementation details stuff... CC @dosssman, who was the main contributor to CleanRL's SAC implementation.

For reference, here are the reported results in the paper: (image: SAC-discrete results table from the paper)

In my opinion, the bad results on Pong are due to the evaluation scheme. Evaluating at 100k time steps on Atari is a very tough setting for "standard" model-free RL algorithms (some newer methods like CURL or DrQ may perform better); Rainbow also fails to improve over a random agent in this setting. We should therefore focus the evaluation on games where meaningful improvements over a random baseline can be made, e.g. Seaquest, James Bond, or Road Runner.

vwxyzjn commented 10 months ago

Closed by #270