Episode score at the end of training attained by SLM Lab implementations on discrete-action control problems. Reported episode scores are the average over the last 100 checkpoints, and then averaged over 4 Sessions. Results marked with * were trained using the hybrid synchronous/asynchronous version of SAC to parallelize and speed up training time.
Episode score at the end of training attained by SLM Lab implementations on continuous control problems. Reported episode scores are the average over the last 100 checkpoints, and then averaged over 4 Sessions. Results marked with * require 50M-100M frames, so we use the hybrid synchronous/asynchronous version of SAC to parallelize and speed up training time.
The table above presents results for 62 Atari games. All agents were trained for 10M frames (40M including skipped frames). Reported results are the episode score at the end of training, averaged over the previous 100 evaluation checkpoints with each checkpoint averaged over 4 Sessions. Agents were checkpointed every 10k training frames.
Full benchmark upload
Plot Legend
Discrete Benchmark
graph
graph
graph
graph
graph
graph
graph
Continuous Benchmark
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
Atari Benchmark
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph