Atari benchmark

This is a benchmark run using the current master branch. All results are shown below, and the data folders, including the metrics and models, are uploaded to the SLM Lab public Dropbox with the file prefix PR396-.
To Reproduce
JSON spec: See the spec/benchmark folder
git SHA (contained in the file above): 8360612e05985210dcf84de2f8302440a5c8d81c
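The steps above can be sketched as a shell session, assuming SLM Lab's standard `run_lab.py` entry point; the spec file and spec name below are illustrative placeholders, not the exact entries in the spec/benchmark folder:

```shell
# Check out the benchmarked commit.
git clone https://github.com/kengz/SLM-Lab.git
cd SLM-Lab
git checkout 8360612e05985210dcf84de2f8302440a5c8d81c

# Run one benchmark spec in train mode.
# Replace <spec_file> and <spec_name> with actual entries
# found under the spec/benchmark folder.
python run_lab.py slm_lab/spec/benchmark/<spec_file>.json <spec_name> train
```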
| Env. \ Alg. | A2C (GAE) | A2C (n-step) | PPO | DQN | DDQN+PER |
|---|---|---|---|---|---|
| Breakout | 389.99 | 391.32 | 425.89 | 65.04 | 181.72 |
| Pong | 20.04 | 19.66 | 20.09 | 18.34 | 20.44 |
| Qbert | 13,328.32 | 13,259.19 | 13,691.89 | 4,787.79 | 11,673.52 |
| Seaquest | 892.68 | 1,686.08 | 1,583.04 | 1,118.50 | 3,751.34 |

Each score was accompanied by a training graph in the original page (not reproduced here).
Terminology
A2C (GAE): Advantage Actor-Critic with GAE as advantage estimation
A2C (n-step): Advantage Actor-Critic with n-step return as advantage estimation
DDQN+PER: Double Deep Q-Learning with Prioritized Experience Replay
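The two A2C variants above differ only in how they estimate the advantage. A minimal sketch of both estimators, assuming plain Python lists over a single trajectory segment (the function names and signatures are illustrative, not SLM Lab's API):

```python
def gae_advantages(rewards, values, next_values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation: a discounted sum of TD errors."""
    advs = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_values[t] * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advs[t] = running
    return advs

def nstep_advantages(rewards, values, last_value, gamma=0.99):
    """n-step return bootstrapped from last_value, minus the value baseline."""
    rets = [0.0] * len(rewards)
    running = last_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rets[t] = running
    return [r - v for r, v in zip(rets, values)]
```

With lam=1.0 and no terminal states, GAE reduces to the full bootstrapped return minus the baseline, so the two estimators agree on the same segment.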