openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

Number of episodes & timesteps not matching paper, with same hyperparameters #1025

Open RyanRizzo96 opened 4 years ago

RyanRizzo96 commented 4 years ago

I am trying to reproduce results presented in this paper. On page 4, the authors state:

... we train for 50 epochs (one epoch consists of 19 * 2 * 50 = 1900 full episodes), which amounts to a total of 4.75 * 10^6 timesteps.

The 1900 episodes break down as rollouts per MPI worker (2) * number of MPI workers (19) * cycles per epoch (50), as shown in the hyperparameters section on page 10.
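To double-check the totals, here is the arithmetic spelled out. The 50-timestep episode length is not quoted above; I am inferring it from the quoted numbers, since 4.75 * 10^6 / (50 epochs * 1900 episodes) = 50 timesteps per episode:

# Reproducing the paper's counts; episode_len is inferred, not quoted.
n_workers = 19            # MPI workers
rollouts_per_worker = 2   # rollouts per MPI worker per cycle
n_cycles = 50             # cycles per epoch
n_epochs = 50
episode_len = 50          # timesteps per episode (assumed)

episodes_per_epoch = n_workers * rollouts_per_worker * n_cycles   # 1900
total_timesteps = episodes_per_epoch * n_epochs * episode_len     # 4750000
print(episodes_per_epoch, total_timesteps)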

When testing on a machine with enough CPU cores, using this repository, I run 19 MPI workers with the same hyperparameters as in the paper:

'n_cycles': 50,  # per epoch
'rollout_batch_size': 2,  # per mpi thread

By the same calculation, this means I should have 19 * 50 * 2 = 1900 episodes per epoch.
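For reference, the launch looks roughly like mpirun -np 19 python -m baselines.run --alg=her --env=FetchReach-v1, assuming the standard baselines.run entry point; treat the exact flags as a sketch rather than a verified command.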

However, when I run HER on FetchReach-v1, it turns out I only get 50 episodes per epoch. Here is a log sample:

Training...
---------------------------------
| epoch              | 0        |
| stats_g/mean       | 0.893    |
| stats_g/std        | 0.122    |
| stats_o/mean       | 0.269    |
| stats_o/std        | 0.0392   |
| test/episode       | 10       |
| test/mean_Q        | -0.602   |
| test/success_rate  | 0.5      |
| train/episode      | 50       |  <-- 50 episodes/epoch
| train/success_rate | 0        |
---------------------------------

Why is there this discrepancy? Any suggestions would be appreciated.

christopherhesse commented 4 years ago

It looks like that number reports episodes per MPI worker. You should be getting 100 there instead of 50, though. Can you verify that rollout_batch_size is actually getting set correctly inside the train() function?
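A minimal sketch of the kind of check meant here, assuming rollout_worker and n_cycles are in scope inside train() in baselines/her/her.py (adjust the names to whatever the local code actually uses):

# Hypothetical debug lines: confirm what each MPI rank actually received.
from mpi4py import MPI

rank = MPI.COMM_WORLD.Get_rank()
print('rank %d: rollout_batch_size=%s, n_cycles=%s'
      % (rank, rollout_worker.rollout_batch_size, n_cycles))

With the paper's settings, each worker should accumulate 50 * 2 = 100 episodes per epoch, so a train/episode reading of 50 would be consistent with rollout_batch_size ending up as 1 somewhere along the way.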