********** Iteration 2 ************
Total number of episodes: 1166
KL between old and new distribution: 0.00500573
Entropy: 2.94753
Surrogate loss: -0.0609315
Average sum of rewards per episode: -0.198684210526
Baseline explained: -0.0523932738426
Time elapsed: 0.20 mins
Rollout
********** Iteration 3 ************
Total number of episodes: 1549
KL between old and new distribution: 0.00585226
Entropy: 2.92717
Surrogate loss: -0.0543403
Average sum of rewards per episode: -0.195822454308
Baseline explained: -0.0704836218784
Time elapsed: 0.29 mins
Rollout
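
The iteration blocks above follow a regular `key: value` layout, so they can be turned into records for plotting or for comparing runs. A minimal sketch, assuming every block starts with a starred `Iteration N` header and that lines without a colon (such as `Rollout`) can be ignored; `parse_iterations` is a hypothetical helper, not part of this repository:

```python
import re

# Header line such as "********** Iteration 2 ************".
HEADER = re.compile(r"^\*+\s*Iteration\s+(\d+)\s*\*+$")


def parse_iterations(text):
    """Parse iteration blocks into a list of dicts (hypothetical helper)."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            # A new header starts a new record.
            current = {"iteration": int(match.group(1))}
            records.append(current)
            continue
        if current is None or ":" not in line:
            continue  # skip markers such as "Rollout" and text before any header
        key, _, value = line.partition(":")
        value = value.strip()
        if value.endswith(" mins"):
            value = value[: -len(" mins")]  # keep only the number of minutes
        try:
            current[key.strip()] = float(value)
        except ValueError:
            current[key.strip()] = value  # keep non-numeric values as text
    return records


# Sample copied from the log output above.
sample = """\
********** Iteration 2 ************
Total number of episodes: 1166
KL between old and new distribution: 0.00500573
Entropy: 2.94753
Surrogate loss: -0.0609315
Average sum of rewards per episode: -0.198684210526
Baseline explained: -0.0523932738426
Time elapsed: 0.20 mins
Rollout
********** Iteration 3 ************
Total number of episodes: 1549
KL between old and new distribution: 0.00585226
Entropy: 2.92717
Surrogate loss: -0.0543403
Average sum of rewards per episode: -0.195822454308
Baseline explained: -0.0704836218784
Time elapsed: 0.29 mins
Rollout
"""

records = parse_iterations(sample)
for record in records:
    print(record["iteration"], record["Average sum of rewards per episode"])
```

All numeric fields are stored as floats, including the episode count, which keeps the parser simple at the cost of losing the integer type.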
Fixes #1 and #2.
Results are available at: https://gym.openai.com/evaluations/eval_yumMelmmSNWeM4RexXZX5g#reproducibility
Example logs: