vwxyzjn closed 5 years ago
They're histograms of the episode returns. If we're running 100 rollouts in parallel, each one will end with a different score. You could just plot the mean of all these scores, but the histogram shows some additional interesting information. In this case, you can see that the scores are bimodal after step 400: some episodes do pretty poorly, others do pretty well, and there's not much in between.
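To make the mean-versus-histogram point concrete, here is a rough sketch using NumPy only. The returns are synthetic numbers I made up (half the rollouts near 20, half near 200), not data from the actual run, and `summarize_returns` is a hypothetical helper name, not something from the codebase.

```python
import numpy as np

def summarize_returns(returns, bins=10):
    """Return the mean and a histogram (counts, bin_edges) of episode returns."""
    returns = np.asarray(returns, dtype=float)
    counts, edges = np.histogram(returns, bins=bins)
    return returns.mean(), counts, edges

# Synthetic (assumed) returns for 100 parallel rollouts:
# 50 episodes score around 20, the other 50 around 200.
rng = np.random.default_rng(0)
returns = np.concatenate([
    rng.normal(20.0, 5.0, size=50),
    rng.normal(200.0, 10.0, size=50),
])

mean_return, counts, edges = summarize_returns(returns, bins=10)
print(f"mean return: {mean_return:.1f}")

# Crude text histogram: one '#' per episode in each bin.
for count, left, right in zip(counts, edges[:-1], edges[1:]):
    print(f"[{left:7.1f}, {right:7.1f}) {'#' * count}")
```

With data like this, the mean lands in the empty gap between the two clusters, so it describes a score that almost no episode actually got. The histogram makes the two groups visible.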
@zplizzi I see. Thanks for the comment.