zplizzi / pytorch-ppo

Simple, readable, yet full-featured implementation of PPO in PyTorch

Performance screenshots look weird #1

Closed by vwxyzjn 5 years ago

vwxyzjn commented 5 years ago

[image: IMG_0159]

zplizzi commented 5 years ago

They're histograms of the episode returns. If we're running 100 rollouts in parallel, each one will finish with a different score. You could just plot the mean of all these scores, but plotting the histogram shows some additional interesting information. In this case, you can see that the scores are bimodal after step 400: some episodes do pretty poorly, others do pretty well, and there's not much in between.
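To make the mean-versus-histogram point concrete, here is a minimal sketch in plain Python. It is not taken from this repository: the `histogram` helper, the bin layout, and the synthetic returns (half the rollouts near 20, half near 180) are all assumptions chosen only to mimic a bimodal outcome.

```python
import random
from statistics import mean


def histogram(values, lo, hi, n_bins):
    """Count how many values fall into each of n_bins equal-width bins on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = int((v - lo) / width)
        # Clamp so out-of-range values land in the first or last bin.
        idx = min(max(idx, 0), n_bins - 1)
        counts[idx] += 1
    return counts


rng = random.Random(0)

# Hypothetical returns from 100 parallel rollouts (illustrative numbers only):
# the first 50 episodes do poorly, the other 50 do well.
returns = [rng.gauss(20, 5) if i < 50 else rng.gauss(180, 5) for i in range(100)]

mean_return = mean(returns)
counts = histogram(returns, lo=0, hi=200, n_bins=10)

# The mean lands in the middle of the range, where no episode actually scored.
print(f"mean return: {mean_return:.1f}")

# A text histogram makes the two separate clusters visible.
for i, c in enumerate(counts):
    print(f"[{i * 20:3d}, {(i + 1) * 20:3d}) {'#' * c}")
```

The single mean value sits between the two clusters and describes none of the episodes, while the per-bin counts show both groups and the empty gap between them.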

vwxyzjn commented 5 years ago

@zplizzi I see. Thanks for the comment.