vwxyzjn / cleanba

CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL

Missing clipped value loss in PPO implementation #19

Open francelico opened 8 months ago

francelico commented 8 months ago

Hi @vwxyzjn ,

This codebase is great, thanks for the hard work! I've been using it to run baseline experiments on Procgen, and I've noticed that your implementation of PPO does not use value loss clipping. However, it is enabled by default in the PyTorch implementation most often encountered in papers evaluating agents on Procgen.
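For concreteness, here's a minimal JAX sketch of the clipped value loss I mean (the function and variable names are mine, not from this repo; it mirrors the form commonly used in PyTorch PPO implementations):

```python
import jax.numpy as jnp

def clipped_value_loss(new_values, old_values, returns, clip_coef):
    """PPO value loss with clipping, computed elementwise over a batch."""
    # Unclipped squared-error term
    v_loss_unclipped = jnp.square(new_values - returns)
    # Keep the new value prediction within clip_coef of the old prediction
    v_clipped = old_values + jnp.clip(new_values - old_values, -clip_coef, clip_coef)
    v_loss_clipped = jnp.square(v_clipped - returns)
    # Pessimistic (elementwise max) combination, then the usual 0.5 * mean
    return 0.5 * jnp.mean(jnp.maximum(v_loss_unclipped, v_loss_clipped))
```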

Is there a reason it was left out? I'm not super familiar with ALE; perhaps it's not as common there?

As part of my project I've created scripts to train and evaluate PPO on Procgen* and I've implemented the DAAC agent (https://arxiv.org/abs/2102.10330). Would you like me to open a PR to add them to cleanba?

*On top of re-implementing value loss clipping in PPO, I found minor differences between the Atari and Procgen environments, such as the info dict returned by envpool.step() differing slightly, and the video recording in the eval script supporting only grayscale images.

vwxyzjn commented 8 months ago

Hi @francelico, thanks for the message. I turned it off because, in practice, it didn't seem to matter much for performance. As much as I'd love to have a DAAC agent in Cleanba, maybe not for now, since this repo is mainly for distributed DRL work and is effectively archived.