vwxyzjn / cleanba

CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL

Missing clipped value loss in PPO implementation #19

Open francelico opened 8 months ago

francelico commented 8 months ago

Hi @vwxyzjn ,

This codebase is great, thanks for the hard work! I've been using it to run baseline experiments on Procgen, and I've noticed that your implementation of PPO does not use value loss clipping. However, it is enabled by default in the PyTorch implementation most often encountered in papers evaluating agents on Procgen.
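For concreteness, here's a minimal JAX sketch of the clipped value loss I mean (the function and variable names are mine, not from this repo; it mirrors the form commonly used in PyTorch PPO implementations):

```python
import jax.numpy as jnp

def clipped_value_loss(new_values, old_values, returns, clip_coef):
    """PPO value loss with clipping, computed elementwise over a batch."""
    # Unclipped squared-error term
    v_loss_unclipped = jnp.square(new_values - returns)
    # Keep the new value prediction within clip_coef of the old prediction
    v_clipped = old_values + jnp.clip(new_values - old_values, -clip_coef, clip_coef)
    v_loss_clipped = jnp.square(v_clipped - returns)
    # Pessimistic (elementwise max) combination, then the usual 0.5 * mean
    return 0.5 * jnp.mean(jnp.maximum(v_loss_unclipped, v_loss_clipped))
```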

Is there a reason it was left out? I'm not super familiar with ALE; perhaps it's not as common there?

As part of my project I've created scripts to train and evaluate PPO on Procgen* and I've implemented the DAAC agent (https://arxiv.org/abs/2102.10330). Would you like me to open a PR to add them to cleanba?

*On top of re-implementing value loss clipping in PPO, I found minor differences between the Atari and Procgen environments, such as the info dict returned by envpool.step() differing slightly, and the video recording in the eval script supporting only grayscale images.

vwxyzjn commented 8 months ago

Hi @francelico, thanks for the message. I turned it off because, in practice, it didn't seem to matter much for performance. As much as I'd love to have a DAAC agent in Cleanba, maybe not for now, since this repo is mainly for distributed DRL work and is effectively archived.