vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

About PPO+Procgen code on Jax #352

Closed sglucas closed 8 months ago

sglucas commented 1 year ago

Thank you very much for your contribution.

May I ask whether it would be possible to release the PPO+Procgen code based on JAX?

Best

vwxyzjn commented 1 year ago

Sure, we have upcoming plans for it (see #340). For now, you can use the source code here, which at first glance performs comparably to ppo_procgen.py (see report).

Let me know if you would be interested in helping with running benchmark experiments.

sglucas commented 1 year ago

Hi, Thanks for your reply!

Sure. May I ask whether this is the right code? The environment in this code appears to be Atari (Breakout).

Best

vwxyzjn commented 1 year ago

The code link is correct. If you look at the summary of the tracked experiment, the env_id is BigFishEasy-v0, which is one of the Procgen environments. As pointed out in #340, EnvPool >=0.8.1 introduces Procgen environments, which basically lets us handle Atari and Procgen with the same codebase. There are minor API differences, though, which you can see by diffing this code against what was used to handle Atari in #338. Also note that the hyperparameters differ.

A good way forward is probably to rename #338's implementation to ppo_impalacnn_jax_scan.py, which would handle both Atari and Procgen environments. Could you give it a try and make a PR based on #338? Happy to provide more context and info.
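
To illustrate the "same codebase, different hyperparameters" point, here is a minimal sketch of how a single script might pick per-family settings from the env_id. It is not taken from #338 or the linked code: the helper names `env_family` and `settings_for` are hypothetical, the "anything that is not Procgen is Atari" rule is a simplifying assumption, and the numeric values are illustrative placeholders that should be checked against the actual scripts.

```python
# The 16 Procgen games, lowercase. In this sketch, any other env_id is
# assumed to be an Atari environment.
PROCGEN_GAMES = {
    "bigfish", "bossfight", "caveflyer", "chaser", "climber", "coinrun",
    "dodgeball", "fruitbot", "heist", "jumper", "leaper", "maze", "miner",
    "ninja", "plunder", "starpilot",
}

# Illustrative placeholder values only (hypothetical); they are NOT the
# settings of the tracked experiment and should be verified before use.
FAMILY_DEFAULTS = {
    "atari": {"gamma": 0.99, "num_envs": 8, "num_steps": 128},
    "procgen": {"gamma": 0.999, "num_envs": 64, "num_steps": 256},
}


def env_family(env_id: str) -> str:
    """Classify an env_id such as 'BigFishEasy-v0' as 'procgen' or 'atari'."""
    name = env_id.split("-")[0].lower()  # 'BigFishEasy-v0' -> 'bigfisheasy'
    for suffix in ("easy", "hard"):      # strip the difficulty suffix, if any
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return "procgen" if name in PROCGEN_GAMES else "atari"


def settings_for(env_id: str) -> dict:
    """Return a copy of the placeholder settings for the env's family."""
    return dict(FAMILY_DEFAULTS[env_family(env_id)])


if __name__ == "__main__":
    for env_id in ("BigFishEasy-v0", "Breakout-v5"):
        print(env_id, env_family(env_id), settings_for(env_id))
```

A merged script could call something like `settings_for(args.env_id)` once at startup and keep the rest of the training loop shared between the two families.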

sglucas commented 1 year ago

Hi @vwxyzjn, have you tried the code at https://github.com/bmazoure/ppo_jax? I find that the current code does not test the learned policy on all 1000 environments.
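
For reference, the kind of evaluation meant here can be sketched as follows, under stated assumptions: `run_episode` is a hypothetical callable that plays one episode of the learned policy on the level with the given seed and returns the episode return, and `dummy_episode` is only a stand-in so the snippet runs on its own. Neither comes from ppo_jax or CleanRL.

```python
from statistics import mean
from typing import Callable, Dict, Iterable


def evaluate_over_levels(
    run_episode: Callable[[int], float],
    level_seeds: Iterable[int],
) -> Dict[str, object]:
    """Run one episode per level seed and summarise the returns."""
    per_level = {seed: run_episode(seed) for seed in level_seeds}
    return {
        "mean_return": mean(per_level.values()),
        "num_levels": len(per_level),
        "per_level": per_level,
    }


def dummy_episode(level_seed: int) -> float:
    """Stand-in for a real rollout (hypothetical): a deterministic fake return."""
    return float(level_seed % 5)


# Evaluate on every one of the 1000 level seeds rather than on a subset.
summary = evaluate_over_levels(dummy_episode, range(1000))

if __name__ == "__main__":
    print(summary["num_levels"], summary["mean_return"])
```

In a real setup, `run_episode` would build the environment for that level and roll out the trained policy; the point of the sketch is only that every level seed is visited once.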

vwxyzjn commented 1 year ago

Related: https://github.com/bmazoure/ppo_jax/issues/2

sglucas commented 1 year ago

Thanks a lot!

vwxyzjn commented 8 months ago

https://github.com/vwxyzjn/cleanba should work with procgen if you want to take a look :)