vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

Add `rnd_ppo.py` documentation and refactor #127

Closed vwxyzjn closed 2 years ago

vwxyzjn commented 2 years ago

`rnd_ppo.py` is a bit dated, and I recommend refactoring it to match the style of the other PPO implementations, which would include:

Overall, I suggest selecting `ppo_atari.py` and `rnd_ppo.py`, using Compare Selected in VS Code to view the diff, and then minimizing the differences between the two files:

image
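For anyone not using VS Code, the same comparison can be approximated from a script. The following is a minimal sketch using Python's standard `difflib`. The helper names `text_diff` and `count_changed_lines` are hypothetical and not part of CleanRL, and the file paths in the commented usage are assumptions about the repository layout.

```python
import difflib
from pathlib import Path


def text_diff(a_text: str, b_text: str, a_name: str = "a", b_name: str = "b") -> list[str]:
    """Return the unified-diff lines between two strings (hypothetical helper)."""
    a_lines = a_text.splitlines(keepends=True)
    b_lines = b_text.splitlines(keepends=True)
    return list(difflib.unified_diff(a_lines, b_lines, fromfile=a_name, tofile=b_name))


def count_changed_lines(diff_lines: list[str]) -> int:
    """Count added and removed lines, skipping the two '---'/'+++' header lines."""
    body = diff_lines[2:]  # unified_diff emits the two file headers first
    return sum(1 for line in body if line.startswith(("+", "-")))


# Example usage (paths are assumptions; adjust to your checkout):
# diff = text_diff(
#     Path("cleanrl/ppo_atari.py").read_text(),
#     Path("cleanrl/rnd_ppo.py").read_text(),
#     "ppo_atari.py",
#     "rnd_ppo.py",
# )
# print("".join(diff))
# print("changed lines:", count_changed_lines(diff))
```

Tracking the changed-line count while refactoring gives a rough measure of how close the two files have become, although it says nothing about whether the remaining differences are the intended ones.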

If you are adding new algorithms, or your change could result in a performance difference, you may need to (re-)run tracked experiments.

yooceii commented 2 years ago

image

Finally got a finished run, and it looks close to their blog's result.

image

vwxyzjn commented 2 years ago

Oh wow, this is really nice! How long did the experiment take?

yooceii commented 2 years ago

Almost 11 days with envpool on a 1080.

vwxyzjn commented 2 years ago

Oh wow that’s taking a really long time. I think given the insane amount of computing required, running it for three random seeds might not be necessary…

yooceii commented 2 years ago

Yeah, I also don't want to spend so much time running it lol.