Remove the unnecessary regular advantage code in PPO

vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)

http://docs.cleanrl.dev

Remove the unnecessary regular advantage code in PPO #287

Closed bragajj closed 2 years ago

bragajj commented 2 years ago

Description

Resolves issue #207. The unnecessary regular advantage code in PPO has been removed, and team members verified numerical accuracy with a debugger. Additional runs showing performance without the extra code can be found at the following wandb link: https://wandb.ai/bragajj/ppo_advcalc
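For context, here is a minimal sketch of why the regular advantage branch can be considered redundant: with lambda set to 1, the GAE recursion reduces to the discounted return minus the value estimate. This is illustrative NumPy code, not the code from the PR. The function names, the single-environment shapes, and the convention that `dones[t]` marks a step that begins a new episode are assumptions made for this example.

```python
import numpy as np


def gae_advantages(rewards, values, next_value, dones, next_done,
                   gamma=0.99, gae_lambda=0.95):
    """GAE over one rollout of length T (hypothetical helper).

    dones[t] is 1.0 if step t starts a new episode; next_done and
    next_value describe the state that follows the last step.
    """
    T = len(rewards)
    advantages = np.zeros(T)
    lastgaelam = 0.0
    for t in reversed(range(T)):
        if t == T - 1:
            nextnonterminal = 1.0 - next_done
            nextvalues = next_value
        else:
            nextnonterminal = 1.0 - dones[t + 1]
            nextvalues = values[t + 1]
        # One-step TD error, then the exponentially weighted sum of TD errors.
        delta = rewards[t] + gamma * nextvalues * nextnonterminal - values[t]
        lastgaelam = delta + gamma * gae_lambda * nextnonterminal * lastgaelam
        advantages[t] = lastgaelam
    return advantages


def regular_advantages(rewards, values, next_value, dones, next_done,
                       gamma=0.99):
    """Discounted bootstrapped return minus value (hypothetical helper)."""
    T = len(rewards)
    returns = np.zeros(T)
    for t in reversed(range(T)):
        if t == T - 1:
            nextnonterminal = 1.0 - next_done
            next_return = next_value
        else:
            nextnonterminal = 1.0 - dones[t + 1]
            next_return = returns[t + 1]
        returns[t] = rewards[t] + gamma * nextnonterminal * next_return
    return returns - values


# Random rollout used only to compare the two computations.
rng = np.random.default_rng(0)
T = 16
rewards = rng.normal(size=T)
values = rng.normal(size=T)
dones = (rng.random(T) < 0.2).astype(float)
next_value, next_done = float(rng.normal()), 0.0

adv_gae = gae_advantages(rewards, values, next_value, dones, next_done,
                         gae_lambda=1.0)
adv_reg = regular_advantages(rewards, values, next_value, dones, next_done)
assert np.allclose(adv_gae, adv_reg)
```

Under these assumptions, keeping only the GAE path loses nothing, since the regular advantage is recovered by choosing lambda = 1.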

Types of changes

[x] Bug fix
[ ] New feature
[ ] New algorithm
[ ] Documentation

Checklist:

[x] I've read the CONTRIBUTION guide (required).
[x] I have ensured pre-commit run --all-files passes (required).

vercel[bot] commented 2 years ago

The latest updates on your projects.

Name	Status	Preview	Updated
cleanrl	✅ Ready (Inspect)	Visit Preview	Oct 4, 2022 at 0:52AM (UTC)

vwxyzjn commented 2 years ago

Thanks @bragajj, it looks good. I would also remove the gae flag.

https://github.com/vwxyzjn/cleanrl/blob/49168b87ccfa68573046c1dbd0651361b6c486dd/cleanrl/ppo_atari.py#L58-L59

The --gae flag exists in other scripts as well. If you could do the same for them, that would be great!
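As a rough sketch of what the argument parsing might look like once the boolean flag is gone: only the discount factor and the GAE lambda remain, and passing a lambda of 1.0 recovers the regular advantage. The option names `--gamma` and `--gae-lambda`, their defaults, and the `parse_args` helper are assumptions for illustration and are not copied from the linked lines.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical, trimmed-down parser: there is no boolean --gae switch,
    # because GAE is always used and lambda alone controls the estimator.
    parser = argparse.ArgumentParser()
    parser.add_argument("--gamma", type=float, default=0.99,
                        help="discount factor (assumed default)")
    parser.add_argument("--gae-lambda", type=float, default=0.95,
                        help="lambda for GAE; 1.0 gives the regular advantage")
    return parser.parse_args(argv)


# Example: request the regular (lambda = 1) advantage from the command line.
args = parse_args(["--gae-lambda", "1.0"])
```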

bragajj commented 2 years ago

The GAE flags have been removed from all PPO files. The Isaac Gym script and ppo_rnd_envpool.py are now also updated to reflect the GAE revisions.