Closed by vwxyzjn 8 months ago
The preliminary proof-of-concept is really encouraging. A few lines of change in #338 result in a ~7x improvement in overall training speed. Note that this proof-of-concept does not include reward normalization, which could be a bottleneck in ppo_procgen.py, so further investigation is warranted.
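For context on what is being left out, here is a minimal sketch of reward normalization in the usual PPO style: rewards are divided by the running standard deviation of the per-environment discounted return. This is not the code from ppo_procgen.py or #338. The class name `RunningReturnNormalizer`, its parameters, and the pure-Python lists are assumptions made for illustration, and a real implementation would likely be vectorized with NumPy or JAX.

```python
import math


class RunningReturnNormalizer:
    """Hypothetical helper: scale rewards by the running std of discounted returns."""

    def __init__(self, num_envs, gamma=0.99, epsilon=1e-8):
        self.gamma = gamma
        self.epsilon = epsilon
        self.returns = [0.0] * num_envs  # per-env discounted return accumulator
        self.count = 0                   # number of return samples seen so far
        self.mean = 0.0                  # running mean of the return samples
        self.m2 = 0.0                    # running sum of squared deviations (Welford)

    def _update(self, value):
        # Welford's online update for mean and variance.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def std(self):
        # Fall back to 1.0 until there are enough samples for a variance estimate.
        if self.count < 2:
            return 1.0
        return math.sqrt(self.m2 / self.count)

    def __call__(self, rewards, dones):
        for i, (reward, done) in enumerate(zip(rewards, dones)):
            # Accumulate the discounted return for this env and record it.
            self.returns[i] = self.returns[i] * self.gamma + reward
            self._update(self.returns[i])
            if done:
                # Start a fresh accumulator when the episode ends.
                self.returns[i] = 0.0
        scale = self.std + self.epsilon
        return [reward / scale for reward in rewards]


# Usage: one normalizer shared across all vectorized envs.
normalizer = RunningReturnNormalizer(num_envs=2)
scaled = normalizer([1.0, 2.0], [False, False])
```

The per-step bookkeeping is cheap in isolation, but it runs on every environment step, which is why it is worth checking whether it becomes a bottleneck once the rest of the loop is fast.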
Folks can just use https://github.com/vwxyzjn/cleanba
Problem Description
Given the EnvPool==0.8.0 release by @YukunJ, @LeoGuo98, @Trinkle23897 (https://github.com/sail-sg/envpool/pull/197), we can go ahead and deprecate ppo_procgen.py in favor of #338, which should also work with procgen but gives us the benefit of JAX, EnvPool's Async API, and a more concise codebase.