Raelifin closed 6 years ago
Oh, it's also now compatible with Atari and other environments with discrete actions.
This looks like a solid refactor!
Have you tested to make sure that sampling random rollouts from a different policy doesn't degrade performance much? As a sanity check, I'd be interested in seeing before-and-after learning curves on 700-label Hopper.
Boom. LGTM
Much, much faster random rollout collection. Scales perfectly with more CPU cores. Doesn't rely on parallel-trpo. Simpler logic: the environment loop is visible in the code, and it doesn't use TensorFlow or custom exceptions.
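For anyone skimming the thread, here is a minimal sketch of the general idea: random rollouts collected across CPU cores with a plain, visible environment loop and only the standard library. This is not the code from this PR. `ToyDiscreteEnv`, `collect_rollout`, and `collect_rollouts` are hypothetical names invented for illustration, and the toy environment stands in for a real one with discrete actions.

```python
import random
from multiprocessing import Pool


class ToyDiscreteEnv:
    """Hypothetical stand-in for an environment with discrete actions."""

    n_actions = 3  # actions 0, 1, 2 move the state by -1, 0, +1

    def __init__(self, horizon=20):
        self.horizon = horizon
        self.t = 0
        self.state = 0

    def reset(self):
        self.t = 0
        self.state = 0
        return self.state

    def step(self, action):
        self.state += action - 1
        self.t += 1
        reward = -abs(self.state)  # staying near zero is rewarded
        done = self.t >= self.horizon
        return self.state, reward, done


def collect_rollout(seed):
    """Run one episode with a uniformly random policy."""
    rng = random.Random(seed)  # per-rollout RNG so workers do not share state
    env = ToyDiscreteEnv()
    obs = env.reset()
    path = {"obs": [], "actions": [], "rewards": []}
    done = False
    while not done:  # the environment loop, written out explicitly
        action = rng.randrange(env.n_actions)
        path["obs"].append(obs)
        path["actions"].append(action)
        obs, reward, done = env.step(action)
        path["rewards"].append(reward)
    return path


def collect_rollouts(n_rollouts, n_workers=1, base_seed=0):
    """Collect rollouts serially, or across worker processes if n_workers > 1."""
    seeds = [base_seed + i for i in range(n_rollouts)]
    if n_workers <= 1:
        return [collect_rollout(seed) for seed in seeds]
    with Pool(n_workers) as pool:
        # Each rollout is independent, so the work splits cleanly across cores.
        return pool.map(collect_rollout, seeds)
```

Calling `collect_rollouts(8, n_workers=4)` from a script would spread eight episodes over four processes; on platforms that spawn rather than fork worker processes, that call should sit under an `if __name__ == "__main__":` guard. Seeding each rollout separately keeps results reproducible regardless of how many workers are used.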