rlworkgroup / garage

A toolkit for reproducible reinforcement learning research.
MIT License
1.88k stars 310 forks

[WiP] Reproducible on- and off-policy sampling #2185

Open MkuuWaUjinga opened 3 years ago

MkuuWaUjinga commented 3 years ago

Extend the Environment API to support setting seeds specific to each environment library.

Tasks:

Open Questions:

MkuuWaUjinga commented 3 years ago

Thanks for the pointers. I addressed everything in the latest commits. I assume GridWorld and PointEnv don't use any seeds at all? Furthermore, in the current implementation every worker gets the same environment seed, so every worker samples the same trajectory given a fixed action sequence. I think this is something we need to fix before merging?
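To make the concern concrete, here is a minimal sketch of the problem and one possible fix: offsetting the base seed by the worker index. It is not taken from this PR. `ToyEnv`, `worker_seed`, `rollout`, and `sample` are hypothetical names made up for illustration, and the toy environment only stands in for an environment that owns its own random number generator.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for an environment with its own RNG."""

    def __init__(self):
        self._rng = random.Random()

    def seed(self, seed):
        # Seed only this environment's private RNG.
        self._rng.seed(seed)

    def step(self, action):
        # The observation is the action plus environment noise.
        return action + self._rng.random()


def worker_seed(base_seed, worker_id):
    """Derive a distinct, deterministic seed per worker (an assumption,
    not necessarily the scheme this PR ends up using)."""
    return base_seed + worker_id


def rollout(env, actions):
    """Run a fixed action sequence and return the observations."""
    return [env.step(a) for a in actions]


def sample(base_seed, n_workers, actions, per_worker):
    """Collect one trajectory per worker.

    With per_worker=False every worker's environment gets base_seed,
    which reproduces the issue described above.
    """
    trajectories = []
    for worker_id in range(n_workers):
        env = ToyEnv()
        if per_worker:
            env.seed(worker_seed(base_seed, worker_id))
        else:
            env.seed(base_seed)
        trajectories.append(rollout(env, actions))
    return trajectories


actions = [0, 1, 2]
shared = sample(0, 3, actions, per_worker=False)
distinct = sample(0, 3, actions, per_worker=True)
```

With a shared seed, all entries of `shared` are identical. With the offset, the entries of `distinct` differ across workers, while a second call with the same base seed still returns the same trajectories, so runs stay reproducible.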