nottombrown / rl-teacher

Code for Deep RL from Human Preferences [Christiano et al.], plus a webapp for collecting human feedback.
MIT License

Back to pooled rollouts, but this time with random seed set using worker index. #28

Closed. Raelifin closed 6 years ago

Raelifin commented 6 years ago

Yes, this is a bit more complex, but the complexity is encapsulated well. Also, even though rollout collection isn't the time bottleneck for running Teacher, it is a significant component when doing rapid dev work.
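For readers unfamiliar with the pattern named in the title, here is a minimal sketch of pooled rollout collection with the random seed derived from a worker index. It is not the code from this pull request: `BASE_SEED`, `rollout_worker`, and `collect_rollouts` are hypothetical names, and the "environment" is a stand-in that just samples rewards. The point it illustrates is that forked worker processes can inherit identical RNG state, so each one reseeds from its own index.

```python
import random
from multiprocessing import Pool

BASE_SEED = 1000  # hypothetical base seed, chosen only for illustration


def rollout_worker(args):
    """Collect one toy rollout; args is (worker_index, n_steps)."""
    worker_index, n_steps = args
    # Seed a private RNG from the index so that workers which inherited the
    # same RNG state from the parent process do not produce identical rollouts.
    rng = random.Random(BASE_SEED + worker_index)
    # Stand-in for stepping a real environment: sample one reward per step.
    rewards = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
    return {"worker": worker_index, "rewards": rewards}


def collect_rollouts(n_workers, n_steps):
    """Run one rollout job per index in a process pool and gather the results."""
    # The index identifies the job; the pool decides which process runs it.
    jobs = [(i, n_steps) for i in range(n_workers)]
    with Pool(processes=n_workers) as pool:
        return pool.map(rollout_worker, jobs)


if __name__ == "__main__":
    for path in collect_rollouts(n_workers=4, n_steps=5):
        print(path["worker"], sum(path["rewards"]))
```

Because the seed depends only on the index, rerunning with the same index reproduces the same rollout, while different indices give different ones.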

nottombrown commented 6 years ago

Tested on MuJoCo - LGTM