nottombrown / rl-teacher

Code for Deep RL from Human Preferences [Christiano et al.], plus a webapp for collecting human feedback.
MIT License
556 stars, 93 forks

Use multiple MPI workers with pposgd agent #15

Closed: nottombrown closed this 6 years ago

nottombrown commented 6 years ago

This may be difficult to do: I'm not sure that multiple workers, each running its own learning updates, would be compatible with our current predictor setup.

The predictor could use the same gradient-averaging method that we use in PPO, but that would likely require a major refactor and would break parallel_trpo.
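
For illustration, here is a minimal sketch of the gradient-averaging idea, simulated in a single process rather than with real MPI. Everything in it is an assumption made for the example: the names `average_gradients`, `local_gradient`, and `simulate_sync_training` are hypothetical, and the linear least-squares model is only a stand-in for the reward predictor, not code from this repo. With real MPI workers, the averaging step would typically be a sum-allreduce across workers followed by division by the worker count.

```python
def average_gradients(worker_grads):
    """Element-wise mean of the flat gradient vectors reported by each worker.

    Stands in for a sum-allreduce followed by division by the worker count.
    """
    n_workers = len(worker_grads)
    size = len(worker_grads[0])
    if any(len(g) != size for g in worker_grads):
        raise ValueError("all workers must report gradients of the same length")
    return [sum(g[i] for g in worker_grads) / n_workers for i in range(size)]


def local_gradient(params, batch):
    """Gradient of mean squared error for a toy linear model y = w * x + b.

    Hypothetical stand-in for the predictor's loss gradient on one worker's data.
    """
    w, b = params
    n = len(batch)
    grad_w = sum(2.0 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2.0 * (w * x + b - y) for x, y in batch) / n
    return [grad_w, grad_b]


def simulate_sync_training(shards, init_params, lr=0.05, steps=500):
    """Simulate several workers that each hold a data shard and a parameter copy.

    Each step: every worker computes a local gradient, the gradients are
    averaged, and every worker applies the same averaged update.
    """
    worker_params = [list(init_params) for _ in shards]
    for _ in range(steps):
        grads = [local_gradient(p, shard) for p, shard in zip(worker_params, shards)]
        avg = average_gradients(grads)
        worker_params = [
            [p - lr * g for p, g in zip(params, avg)] for params in worker_params
        ]
    return worker_params


# Toy data drawn from y = 2x + 1, split across two simulated workers.
shards = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
]
final_params = simulate_sync_training(shards, [0.0, 0.0])
```

Because every simulated worker starts from the same parameters and applies the same averaged gradient, the parameter copies stay identical after each step. That property is what a shared predictor would need to keep across workers.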

nottombrown commented 6 years ago

Closing this, since it looks nontrivial to do. I think it would work best as a fork of the repo.