openai / requests-for-research

A living collection of deep learning problems
https://openai.com/requests-for-research

Solution to parallel TRPO #22

Closed kvfrans closed 7 years ago

kvfrans commented 7 years ago

Here's an implementation of parallel TRPO, along with a paper describing the multiple-actors setup. The current setup achieves roughly a 3x reduction in training time over single-threaded execution on a 4-core machine.

tlbtlbtlb commented 7 years ago

Nice results!

I wish the writeup talked more about the parallel implementation and its limits, since that's the new contribution. From the code I can see it's using Python's multiprocessing, so it's limited to the cores on one instance. How would you scale it up to use 100 or 1000 cores?

0bserver07 commented 7 years ago

@tlbtlbtlb @kvfrans I'd suggest something along the lines of Celery: it's a distributed task queue that handles all the configuration when scaling up, whether utilizing local CPUs or switching to external workers.

kvfrans commented 7 years ago

@tlbtlbtlb The code should work with many more cores than I'm using right now (only 4 on my computer). I haven't gotten anywhere with a multi-computer setup yet. The rollout code is separate, though, so you could run a rollout instance on a bunch of computers and have the learner send out network requests to those instances for experience.
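To make the actor/learner split concrete, here's a minimal sketch of the idea using Python's `multiprocessing` (the mechanism the repo uses). All names here (`collect_rollout`, `gather_batch`, the toy dynamics) are illustrative placeholders, not code from the actual repository; a multi-machine version would replace the local queues with network requests.

```python
# Sketch of parallel actors: workers generate trajectories with the
# current policy parameters, and the learner gathers them into a batch.
import multiprocessing as mp
import random

def collect_rollout(policy_params, seed, horizon=100):
    """Stand-in for an environment rollout: returns a toy trajectory."""
    rng = random.Random(seed)
    states, actions, rewards = [], [], []
    s = 0.0
    for _ in range(horizon):
        a = policy_params["gain"] * s + rng.gauss(0, 1)  # noisy linear policy
        s = 0.9 * s + 0.1 * a                            # toy dynamics
        states.append(s)
        actions.append(a)
        rewards.append(-s * s)
    return {"states": states, "actions": actions, "rewards": rewards}

def worker(task_q, result_q):
    # Each worker pulls (params, seed) jobs until it sees the None sentinel.
    while True:
        job = task_q.get()
        if job is None:
            break
        params, seed = job
        result_q.put(collect_rollout(params, seed))

def gather_batch(n_rollouts, n_workers, policy_params):
    """Learner side: farm out rollouts to workers, collect the batch."""
    task_q, result_q = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(task_q, result_q))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for seed in range(n_rollouts):
        task_q.put((policy_params, seed))
    batch = [result_q.get() for _ in range(n_rollouts)]
    for _ in procs:
        task_q.put(None)  # shut workers down
    for p in procs:
        p.join()
    return batch
```

Because the workers only need the policy parameters and a seed, this is the part of the pipeline that scales out naturally; the serial update step on the learner is what eventually limits it.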

tlbtlbtlb commented 7 years ago

Right, I'd expect the code to scale, but it's not clear how far before it hits a limit. I think the paper would be stronger if you do the experiment to find out what the limit is. For instance, this paper shows increasing learning speed (on an algorithm very different from TRPO) up to 16 cores: https://arxiv.org/pdf/1602.01783v2.pdf.

kvfrans commented 7 years ago

@tlbtlbtlb Yeah, I'd love to do some more experiments to show how far parallelizing actors will scale. Do you happen to have access to any sort of computer with many cores that could be used for testing?

tlbtlbtlb commented 7 years ago

I use Amazon AWS. You can rent an hour on a 64-core machine for $3.83.

jachiam commented 7 years ago

@kvfrans Congrats on implementing a successful parallel actor setup for TRPO! Your paper is also very nicely written, well-composed, and scientific. As a reference, though, rllab already implements parallel actors for TRPO and other batch policy optimization algorithms.

The key challenge in parallelizing TRPO further lies in the update step itself, which becomes the bottleneck: while the TRPO update is computationally quite cheap for small neural network policies and small amounts of data (like those needed to solve the MuJoCo tasks), it does not scale gracefully to larger neural networks or larger batches of data.

It's difficult to parallelize the update step because it involves approximately solving a system of linear equations for the search direction, where the coefficients are sample estimates of expectation values. (That is, it solves Ax=g for x, where A and g are sample estimates of the true FIM and gradient.) It is unclear how to solve this efficiently when data is split between nodes.
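For reference, TRPO implementations typically solve Ax=g iteratively with conjugate gradient, using only matrix-vector products Av (Fisher-vector products) rather than forming A explicitly. The sketch below uses a tiny explicit matrix just to keep it self-contained; in practice `fvp` would compute the product from samples, which is exactly why splitting the data across nodes is awkward.

```python
# Conjugate gradient for A x = g, where fvp(v) returns A @ v and A is
# symmetric positive definite (as a sample FIM estimate should be).
# Plain-Python linear algebra keeps the example dependency-free.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def conjugate_gradient(fvp, g, iters=10, tol=1e-10):
    """Approximately solve A x = g using only products fvp(v) = A @ v."""
    x = [0.0] * len(g)
    r = list(g)          # residual g - A x, with x = 0 initially
    p = list(r)          # search direction
    rs = dot(r, r)
    for _ in range(iters):
        Ap = fvp(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Toy example: a small SPD "Fisher estimate" A and gradient estimate g.
A = [[4.0, 1.0], [1.0, 3.0]]
g = [1.0, 2.0]
x = conjugate_gradient(lambda v: matvec(A, v), g)  # search direction
```

The inner loop is a chain of dependent Fisher-vector products, each needing an expectation over the whole batch, so with data sharded across nodes every CG iteration would require a synchronized all-reduce.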

I've given some thought to this recently. Please feel free to contact me if you'd like to chat about this in more depth!

P.S. It's easy to compute a theoretical maximum speedup for the parallel actor setup using Amdahl's law.
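For concreteness, here is what that Amdahl's law calculation looks like. The parallel fraction p = 0.9 below is an illustrative assumption, not a measurement from the paper.

```python
# Amdahl's law: if a fraction p of each iteration (the rollouts)
# parallelizes perfectly and the rest (the update step) stays serial,
# the speedup on n cores is 1 / ((1 - p) + p / n).

def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# With 90% of time in rollouts, 4 cores give at most ~3.08x,
# and no number of cores can exceed 1 / (1 - p) = 10x.
```

Under that (assumed) 90% figure, the ~3x result on 4 cores reported above would already be close to the theoretical ceiling for that machine.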

ilyasu123 commented 7 years ago

Looks good, thanks for the submission!