lihaoyi / macropy

Macros in Python: quasiquotes, case classes, LINQ and more!

Macro Powered Fork-Join Concurrency #25

Closed lihaoyi closed 11 years ago

lihaoyi commented 11 years ago
with fork as x:
    ... do some expensive operations ...
    result
with fork as y:
    ... do more expensive operations ...
    result

res_x, res_y = join(x, y)
do_stuff(res_x, res_y)

Although it is possible to grab the bytecode of a function and send it to a separate process to be run in parallel, macros would let us parallelize tasks at a sub-function level. Exactly how they are run (process pool, thread pool, remote machines) is completely abstracted away. Another syntax that looks nicer, but only works for expressions, might be:

# these two get run in parallel
x, y = fork_join%(
    ...some expensive computation...,
    ...another expensive computation...
)
do_stuff(x, y)
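
Short of macros, the expression form can be approximated today with plain thunks and a standard-library pool; a minimal sketch (the `fork_join` name and the thread-pool choice are assumptions here, not MacroPy API):

```python
from concurrent.futures import ThreadPoolExecutor

def fork_join(*thunks):
    """Run each zero-argument callable in parallel; return results in order."""
    with ThreadPoolExecutor(max_workers=len(thunks)) as pool:
        futures = [pool.submit(t) for t in thunks]
        return [f.result() for f in futures]

# these two get run in parallel (placeholder "expensive" computations)
x, y = fork_join(
    lambda: sum(range(1_000_000)),
    lambda: max(range(1_000_000)),
)
```

The cost is wrapping each expression in a `lambda:` by hand, which is exactly the boilerplate a macro would hide.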

Macros could also give us a concise way of emulating "lexical scoping", by delimiting which parts of the forked code are to be evaluated locally:

with forked as x:
    temp1 = do_expensive_op_with(u%value1)
    temp2 = do_expensive_op_with(u%value2)
    ...
    result

In this case, the u% syntax indicates that value1 and value2 are to be evaluated in the local scope and their value sent over to the forked task, rather than sending over the names value1 or value2 which may not exist in the scope where the forked task is executing. There are probably a bunch of other clever things you could do with macros, along with a bunch of pitfalls to do with concurrency/parallelism, neither of which has been thought through.
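
Conceptually, the u% capture corresponds to passing the captured values as explicit (pickled) arguments to a worker function, rather than having the worker look the names up in its own scope; a rough non-macro sketch with multiprocessing, where `forked_task` and its operations are illustrative placeholders:

```python
from multiprocessing import Pool

def forked_task(value1, value2):
    # value1/value2 arrive pickled from the parent process;
    # these names are purely local to the worker
    temp1 = value1 * 2   # stand-in for do_expensive_op_with(u%value1)
    temp2 = value2 + 10  # stand-in for do_expensive_op_with(u%value2)
    return temp1 + temp2

if __name__ == "__main__":
    with Pool(1) as pool:
        result = pool.apply_async(forked_task, (21, 32)).get()
        print(result)  # 21*2 + (32+10) = 84
```

The macro's job would be to collect the u%-marked expressions, evaluate them locally, and splice them in as the pickled argument tuple.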

eigenhombre commented 11 years ago

This issue reminds me of @mrocklin 's question on StackOverflow: http://stackoverflow.com/questions/12415316/dead-simple-fork-join-concurrency-in-clojure

which is similar to your second example. What use cases aren't covered by this variant?

This is an interesting example which may help show some of the edges of what macropy can and cannot do.

lihaoyi commented 11 years ago

I'm not that familiar with Clojure, but if that is what I think it is, it's basically exactly what I'm proposing here, except...

The second bit is important; on the JVM, you can simply close over lexical scope and pass the closure to other threads to munch on before returning the result. On Python, because of the GIL, that's not possible, so we'll have to multiprocess. That would mean we'd probably want to explicitly delimit which parts are closed over (the u%s above), and will need to be pickled-transmitted-unpickled, and which parts are taken from the destination process' scope (everything else).

Other than that, it's basically the same idea. You can do it in Scala, you can do it in Clojure; I'd love to be able to do it in Python.

eigenhombre commented 11 years ago

Ah, there are actually two cases of interest:

  1. "real" fork/join where you actually fork subprocesses;
  2. threaded concurrency where you wrap the separate computations in a macro for convenience. That's what the SO question was discussing. Though the GIL makes threads inefficient for certain cases, there is certainly still utility in using threads.

Example macros handling both multiprocessing (forks) and threading would be of interest, I'd expect.
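
Both cases map onto the same `concurrent.futures` interface, so the executor becomes a one-line swap; a sketch, assuming the "expensive" function is just a CPU-bound placeholder:

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def expensive(n):
    return sum(i * i for i in range(n))

def run_parallel(executor_cls, args):
    with executor_cls() as pool:
        return list(pool.map(expensive, args))

if __name__ == "__main__":
    # case 2: threads -- cheap, share memory, but GIL-bound for CPU work
    print(run_parallel(ThreadPoolExecutor, [10, 20]))   # [285, 2470]
    # case 1: real forked subprocesses -- sidestep the GIL,
    # but arguments and results must be pickled
    print(run_parallel(ProcessPoolExecutor, [10, 20]))  # [285, 2470]
```

A macro wrapping either of these would mostly be sugar over the thunk-wrapping and result-collection.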

lihaoyi commented 11 years ago

I didn't actually consider "actual" forking to be a real possibility; I probably should have made that clear =D. I meant task/fiber/goroutine/etc. style forking, implemented using Erlang-style M:N scheduling to run on a thread-pool or process-pool. OS threads and processes are waaaaay too heavyweight to go around forking them!

I'm still doubtful of the utility of multithreading with the GIL; most of the cases where the GIL isn't a problem (e.g. blocking web requests) are probably better handled with asynchrony, since there's no sense keeping a bunch of threads blocked if you can help it.

Of course, all that stuff may be a premature optimization, since if we wanted maximum CPU utilization we probably wouldn't be using Python anyway.

eigenhombre commented 11 years ago

Multithreading in Python can be very useful and leads to quite simple code in certain circumstances -- much simpler than event-driven / async, IMO. The most obvious example is where you have multiple things which have to happen concurrently but which depend very little on each other, or not at all - for example, when you need to finish some computation that is needed "later", but the main program decides when to pick it up (rather than handling a "done" event). Clojure's futures are an expression of this common idiom. The really bad case with the GIL is where one thread is CPU-bound and one I/O bound -- a common enough case, but by no means the only relevant one.
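
The "finish some computation that is needed later" idiom, which Clojure's futures express, maps directly onto `concurrent.futures` in Python; a minimal sketch:

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=1)

# kick off a computation whose result is needed "later"
fut = pool.submit(lambda: sum(range(100_000)))

# ... the main program does other work, then decides when to pick it up ...
result = fut.result()  # blocks only if the work isn't finished yet
pool.shutdown()
```

No "done" event handling is needed: the main program simply dereferences the future when it wants the value, exactly the Clojure-futures shape described above.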

Processes are indeed heavyweight; threads are super-lightweight in comparison, by about a factor of 100 (see http://eigenhombre.com/2013/04/19/processes-vs-threads-for-testing/, unfortunately down at the moment due to GitHub problems with GitHub Pages).

In any case, if I get some time I might try to take a stab at writing a macro similar to the one in the SO question under discussion, if only to learn the limits of what MacroPy can do -- great work, by the way!!!

lihaoyi commented 11 years ago

I meant async in the implementation, not the interface! E.g. instead of forking 10 threads, send 10 tasks off onto our thread pool and wait for them to come back. Tasks would be 100 times lighter than threads, which are 100 times lighter than processes; ideally we wouldn't be forking any kernel resources at all, and all this would be user-land scheduling.

I think that for shared-memory thread-pool fork-join you don't need macros; all you need is a nice syntax for thunks (which you can easily send to other threads, because they all live in the same memory space). For basic multithreaded evaluation of expressions, a normal function to which you pass f% quick lambdas should be sufficient.
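
For instance, a plain function plus thunks already gives shared-memory fork-join with bare threads; a sketch where ordinary lambdas stand in for MacroPy's f% quick lambdas:

```python
import threading

results = {}  # shared memory: threads write directly, no pickling needed

def fork(name, thunk):
    """Run thunk() on a fresh thread, storing its result under name."""
    def run():
        results[name] = thunk()
    t = threading.Thread(target=run)
    t.start()
    return t

# ordinary lambdas here; with MacroPy one would pass f% quick lambdas
threads = [fork("x", lambda: sum(range(1000))),
           fork("y", lambda: len("hello"))]
for t in threads:
    t.join()

print(results["x"], results["y"])  # 499500 5
```

No macro is involved; the closures capture the enclosing scope for free because everything shares one address space, which is exactly what breaks down in the multiprocess case.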

Don't let me discourage you from giving it a shot though; it would be great if someone other than the two of us could give macropy a spin and tell us what you think =)

lihaoyi commented 11 years ago

I'm going to close this issue; after two weeks of inactivity, I don't expect someone to step up and present a solution any time soon, and I'm personally not that excited about it anymore (the multiprocessing module is pretty good!) so I probably won't spend the time to take a shot at it. Feel free to give it a shot anyway, but I'd like to clean up the issue tracker =)