koaning / memo

Decorators that log stats.
https://koaning.github.io/memo/getting-started.html
MIT License

Add a parallel Runner #24

Closed bradday4 closed 3 years ago

bradday4 commented 3 years ago

Hey guys. I love the package; it's very useful. I thought I would go ahead and submit this PR. I created a wrapper called mempar that uses joblib to parallelize whatever function you would normally wrap with memlist. Now you can just pass an iterable of dicts with keys that match the function signature, and the function will execute in parallel. mempar also lets you pass all positional and keyword arguments through to parallel_backend.

I also went ahead and created some unit tests based on those in test_memfile.py, which are now in test_mempar. I also tested with the @memfile decorator, and everything works so long as @mempar comes before @memfile. I didn't get around to writing a unit test for that specific condition yet.
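For illustration, usage would look roughly like this (the exact mempar signature and import path are assumed here, since the code itself isn't quoted in this thread):

from memo import memfile, mempar  # mempar is the decorator added in this PR

@mempar("threading", n_jobs=2)  # args forwarded to joblib's parallel_backend
@memfile(filepath="results.jsonl")
def experiment(setting_a, setting_b):
    return {"metric": setting_a + setting_b}

# Instead of one call per setting, pass an iterable of dicts whose keys
# match the function signature; the calls then run in parallel.
experiment([{"setting_a": a, "setting_b": b} for a in range(3) for b in range(3)])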

I hope this adds some more utility to the package. Please let me know if there is anything else you want to know. I tried my best to match the formatter, linter, doc style, etc., since I didn't find a contribution guide.

-Brad

koaning commented 3 years ago

In the future, please create an issue before you write the PR. That way we could have discussed the API considerations before you wrote any code.

I certainly appreciate the idea of adding support for parallel computation (and the code looks neat from glancing at it!), but I'm not sure that decorators are the best way to offer this functionality. The decorators in memo so far all deal with logging/stats: you still pass a single value to the function, and the decorator just makes sure that the stats get logged. Adding a decorator that assumes a setting now needs a list instead of a single parameter feels like an anti-pattern, and I'm wondering if there is another way of dealing with it.

I'm thinking out loud here, but what if we make a parallel grid? Or perhaps a runner object? This not only keeps the decorators around for logging, it also allows us to think of a way to support a progress bar for the parallel computation.

bradday4 commented 3 years ago

> In the future, please create an issue before you write the PR. That way we could have discussed the API considerations before you wrote any code.

@koaning for sure, next time I'll create an issue.

As for parallelizing grid, that was my first thought, but as I was trying to code it up I realized that any parallelization would have to happen around the user's function, which is why I went with the decorator approach. It seemed to me a rather elegant way of abstracting away the parallelization while giving the user a simple interface.

If grid were parallelized, it would somehow need to consume the user's function to do the parallel dispatching. From my perspective that does not seem as clean, though it is definitely doable. The major downside I see with that approach is that it forces a user who wants to parallelize their function to generate the iterable of key-value pairs with grid, when they may have another way of creating those combinations; for instance, the settings might come from another package.

I guess the third option would then be to leave grid alone and create a parallel dispatching function, which is what I assume you were referring to as a runner object (please correct me if I'm wrong). Then the user would pass in the function to be executed in parallel, the grid, and a flag for showing a progress bar.

Just an FYI: I was able to get the progress bar working with the mempar decorator by monkey-patching the BatchCompletionCallBack in joblib's backend.
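The patch itself didn't make it into this thread, but the recipe looks roughly like this (assuming tqdm supplies the bar; only the class name joblib.parallel.BatchCompletionCallBack comes from joblib itself):

import contextlib

import joblib
from tqdm import tqdm

@contextlib.contextmanager
def tqdm_joblib(tqdm_object):
    """Patch joblib so completed batches tick a tqdm progress bar."""
    class TqdmBatchCompletionCallback(joblib.parallel.BatchCompletionCallBack):
        def __call__(self, *args, **kwargs):
            tqdm_object.update(n=self.batch_size)
            return super().__call__(*args, **kwargs)

    old_callback = joblib.parallel.BatchCompletionCallBack
    joblib.parallel.BatchCompletionCallBack = TqdmBatchCompletionCallback
    try:
        yield tqdm_object
    finally:
        joblib.parallel.BatchCompletionCallBack = old_callback
        tqdm_object.close()

# Usage: any Parallel(...) call inside the block updates the bar.
with tqdm_joblib(tqdm(total=100)):
    joblib.Parallel(n_jobs=2)(joblib.delayed(abs)(i) for i in range(100))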

koaning commented 3 years ago

> I guess the third option would then be to leave grid alone and create a parallel dispatching function, which is what I assume you were referring to as a runner object (please correct me if I'm wrong). Then the user would pass in the function to be executed in parallel, the grid, and a flag for showing a progress bar.

Yes! This is what I meant by referring to the "runner". You're also right that it should not go into the grid function. Better to keep concerns separated.

In your mind, would the API be like this?

from memo import memlist, grid
from memo import Runner

data = []

@memlist(data=data)
def silly_example(setting_a, setting_b):
    return {"metric": setting_a + setting_b}

settings = grid(setting_a=range(100), setting_b=range(20), progbar=False)

runner = Runner(backend="threading", workers=2)
runner.run(func=silly_example, settings=settings)

Feel free to comment on this quick example; it's merely something to help the discussion along.

bradday4 commented 3 years ago

Yes, I like that approach. Creating a Runner object would also help extensibility if another backend were to be implemented at a future date. One thing though: how would constants, positional or keyword, get passed to the func silly_example by the Runner object? Say the silly_example function took a third argument called constant_c. A few ideas come to mind.

The first idea would be to add a method specifically for attaching arguments to the function being consumed, e.g. runner.add_func_commands(constant_c, kw2, ...) before runner.run is called. I don't like this approach.

The second idea is to accept the constants in the run method itself, e.g. runner.run(silly_example, settings=settings, constant_c=constant_c) (sketched below).

The third idea is to somehow insert constant_c into the settings object, but that would require updating grid to accommodate constant values. Question: I haven't tested this, but is it possible to pass a scalar to grid? E.g. grid(setting_a=range(100), constant_c=5).

The fourth would be to use functools.partial, e.g. partial_silly_example = functools.partial(silly_example, constant_c=5), and then pass partial_silly_example to runner.run.

My vote would be for the second option, as it seems to be the simplest approach and probably the easiest one for users to understand.
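For illustration, here's a minimal sketch of that second option, assuming run simply merges any extra keyword arguments into every settings dict before dispatching (the Runner shown is hypothetical, not the final implementation):

from joblib import Parallel, delayed

class Runner:
    def __init__(self, backend="threading", workers=2):
        self.backend = backend
        self.workers = workers

    def run(self, func, settings, **constants):
        # Merge the constants into each settings dict, so `func` still
        # receives everything as keyword arguments.
        return Parallel(backend=self.backend, n_jobs=self.workers)(
            delayed(func)(**{**s, **constants}) for s in settings
        )

runner = Runner(backend="threading", workers=2)
runner.run(silly_example, settings=settings, constant_c=5)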

koaning commented 3 years ago

I think this should just go ahead and work.

from memo import memlist, grid
from memo import Runner

data = []

@memlist(data=data)
def silly_example(setting_a, setting_b, constant_c):
    return {"metric": setting_a + setting_b}

# Note the `constant_c` here: it's a list with a single value, and the
# function signature now accepts it.
settings = grid(setting_a=range(100), setting_b=range(20), constant_c=[1], progbar=False)

runner = Runner(backend="threading", workers=2)
runner.run(func=silly_example, settings=settings)

The library only supports keyword arguments because that's the only way that we can make sure that they're logged with an appropriate name.
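For illustration, a stripped-down sketch of a memlist-style decorator shows why (this is not memo's actual source):

import functools

def memlist(data):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(**kwargs):
            result = func(**kwargs)
            # kwargs carry the parameter names, so every logged value can
            # be paired with the name it was passed under; positional
            # calls are rejected by design.
            data.append({**kwargs, **result})
            return result
        return wrapper
    return decorator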

bradday4 commented 3 years ago

I went ahead and pushed the Runner class and removed the mempar decorator. I also updated the tests and a few other odds and ends. The progress bar still works with the new implementation. Runner will warn the user if they pass a generator, since its length can't be known in advance and a progress bar therefore becomes moot. A busy indicator could always be used instead, but I didn't implement that.
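The check is roughly this (names and message here are illustrative, not the merged code):

import types
import warnings

def _warn_if_generator(settings):
    # A generator has no len(), so a progress bar cannot display a
    # total; warn and carry on without one.
    if isinstance(settings, types.GeneratorType):
        warnings.warn("`settings` is a generator; cannot show a progress bar.")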

koaning commented 3 years ago

Also ... did you run a benchmark on your end? On my end it seems like only the threading option makes it go faster.

This example:

import numpy as np
from memo import memlist, grid, Runner

data = []

@memlist(data=data)
def birthday_experiment(class_size, n_sim=1000):
    """Simulates the birthday paradox. Vectorized = Fast!"""
    sims = np.random.randint(1, 365 + 1, (n_sim, class_size))
    sort_sims = np.sort(sims, axis=1)
    n_uniq = (sort_sims[:, 1:] != sort_sims[:, :-1]).sum(axis=1) + 1
    return {"est_prob": np.mean(n_uniq != class_size)}

settings = list(grid(class_size=range(10, 50), n_sim=range(10000, 20000, 100), progbar=False))
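(The timed calls are omitted above; presumably the rows below come from a plain loop versus Runner.run with each backend, along these lines, with parameter names as in the earlier example:)

# "No Runner": a plain sequential loop.
for s in settings:
    birthday_experiment(**s)

# The other rows: the same settings through the Runner, swapping the
# backend between "threading", "multiprocessing", and "loky".
runner = Runner(backend="threading", workers=2)
runner.run(func=birthday_experiment, settings=settings)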

These results:

No Runner

CPU times: user 59.5 s, sys: 2.37 s, total: 1min 1s
Wall time: 1min 2s

Threading

CPU times: user 1min 37s, sys: 11.8 s, total: 1min 49s
Wall time: 38.5 s

MultiProcessing

CPU times: user 1min 26s, sys: 6.89 s, total: 1min 33s
Wall time: 1min 29s

Loky

CPU times: user 1min 27s, sys: 6.94 s, total: 1min 34s
Wall time: 1min 30s

bradday4 commented 3 years ago

> I'm now also contemplating having the grid method return a list instead of a generator. This is something I'll keep for a separate PR that I'll do after I merge this one in.

Could we open an issue about this? I had some ideas/questions around parallelizing grid and the function call all together.

> Also ... did you run a benchmark on your end? On my end it seems like only the threading option makes it go faster.

As for benchmarking, I've tested threading against sequential execution and saw about a 2x speedup. Like I mentioned previously, threading usually does the trick for me. My intuition is that as the number of elements in grid grows, loky or multiprocessing, which are process-based (CPU cores), will eventually begin to perform better; the slowdown you saw comes from the overhead of creating each worker. We could raise another issue and try to benchmark this for users, but my feeling is that it's something for the user to determine. I left a note about this in the Runner docstring, and it could be mentioned in the package documentation.
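One plausible explanation for why threading wins on this particular benchmark: NumPy releases the GIL during most vectorized operations, so threads run the simulation truly in parallel while skipping the per-process startup cost. Either way, the choice stays a one-liner for the user:

# Thread-based workers are cheap to start; effective for NumPy-heavy
# functions because NumPy releases the GIL during vectorized work.
runner = Runner(backend="threading", workers=4)

# Process-based workers cost more to start, which tends to pay off only
# for long-running, pure-Python CPU-bound functions.
runner = Runner(backend="loky", workers=4)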

koaning commented 3 years ago

@bradday4 for a discussion on the future of grid see this issue: https://github.com/koaning/memo/issues/21

bradday4 commented 3 years ago

> The final thing I'm wondering about is whether we want to make the joblib installation optional. It's perfectly fine as-is, but I am curious about your opinion.

My initial thought is to make it part of the package. joblib is a popular package, and the likelihood of it already being installed on a user's system is pretty high. I don't see a strong case for trying to limit the overall package size here, but I could be wrong.

> The only main comment is one about documentation, but I'll gladly pick that up if it's easier.

I'm OK with you banging out the documentation on this if you want. I'm interested in the framework, though; if you have any links/guides etc. to share, next time I can just do it myself.

bradday4 commented 3 years ago

Question on alternative backends: I was messing with a separate branch locally to see if I could get ray working, and it seems pretty trivial to do. Do you want to incorporate that into this PR, or should I raise a new issue for it?

koaning commented 3 years ago
  1. Thanks for the PR! I just hit merge.
  2. Let's make a new PR for ray. I would prefer that it be implemented as an optional dependency if possible, though. I really like ray, but I can certainly imagine that not everybody will need it.