scikit-hep / pyhf

pure-Python HistFactory implementation with tensors and autodiff
https://pyhf.readthedocs.io/
Apache License 2.0

Parallelism of calculations in pyhf à la joblib (or similar) #807

Open kratsg opened 4 years ago

kratsg commented 4 years ago

Description

There are starting to be places in pyhf where certain calculations could be parallelized on behalf of the user (rather than the user parallelizing explicitly). For example, one that will come up is the toy calculation added in #790, where we need a for-loop that calculates the test statistic for each toy.

This cannot be batched or vectorized, quite simply because a statistical fit is performed for each toy (and the number of iterations is not necessarily the same for each toy). There may be other good examples in the codebase in the future where we will want parallelism.
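To illustrate the point about iteration counts, here is a minimal sketch. `fit_toy` is a hypothetical stand-in (a Newton iteration for a square root), not pyhf's fit. It only shows that an iterative solve can take a different number of steps for each toy, which is what gets in the way of stacking all toys into one batched call.

```python
import random


def fit_toy(seed, tolerance=1e-8, max_iter=1000):
    """Hypothetical stand-in for a per-toy fit (not pyhf code).

    Solves x**2 = target by Newton iteration and reports how many
    iterations were needed for this particular toy.
    """
    rng = random.Random(seed)
    # Toy-dependent "data": spread over many orders of magnitude.
    target = 10 ** rng.uniform(0, 12)
    x = 1.0
    for n_iter in range(1, max_iter + 1):
        x_new = 0.5 * (x + target / x)
        # Stop once the relative change is below the tolerance.
        if abs(x_new - x) < tolerance * x_new:
            return x_new, n_iter
        x = x_new
    return x, max_iter


# Serial loop over toys: each call is independent of the others,
# but the work per call differs from toy to toy.
results = [fit_toy(seed) for seed in range(100)]
iteration_counts = sorted({n_iter for _, n_iter in results})
print("distinct iteration counts:", iteration_counts)
```

Because each call depends only on its own toy, the loop is a natural candidate for process- or task-level parallelism even though it does not vectorize.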

Is your feature request related to a problem? Please describe.

No.

Describe the solution you'd like

Perhaps something like pip install pyhf[toytools] or pyhf[toys-joblib] or pyhf[toys-dask].

Describe alternatives you've considered

Dunno. I didn't think hard enough yet.

Relevant Issues and Pull Requests

Additional context

Nope.

kanishk16 commented 4 years ago

I believe dask would be a nice option for a variety of reasons, primarily scaling up of data in the future, but before forming an opinion I wanted to know about any personal experiences of limitations of dask compared to joblib.

matthewfeickert commented 3 years ago

@kratsg For those who don't know, like me, what's the advantage of using concurrent.futures, as is currently done in the draft of PR #1158, over just using joblib (beyond concurrent.futures being built into the language)?

So replacing

https://github.com/scikit-hep/pyhf/blob/02b195158d2e3fe25aec17f72ef3c28fd2af176d/src/pyhf/infer/calculators.py#L723-L734

with something like

from joblib import Parallel, delayed

...

        # n_jobs is set as kwarg
        signal_teststat = Parallel(n_jobs=n_jobs)(
            delayed(teststat_func)(
                poi_test,
                sample,
                self.pdf,
                self.init_pars,
                self.par_bounds,
                self.fixed_params,
            )
            for sample in tqdm.tqdm(signal_sample, **tqdm_options, desc='Signal-like')
        )

(and corresponding code for bkg_teststat), with the default "loky" backend, I was seeing rates of over 500 toys/second on branches that have PR #1610 implemented.