scikit-hep / pyhf

pure-Python HistFactory implementation with tensors and autodiff
https://pyhf.readthedocs.io/
Apache License 2.0

CuPy tensor backend #238

Open lukasheinrich opened 6 years ago

lukasheinrich commented 6 years ago

Description

following the discussion in #231 it should be quite easy to get a CuPy backend into pyhf

https://cupy.chainer.org/

it's advertised as a drop-in replacement for numpy. This will give us GPU acceleration (like TF and PyTorch, but separate from autodifferentiation -- which might be nice to have)

@matthewfeickert has some experience in writing these backends and iirc access to a CUDA enabled machine.

matthewfeickert commented 6 years ago

Sweet! That looks really cool. I'll poke at their docs over the weekend and then we can see if it is any more work to add in than normal.

lukasheinrich commented 6 years ago

i'd be especially interested in whether we can use the normal scipy-based optimization with that, or if we also have to code one ourselves
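One possible shape for this, as a rough sketch rather than a tested pyhf design: SciPy's optimizer keeps its parameters as host-side NumPy arrays, so the objective can move them to the backend, do the heavy arithmetic there, and hand back a plain Python float. The toy `data` and `objective` below are hypothetical stand-ins for a likelihood, and the snippet falls back to NumPy when CuPy cannot be imported.

```python
import numpy as np
from scipy.optimize import minimize

try:
    import cupy as xp  # assumption: CuPy installed with a working CUDA device
except ImportError:
    xp = np  # fall back to NumPy so the sketch still runs on a CPU-only machine

# Hypothetical toy data standing in for the inputs of a likelihood
data = xp.asarray([1.0, 2.0, 3.0, 4.0])


def objective(pars):
    # SciPy passes a host numpy.ndarray; move it to the backend's device
    p = xp.asarray(pars)
    resid = data - p[0]
    # Return a Python float so SciPy never sees a device array
    return float(xp.sum(resid * resid))


# Least-squares fit of a constant; the minimum sits at the mean of the data
result = minimize(objective, x0=np.array([1.0]))
print(result.x)
```

The cost of this approach is a host/device transfer on every objective call, so whether it pays off would depend on how expensive the likelihood evaluation is relative to that transfer.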

matthewfeickert commented 6 years ago

This is looking promising, given this section of the docs:

How to write CPU/GPU agnostic code
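A minimal sketch of the pattern that docs section describes, assuming CuPy's `cupy.get_array_module` helper: a function asks which array module its input belongs to and then uses that module for all operations. The `get_array_module` wrapper and the NumPy fallback here are illustrative additions so the snippet also runs where CuPy is not installed.

```python
import numpy as np

try:
    import cupy as cp  # optional; needs a CUDA toolkit to install
except ImportError:
    cp = None


def get_array_module(x):
    """Return cupy if x is a CuPy array, otherwise numpy."""
    if cp is not None:
        return cp.get_array_module(x)
    return np


def softplus(x):
    # The same code path serves numpy.ndarray and cupy.ndarray inputs
    xp = get_array_module(x)
    return xp.maximum(x, 0) + xp.log1p(xp.exp(-xp.abs(x)))


print(softplus(np.array([-1.0, 0.0, 1.0])))
```

On a CUDA machine the same `softplus` could be called with a `cupy.ndarray` and would return its result on the GPU.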

lukasheinrich commented 6 years ago

maybe relevant https://github.com/cupy/cupy/issues/1196

matthewfeickert commented 6 years ago

Ah, so I'll need to set things up at my cluster to actually do this, as the requirements to even install include CUDA. So this isn't just CUDA enabled, but actually built using CUDA (this actually sounds like a good thing to me).

matthewfeickert commented 6 years ago

As an update, I'm still working on this, but I'm waiting to hear back from the HPC admins on my cluster. For technical reasons they might make me submit a list of all the software that I want, and they will set up the testing environment for me. This could be nice, but might result in a delay of a few days.

matthewfeickert commented 6 years ago

Something else that I should look into is ClPy: OpenCL backend for CuPy, as we're going to want to be able to have CuPy tests run in CI and Travis doesn't support CUDA infrastructure.

lukasheinrich commented 6 years ago

once we get an installation of CuPy it would be very interesting to get a feel for what the hardware speedup looks like. Even before implementing the backend we could run this notebook

BetterKitchenSink.ipynb.zip

probably only the tb = np lines in cell [2] need to be adapted if I understood the link

https://docs-cupy.chainer.org/en/stable/tutorial/basic.html#how-to-write-cpu-gpu-agnostic-code

correctly. right now we get a factor of roughly 400 comparing a naive implementation with the new vectorized one (from #251)

numpy: 127 µs ± 12.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
naive: 56.1 ms ± 4.55 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
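
For anyone without the notebook, here is a rough sketch of what such a comparison could look like. The `naive` and `vectorized` functions and the input arrays are hypothetical and not taken from BetterKitchenSink.ipynb; the `tb = np` line is the single place that would be changed to try CuPy.

```python
import timeit

import numpy as np

tb = np  # assumption: replace with `import cupy as tb` on a CUDA machine


def naive(rates, factors):
    # One Python-level loop iteration per bin
    out = []
    for r, f in zip(rates, factors):
        out.append(r * f)
    return out


def vectorized(rates, factors):
    # A single array expression dispatched to the tensor backend
    return tb.asarray(rates) * tb.asarray(factors)


rates = np.linspace(1.0, 10.0, 1000)
factors = np.full(1000, 1.1)

t_naive = timeit.timeit(lambda: naive(rates, factors), number=100)
t_vec = timeit.timeit(lambda: vectorized(rates, factors), number=100)
print(f"naive: {t_naive:.4f} s, vectorized: {t_vec:.4f} s (100 calls each)")
```

With CuPy, kernels are launched asynchronously, so a fair GPU timing would also need to wait for the device to finish before stopping the clock.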

matthewfeickert commented 6 years ago

Additional overview material: Shohei Hido - CuPy: A NumPy-compatible Library for GPU, PyCon 2018

matthewfeickert commented 5 years ago

More additional material: ContinuumIO's Numba and CuPy tutorial