related-sciences / gwas-analysis

GWAS data analysis experiments
Apache License 2.0

Try CuPy integration w/ Dask to see what, if any, operations benefit from GPU acceleration #6

Open eric-czech opened 4 years ago

hammer commented 4 years ago

I was checking in on Vaex recently and saw https://www.kaggle.com/jovanveljanoski/vaex-on-kaggle-gpu-performance-test, which uses Vaex's jit_cuda (CuPy behind the scenes).

We probably want to work with Dask arrays and CuPy directly rather than via Vaex, but just thought I'd point it out as an easy way to try CuPy.

eric-czech commented 4 years ago

I tried swapping out NumPy for CuPy arrays as Dask chunks in qc_call_rate_benchmarking_cuda.ipynb, but the results were not great. What takes about 30 seconds in the original notebook, as a parallel CPU implementation, takes more like a minute with CuPy-backed Dask arrays. The time varies quite a bit with chunk size, but roughly 100% slower (about a minute vs. 30 seconds) was the best I could get.
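For anyone wanting to reproduce the setup, the swap described above follows the standard Dask pattern of moving each chunk to the GPU backend with map_blocks. A minimal sketch (the data and call-rate reduction here are illustrative stand-ins, not the notebook's actual QC pipeline; it falls back to NumPy when CuPy isn't installed):

```python
import dask.array as da
import numpy as np

try:
    import cupy as xp  # GPU-backed chunks if CuPy is available
except ImportError:
    xp = np  # CPU fallback so the sketch runs anywhere

# Simulated genotype calls: 0/1/2, with -1 marking missing entries
rng = np.random.default_rng(0)
calls = rng.integers(-1, 3, size=(1000, 200)).astype(np.int8)

# Build a chunked Dask array, then convert each chunk to the chosen backend
x = da.from_array(calls, chunks=(250, 200)).map_blocks(xp.asarray)

# Per-variant call rate: fraction of non-missing calls per row
call_rate = (x >= 0).mean(axis=1)
result = call_rate.compute()
```

The upside of this pattern is that the Dask graph is identical for both backends; only the chunk type changes, which is also what makes the CPU-vs-GPU timing comparison above a fairly clean one.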

On the other hand, using Numba's cuda.jit to do things not even possible with CuPy looks to be a win for LD prune (https://github.com/related-sciences/gwas-analysis/issues/26), so for equal $ spent on GPUs and CPUs, it's looking like GPUs will only make sense for pairwise algorithms (or worse). These are pretty rough benchmarks though, so it's definitely worth testing simpler things with CuPy again as the example workflows pile up.

hammer commented 4 years ago

That's an interesting finding, that some workloads run best on the CPU and others on the GPU. It makes transparent dispatch to different backends even more valuable.