GPU-boosted implementation of PhenoGraph

eburling commented 4 years ago

I'm writing to share my GPU-boosted implementation of PhenoGraph. Instead of using the CPU-bound libraries numpy, scipy.sparse, and sklearn as in the legacy implementation, I use the GPU-bound libraries cupy, cupyx.sparse, and cudf/cuml from NVIDIA's RAPIDS library to reduce execution time by orders of magnitude for large datasets. For especially large datasets or dataset compilations (~3 million cells x 50 features), the kNN search can be distributed across multiple GPUs, if they are available. For a synthetic dataset of 1 million cells x 30 features, the CPU implementation executes in ~6 hours, whereas the GPU implementation, run on a single V100 GPU, executes in ~40 seconds (~500-fold speed-up).
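For readers unfamiliar with the steps being accelerated, here is a minimal CPU-only sketch of the graph-building stage of a PhenoGraph-style pipeline: a brute-force kNN search followed by Jaccard weighting of the neighbor sets. It is an illustration in plain Python, not the code from the linked repo and not a GPU implementation; the function names `knn_indices` and `jaccard_edges` and the toy data are made up for this example.

```python
import math

def knn_indices(points, k):
    """Brute-force search: indices of the k nearest neighbours of each point (self excluded)."""
    neighbours = []
    for i, p in enumerate(points):
        # Sort candidates by (distance, index) so ties resolve deterministically.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours

def jaccard_edges(neighbours):
    """Weight each kNN edge (i, j) by the Jaccard overlap of the two neighbour sets."""
    sets = [set(n) for n in neighbours]
    edges = {}
    for i, nbrs in enumerate(neighbours):
        for j in nbrs:
            union = len(sets[i] | sets[j])
            weight = len(sets[i] & sets[j]) / union if union else 0.0
            # Store each undirected edge once, keyed by (smaller index, larger index).
            edges[(min(i, j), max(i, j))] = weight
    return edges

# Toy data (hypothetical): two well-separated groups of three 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
neighbours = knn_indices(points, k=2)
edges = jaccard_edges(neighbours)
```

The brute-force search is quadratic in the number of cells, which is why this stage dominates the runtime on large datasets and benefits most from being moved to the GPU. The resulting weighted graph is then passed to a community detection step.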

Modularity is comparable between the GPU and CPU implementations.
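As a reminder of what is being compared, modularity can be computed from a weighted edge list and a cluster assignment. The sketch below uses the standard Newman definition, Q = sum over communities of (internal weight / m) - (community degree / 2m)^2, in plain Python. The function name `modularity` and the toy graph are assumptions for illustration; this is not how either implementation reports its score.

```python
def modularity(edges, labels):
    """Newman modularity Q of an undirected weighted graph.

    edges:  dict {(i, j): weight}, each undirected edge listed once, i != j.
    labels: sequence mapping node index -> community id.
    """
    total = sum(edges.values())  # m, the total edge weight
    if total == 0:
        return 0.0
    internal = {}  # edge weight with both endpoints inside each community
    degree = {}    # summed weighted degree of the nodes in each community
    for (i, j), w in edges.items():
        ci, cj = labels[i], labels[j]
        degree[ci] = degree.get(ci, 0.0) + w
        degree[cj] = degree.get(cj, 0.0) + w
        if ci == cj:
            internal[ci] = internal.get(ci, 0.0) + w
    return sum(internal.get(c, 0.0) / total - (d / (2.0 * total)) ** 2
               for c, d in degree.items())

# Toy graph (hypothetical): two disconnected triangles with unit weights.
toy_edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
             (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0}
q_split = modularity(toy_edges, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_merged = modularity(toy_edges, [0, 0, 0, 0, 0, 0])  # everything in one community
```

Running the same function on the cluster labels from two implementations over the same graph gives directly comparable scores.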

Please feel free to link to the repo if interested: https://gitlab.com/eburling/grapheno

Thanks and sorry for the spam! I hope the community finds it useful.

pankajkgupta commented 4 years ago

Hey, thanks for sharing. I don't seem to be able to find this package at the URL you shared. Is it still active?

eburling commented 4 years ago

Hi @pankajkgupta. Whoops, I just had to change some repo preferences. Thanks for calling it to my attention. It should work now. Let me know if not.

pankajkgupta commented 4 years ago

Hey, yes, I saw the package, but now it seems the page is no longer there. It gives a 404 error.

eburling commented 4 years ago

Sorry, I was experimenting with the accessibility settings yesterday. It should work now.

LouisFaure commented 4 years ago

This is great! Would it be possible to include Leiden community detection, similar to the fork from dpeerlab (which is actually the currently maintained version)?

Or, if you move it to a GitHub repo, I can propose a pull request myself.

eburling commented 4 years ago

Hi @LouisFaure. Yes, I can add 'leiden'. I see that it was added in cuGraph 0.15.
