malariagen / malariagen-data-python

Analyse MalariaGEN data from Python
https://malariagen.github.io/malariagen-data-python/latest/
MIT License

Consider adding GPU implementation of haplotype pairwise distance #460

Open alimanfoo opened 9 months ago

alimanfoo commented 9 months ago

The pylibraft library provides pairwise distance implementations that run on GPU. Trying this on Colab gives 45s on CPU versus <1s on GPU for ~6000 haplotypes. GPUs are free on Colab, so this would be relatively accessible to most users, and would make analysing large numbers of haplotypes much more tractable.

Installing pylibraft requires a slightly different pip command, so this should probably be an optional dependency. I.e., the user has to install pylibraft manually; if it is detected it is used, otherwise we fall back to the existing CPU implementation.
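
A minimal sketch of how the detect-and-fall-back logic could look, just to make the proposal concrete. All function names here are hypothetical and not part of the current malariagen_data API. The CPU function is a pure-Python stand-in for the existing implementation, and the GPU path is passed in as a callable (presumably a wrapper around `pylibraft.distance.pairwise_distance`) rather than spelled out, since its exact inputs and metric choice would need checking.

```python
import importlib.util
import warnings


def has_pylibraft():
    """Return True if the optional pylibraft package is installed."""
    # find_spec only locates the package; it does not import it.
    return importlib.util.find_spec("pylibraft") is not None


def pairwise_distance_cpu(haplotypes):
    """Stand-in for the existing CPU implementation (hypothetical).

    Counts the positions at which each pair of haplotypes differs.
    Each haplotype is a sequence of allele codes of equal length.
    """
    n = len(haplotypes)
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = sum(a != b for a, b in zip(haplotypes[i], haplotypes[j]))
            dist[i][j] = dist[j][i] = d
    return dist


def compute_pairwise_distance(haplotypes, cpu_impl, gpu_impl=None):
    """Use gpu_impl if pylibraft is installed, otherwise use cpu_impl.

    gpu_impl is an assumed callable with the same signature as cpu_impl.
    """
    if gpu_impl is not None and has_pylibraft():
        try:
            return gpu_impl(haplotypes)
        except Exception as exc:
            # pylibraft may be installed on a machine without a usable GPU.
            warnings.warn(f"GPU path failed ({exc}); using CPU implementation")
    return cpu_impl(haplotypes)


haps = [[0, 1, 0, 1], [0, 1, 1, 1], [1, 0, 1, 0]]
print(compute_pairwise_distance(haps, cpu_impl=pairwise_distance_cpu))
```

Catching a failure of the GPU path and falling back is a design choice in this sketch: having pylibraft installed does not guarantee that a GPU is available at runtime, e.g. on a Colab CPU runtime.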