malariagen / malariagen-data-python

Analyse MalariaGEN data from Python
https://malariagen.github.io/malariagen-data-python/latest/
MIT License

Speed up plot_haplotype_clustering() #449

Closed alimanfoo closed 9 months ago

alimanfoo commented 9 months ago

The plot_haplotype_clustering() function is pretty slow for larger numbers of samples. Because it involves a pairwise distance calculation, it will scale roughly with the square of the number of samples, so some performance issues for larger numbers of samples are unavoidable. However, we might be able to improve the situation to some extent with a more efficient implementation.
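
To make the quadratic scaling concrete, here is a toy sketch in plain Python. It is not the code used in the library: the helper names, the toy haplotypes, and the choice of Hamming distance as the metric are all assumptions made for illustration only.

```python
from itertools import combinations


def n_pairs(n_samples):
    """Number of distinct pairs among n_samples items: n * (n - 1) / 2."""
    return n_samples * (n_samples - 1) // 2


def pairwise_hamming(haplotypes):
    """Naive pairwise Hamming distances, in condensed (upper-triangle) order.

    Hamming distance is an assumed metric here, chosen only for illustration.
    """
    return [
        sum(a != b for a, b in zip(h1, h2))
        for h1, h2 in combinations(haplotypes, 2)
    ]


# Hypothetical toy data: 4 haplotypes over 5 sites (0 = ref, 1 = alt).
toy_haplotypes = [
    [0, 0, 1, 0, 1],
    [0, 1, 1, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 1],
]

distances = pairwise_hamming(toy_haplotypes)

# One distance per distinct pair of haplotypes.
assert len(distances) == n_pairs(len(toy_haplotypes))

# The number of comparisons grows roughly 100x for every 10x more haplotypes.
for n in (100, 1_000, 10_000):
    print(n, n_pairs(n))
```

Going from 1,000 to 10,000 haplotypes takes the number of distances from 499,500 to 49,995,000, so a faster per-pair computation helps with the constant factor but does not change the overall scaling.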

Some thoughts: