Speed up plot_haplotype_clustering()

The plot_haplotype_clustering() function is pretty slow for larger numbers of samples. Because it involves a pairwise distance calculation, it will scale roughly with the square of the number of samples, so some performance issues for larger numbers of samples are unavoidable. However, we might be able to improve the situation to some extent with a more efficient implementation.
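
For a rough sense of the quadratic scaling, here is a quick sketch that counts the distinct pairs a pairwise distance calculation has to visit. The helper name `n_pairs` is made up for illustration and is not part of the package.

```python
def n_pairs(n_haplotypes):
    """Number of distinct unordered pairs among n_haplotypes items: n * (n - 1) / 2."""
    return n_haplotypes * (n_haplotypes - 1) // 2


# Growing the input 10x grows the number of pairs roughly 100x.
for n in (1_000, 10_000, 100_000):
    print(f"{n} haplotypes: {n_pairs(n)} pairs")
```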

Some thoughts:

In particular, it looks like the pairwise distance calculation may be performed twice unnecessarily: once to find the max and min distances, and again to compute the dendrogram. Can we avoid this?
Is there any way to use a faster pairwise distance implementation? scikit-learn has an implementation with thread parallelism.
Would it help to copy the haplotype data into Fortran order first? Memory access patterns can make a decent difference for pairwise distance calculations.
Caching the pairwise distance calculation would then allow faster replotting with different parameters like color, symbol or linkage_method.
Find distinct haplotypes first, then only compute pairwise distances (and possibly clustering) for the distinct haplotypes.
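
A couple of these ideas can be sketched together with NumPy and SciPy. This is only a minimal illustration, not the current implementation: the function name `cluster_distinct_haplotypes`, its return values, the (n_haplotypes, n_sites) array layout and the choice of Hamming distance are all assumptions. It collapses identical haplotypes first, computes the condensed distance matrix once, and reuses that same array for both the min/max distances and the linkage.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist


def cluster_distinct_haplotypes(haplotypes, linkage_method="single"):
    """Hypothetical helper, for illustration only.

    Assumes `haplotypes` is an (n_haplotypes, n_sites) array of allele calls
    containing at least two distinct haplotypes.
    """
    haplotypes = np.asarray(haplotypes)

    # Collapse identical haplotypes so distances are only computed between
    # distinct rows. `inverse` maps each original row to its distinct row,
    # `counts` records how many originals each distinct row represents.
    distinct, inverse, counts = np.unique(
        haplotypes, axis=0, return_inverse=True, return_counts=True
    )
    inverse = np.ravel(inverse)

    # Compute the condensed pairwise distance matrix once. The "hamming"
    # metric returns the proportion of differing sites, so multiply by the
    # number of sites to get a count of differences.
    dist = pdist(distinct, metric="hamming") * distinct.shape[1]

    # Reuse the same array for the summary statistics and for the linkage,
    # rather than calculating the distances a second time.
    dist_min, dist_max = dist.min(), dist.max()
    Z = linkage(dist, method=linkage_method)

    return distinct, inverse, counts, dist, (dist_min, dist_max), Z
```

Because `dist` is returned, a caller could hold on to it and call `linkage(dist, method=...)` again with a different linkage_method without recomputing distances, which is the caching idea in its simplest form. Plotting a dendrogram of distinct haplotypes would still need `counts` and `inverse` to represent the collapsed duplicates, which is not shown here.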

malariagen / malariagen-data-python

Speed up plot_haplotype_clustering() #449