While benchmarking haplotype clustering with larger numbers of haplotypes, I've found situations where it can take up to a minute to save the results of the pairwise distance computation to the results cache. This seems to be entirely due to slow performance of numpy.savez_compressed(). Using zarr.save() instead, which is a drop-in replacement, is much faster, at around 1 s.
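For anyone wanting to reproduce the comparison, here is a minimal timing sketch. It is not the code from this change: the array name `dist`, its size, the file names and the `time_save` helper are all assumptions made for illustration, and the timings will depend on the data and the machine. It only relies on `numpy.savez_compressed()` and `zarr.save()` both accepting a path followed by named arrays.

```python
import tempfile
import time
from pathlib import Path

import numpy as np

try:
    import zarr
except ImportError:  # zarr is optional for this sketch
    zarr = None


def time_save(save_func, path, **arrays):
    """Call save_func(path, **arrays) and return the elapsed seconds."""
    start = time.perf_counter()
    save_func(str(path), **arrays)
    return time.perf_counter() - start


# Hypothetical stand-in for a condensed pairwise distance array.
rng = np.random.default_rng(42)
n_haplotypes = 2_000
n_pairs = n_haplotypes * (n_haplotypes - 1) // 2
dist = rng.integers(0, 100, size=n_pairs).astype("float32")

zarr_seconds = None
zarr_roundtrip = None

with tempfile.TemporaryDirectory() as tmp:
    # Baseline: a compressed .npz archive written by numpy.
    npz_path = Path(tmp) / "results.npz"
    npz_seconds = time_save(np.savez_compressed, npz_path, dist=dist)
    with np.load(npz_path) as npz:
        npz_roundtrip = npz["dist"]

    # Alternative: zarr.save() takes the same path-plus-named-arrays call.
    if zarr is not None:
        zarr_path = Path(tmp) / "results.zarr"
        zarr_seconds = time_save(zarr.save, zarr_path, dist=dist)
        zarr_roundtrip = zarr.open(str(zarr_path), mode="r")["dist"][:]

print(f"numpy.savez_compressed: {npz_seconds:.3f} s")
if zarr_seconds is not None:
    print(f"zarr.save: {zarr_seconds:.3f} s")
```

Note that the save call is the part that swaps over unchanged. `zarr.save()` writes a zarr store rather than an `.npz` file, so any code that reads the cache back needs to open it with zarr instead of `numpy.load()`.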