malariagen / malariagen-data-python

Analyse MalariaGEN data from Python
https://malariagen.github.io/malariagen-data-python/latest/
MIT License

Arc diagram to visualise haplotype sharing between cohorts #457

Open · alimanfoo opened this issue 9 months ago

alimanfoo commented 9 months ago

Visualising haplotype sharing between different cohorts to infer adaptive gene flow between places or taxa is tricky, especially for large numbers of haplotypes.

Using a dendrogram via plot_haplotype_clustering() is doable and gives you a complete view of the haplotype structure, but it becomes hard to see what's happening once there are more than ~1000 haplotypes.

Using a network via plot_haplotype_network() is better, but I'm not yet sure how well it scales to larger numbers of haplotypes.

A possible alternative would be to show something simpler, such as the number of haplotypes shared between different predefined cohorts. One way to visualise this would be some kind of arc diagram, something like:

[Image: example arc diagram]
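To make the idea a bit more concrete, here is a minimal sketch of such an arc diagram. It assumes the shared-haplotype counts have already been computed as a dict keyed by cohort pairs. Cohorts are placed along a horizontal line, and each pair sharing at least one haplotype is joined by a semicircle whose line width is proportional to the count. `ArcSpec`, `arc_layout` and `plot_arc_diagram` are hypothetical names, not part of the malariagen_data API, and matplotlib is an assumed dependency.

```python
from dataclasses import dataclass


@dataclass
class ArcSpec:
    """Geometry of one arc joining two cohorts (hypothetical helper type)."""
    source: str
    target: str
    center_x: float  # midpoint between the two cohorts on the x axis
    radius: float    # half the distance between the two cohorts
    width: float     # line width, proportional to the shared-haplotype count


def arc_layout(cohorts, shared_counts, max_width=8.0):
    """Place cohorts at x = 0, 1, 2, ... and compute one semicircular arc per
    cohort pair with a non-zero count. `shared_counts` maps (cohort_a, cohort_b)
    to the number of haplotypes shared between them (assumed input format)."""
    x = {cohort: float(i) for i, cohort in enumerate(cohorts)}
    max_count = max(shared_counts.values(), default=0)
    arcs = []
    for (a, b), count in shared_counts.items():
        if count <= 0:
            continue  # nothing shared, so no arc
        arcs.append(
            ArcSpec(
                source=a,
                target=b,
                center_x=(x[a] + x[b]) / 2,
                radius=abs(x[b] - x[a]) / 2,
                width=max_width * count / max_count,
            )
        )
    return x, arcs


def plot_arc_diagram(cohorts, shared_counts):
    """Draw the arcs with matplotlib (assumed dependency) and return the figure."""
    import matplotlib.pyplot as plt
    from matplotlib.patches import Arc

    x, arcs = arc_layout(cohorts, shared_counts)
    fig, ax = plt.subplots(figsize=(8, 4))
    for arc in arcs:
        # Upper half of a circle centred midway between the two cohorts.
        ax.add_patch(
            Arc(
                (arc.center_x, 0.0),
                width=2 * arc.radius,
                height=2 * arc.radius,
                theta1=0,
                theta2=180,
                linewidth=arc.width,
                alpha=0.6,
            )
        )
    ax.scatter(list(x.values()), [0.0] * len(x), zorder=3)
    for cohort, xc in x.items():
        ax.text(xc, -0.1, cohort, ha="center", va="top", rotation=45)
    ax.set_xlim(-0.5, len(cohorts) - 0.5)
    ax.set_ylim(-1.0, len(cohorts) / 2)
    ax.set_aspect("equal")
    ax.axis("off")
    return fig
```

The layout is kept separate from the drawing, so the same arc geometry could be rendered with a different plotting library without changing the calculation.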

alimanfoo commented 9 months ago

This could also be efficient, because we only need to find identical haplotypes shared between cohorts. There is no need for the full pairwise distance calculation, which scales as O(n²) in the number of haplotypes. Hashing the haplotypes should make this roughly O(n).
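Here is a minimal sketch of the hashing approach. It assumes haplotypes are given as a 2D NumPy array with one haplotype per row (transpose first if haplotypes are columns), plus one cohort label per haplotype. Each haplotype's raw bytes are used as a dict key, so identical haplotypes collapse into a single entry in one pass without any pairwise comparison. In this sketch, "shared" means the number of distinct haplotypes observed in both cohorts of a pair. `count_shared_haplotypes` is a hypothetical helper, not an existing malariagen_data function.

```python
from collections import Counter, defaultdict
from itertools import combinations

import numpy as np


def count_shared_haplotypes(haplotypes, cohort_labels):
    """Count distinct haplotypes observed in each pair of cohorts (hypothetical helper).

    haplotypes: 2D array, one haplotype per row (assumed layout).
    cohort_labels: one cohort label per row of `haplotypes`.
    Returns a Counter mapping (cohort_a, cohort_b), with cohort_a < cohort_b,
    to the number of distinct haplotypes found in both cohorts.
    """
    haplotypes = np.asarray(haplotypes)
    if len(haplotypes) != len(cohort_labels):
        raise ValueError("need one cohort label per haplotype")

    # Single pass: key each haplotype by its raw bytes, so identical haplotypes
    # land in the same dict entry via hashing rather than pairwise comparison.
    cohorts_by_haplotype = defaultdict(set)
    for hap, cohort in zip(haplotypes, cohort_labels):
        cohorts_by_haplotype[hap.tobytes()].add(cohort)

    # For each distinct haplotype, credit every pair of cohorts that carry it.
    shared = Counter()
    for cohorts in cohorts_by_haplotype.values():
        for a, b in combinations(sorted(cohorts), 2):
            shared[(a, b)] += 1
    return shared
```

The loop over haplotypes is linear in their number n. Each row of m sites is hashed once, so the cost is O(nm) overall. The inner pair loop depends only on how many cohorts carry a given haplotype, not on n. The result is keyed by cohort pair, which is the form an arc diagram would need.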