millanek / Dsuite

Fast calculation of Patterson's D (ABBA-BABA) and the f4-ratio statistics across many populations/species

non-significant D-statistic with significant clustering (homoplasy test) #87

Open svedwards opened 8 months ago

svedwards commented 8 months ago

Dear Milan -

Thank you for Dsuite, a very useful tool! I recently got a result with significant clustering by the KS-homoplasy test but a non-significant D-statistic. Is this expected? The bioRxiv preprint seems to suggest that this should not happen frequently. Thanks for any comments - results below. - Scott

P1  P2  P3  Dstatistic  Z-score   p-value   f4-ratio   clustering_KS_p-val1  clustering_KS_p-val2  BBAA     ABBA     BABA
AI  AW  AC  0.0103669   0.510179  0.609926  0.0134129  2.3e-16               0.258379              13981.6  12896.3  12631.7
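As a quick sanity check on the row above, the following minimal Python sketch recomputes D from the posted ABBA and BABA counts using the standard definition D = (ABBA - BABA) / (ABBA + BABA), and converts the posted Z-score to a two-sided normal p-value. The function names are hypothetical helpers, not part of Dsuite, and the two-sided normal conversion is an assumption that happens to agree with the posted p-value. Small differences from the Dsuite output are expected because the counts in the row are rounded.

```python
import math

# Values copied from the posted Dsuite output row (AI, AW, AC).
ABBA = 12896.3
BABA = 12631.7
Z_SCORE = 0.510179


def patterson_d(abba: float, baba: float) -> float:
    """Patterson's D from ABBA and BABA pattern counts."""
    return (abba - baba) / (abba + baba)


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal Z-score (assumed convention)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


d = patterson_d(ABBA, BABA)
p = two_sided_p(Z_SCORE)
print(f"D = {d:.6f}, p = {p:.6f}")
```

Both values land very close to the Dstatistic and p-value columns, which confirms that the ABBA excess here is small relative to its block-jackknife standard error.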

millanek commented 8 months ago

Dear Scott

I can think of a number of possible reasons:

1) Introgression went both from P3 -> P1 and P3 -> P2 in about equal proportions. Then D is zero but ABBA and BABA sites cluster.
2) No introgression but there is a substantial proportion of long non-recombining blocks (e.g., inversions) that segregate differentially due to selection.
3) Some types of mutation rate variation along the genome.

And, of course, any combination of the above.

All the best, Milan

millanek commented 8 months ago

One more thought: I see that "clustering_KS_p-val2" is not significant in the example above. This is interesting, because "clustering_KS_p-val2" should be robust to mutation rate variation. At the same time, "clustering_KS_p-val2" has lower power; i.e., it is less likely to be significant in the cases of true introgression. But you have plenty of data (>10000 ABBA / BABA sites), so this is not a worry here.

Overall, the combination of non-significant D and non-significant "clustering_KS_p-val2" would make me think that some type of local mutation rate variation might be the cause of what you see in "clustering_KS_p-val1".

Milan
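The reasoning in this thread can be written down as a small decision helper. This is only a rough Python sketch of the discussion above, not anything Dsuite itself does: the function name `interpret_trio`, the 0.05 threshold, and the wording of the returned hints are all assumptions made for illustration. The branch for a significant "clustering_KS_p-val2" is an inference from the statement that this test is robust to mutation rate variation.

```python
def interpret_trio(d_pval: float, ks_pval1: float, ks_pval2: float,
                   alpha: float = 0.05) -> str:
    """Hypothetical helper: map the three p-values to the hints in this thread.

    Only covers the case discussed here (non-significant D with
    significant clustering_KS_p-val1); alpha = 0.05 is an assumed cutoff.
    """
    d_sig = d_pval < alpha
    ks1_sig = ks_pval1 < alpha
    ks2_sig = ks_pval2 < alpha

    if d_sig or not ks1_sig:
        return "outside the case discussed in this thread"
    if not ks2_sig:
        # val2 is described as robust to mutation rate variation, so a
        # signal seen only in val1 points towards that explanation.
        return "possibly local mutation rate variation"
    # val2 also significant: mutation rate variation alone is less likely.
    return "possibly symmetric introgression or long non-recombining blocks"


# The row posted at the top of the thread.
hint = interpret_trio(d_pval=0.609926, ks_pval1=2.3e-16, ks_pval2=0.258379)
print(hint)
```

Any such rule is a starting point for further checks rather than a conclusion, since combinations of the listed causes are also possible.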