uqrmaie1 / admixtools

https://uqrmaie1.github.io/admixtools

Zero drift edges #61

Closed santiago1234 closed 5 months ago

santiago1234 commented 7 months ago

My question is related to this previous one. I'm using the find_graphs function with data on Mexican populations. After running find_graphs, all the edges in the resulting graphs show a drift length of 0. I'm trying to understand whether this outcome is normal or indicates a problem with the model fitting or the data.

Thanks.

uqrmaie1 commented 5 months ago

The drift length of an edge is the inferred f2 distance of the two populations it connects. If you multiply the array of f2-statistics that you supply to find_graphs() by a factor, then the drift lengths of the resulting graphs should be multiplied by the same factor.

The plotting functions multiply the original drift lengths by 1000 and then round them, which results in small integer numbers greater than 0 when the inferred f2-statistics are larger than 0.001, which is usually the case.

If the allele frequencies are very similar across populations on average (either because the populations are closely related, or because you use sequencing data where most sites do not vary between samples), then the inferred f2-statistics can be smaller than 0.001, and will be rounded to 0 in the plots.

So I would check how large the input f2-statistics are, and if they appear small, multiply them by some factor greater than 1. Or you could multiply the drift lengths by a factor before plotting.
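To make the rounding effect concrete, here is a minimal Python sketch of the arithmetic described above. It does not call admixtools; the drift lengths are made-up values, and `plot_label` and `display_factor` are hypothetical names that only mimic the "multiply by 1000, then round" step.

```python
# Hypothetical drift lengths (made up for illustration, not real output).
drift_lengths = [0.00012, 0.00047, 0.0021, 0.0153]

def plot_label(drift, display_factor=1000):
    """Mimic the labelling described above: scale, then round to an integer."""
    return round(drift * display_factor)

# Labels as they would appear in a plot: values below 0.0005 collapse to 0.
labels = [plot_label(d) for d in drift_lengths]

# Rescaling by a factor greater than 1 (here 10, an arbitrary choice)
# before labelling makes the small edges visible again.
rescaled_labels = [plot_label(d * 10) for d in drift_lengths]

print(labels)
print(rescaled_labels)
```

The factor of 10 is arbitrary. Any factor that lifts the smallest drift lengths above the rounding threshold would serve, as long as the same factor is kept in mind when reading the plot.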

If the drift lengths are exactly 0 (not just in the plot, but in the results data frame when printing enough digits), then there might be something else going on. In that case, it might be helpful if you could share your data and code with me.
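As a rough illustration of that check, the Python sketch below separates values that are exactly 0 from values that only look like 0 at low precision. The numbers are made up and the variable names are hypothetical; with real results you would apply the same idea to the drift length column of the results data frame.

```python
# Hypothetical drift lengths: one exact zero and two small nonzero values.
drift_lengths = [0.0, 0.00012, 3.5e-07]

# Exact comparison with 0 flags only the truly zero edge.
exact_zero = [d == 0 for d in drift_lengths]

# Printing with enough digits shows that the others are small, not zero.
formatted = [f"{d:.10f}" for d in drift_lengths]

for flag, text in zip(exact_zero, formatted):
    print(text, "exactly zero" if flag else "nonzero")
```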

santiago1234 commented 5 months ago

Thanks, @uqrmaie1, for the helpful explanation. My results aren't exactly zero; the small values are likely due to using sequencing data and the populations being closely related.