Closed juanenciso14 closed 11 months ago
I didn't consider the distinction between ABBA - BABA and BABA - ABBA important when writing the documentation, since the sign can easily be flipped by changing the order of the populations, but I think you're right in that it should read BABA - ABBA, and that is in fact what is being calculated.
f4 is identical to the numerator in the D statistic: f4(a, b; c, d) = P(BABA) - P(ABBA). The denominator of the D statistic, P(BABA) + P(ABBA), is always positive, so the sign of f4 and D should always be the same, for the same order of populations.
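To make the relationship concrete, here is a minimal Python sketch (not the package's implementation) that computes both statistics from per-site allele frequencies. The function names site_patterns and f4_and_d and the example frequencies are made up for illustration, and the sketch assumes the pattern probabilities are taken as expectations under the population allele frequencies.

```python
def site_patterns(a, b, c, d):
    """Return (P(BABA), P(ABBA)) at one site from four allele frequencies.

    BABA: a and c carry the same allele, b and d carry the other one.
    ABBA: a and d carry the same allele, b and c carry the other one.
    """
    baba = a * (1 - b) * c * (1 - d) + (1 - a) * b * (1 - c) * d
    abba = a * (1 - b) * (1 - c) * d + (1 - a) * b * c * (1 - d)
    return baba, abba


def f4_and_d(freqs):
    """Compute (f4, D) from a list of (a, b, c, d) frequency tuples.

    f4 is the mean of P(BABA) - P(ABBA) over sites; D divides the summed
    numerator by the summed P(BABA) + P(ABBA).
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in freqs:
        baba, abba = site_patterns(a, b, c, d)
        num += baba - abba
        den += baba + abba
    if den == 0:
        # Happens only if no site shows either pattern; D is undefined then.
        raise ValueError("no informative sites")
    return num / len(freqs), num / den


# Hypothetical allele frequencies at three sites, ordered (a, b, c, d).
freqs = [(0.9, 0.1, 0.8, 0.2), (0.6, 0.4, 0.7, 0.3), (0.2, 0.5, 0.1, 0.6)]
f4_value, d_value = f4_and_d(freqs)

# Swapping c and d flips the sign of both statistics.
f4_swapped, d_swapped = f4_and_d([(a, b, d, c) for a, b, c, d in freqs])
```

Per site, P(BABA) - P(ABBA) simplifies algebraically to (a - b) * (c - d), which is why f4 can also be read as a product of allele frequency differences.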
Thank you for clarifying!
But what does the result mean? A positive value of f4 means P(BABA) - P(ABBA) is positive; does that indicate excess gene flow between pop2 and pop3, or between pop1 and pop4? Sorry for the naive question. I have used admixtools, but since there are quite a few changes between version 1 and 2, it would be nice to verify that I'm understanding it correctly.
thanks, Cui
I find it easiest to think of it as a correlation of allele frequency differences.
A positive value of f4(a, b; c, d) and of D(a, b; c, d) means that there is a positive correlation of the allele frequency differences a-b and c-d. That means that a and c share some genetic drift with each other, relative to b and d.
A negative f4(a, b; c, d) means that a and d share some genetic drift with each other, relative to b and c.
If f4(a, b; c, d) is zero, then a and b form a clade relative to c and d.
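The three cases above can be illustrated with a toy Python sketch. The frequencies and drift vectors below are invented for illustration, and the helper f4 is a hypothetical stand-in that simply averages (a - b) * (c - d) over sites; it is not the package's function.

```python
def f4(a, b, c, d):
    """Mean over sites of (a - b) * (c - d) for per-site allele frequencies."""
    products = [(ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d)]
    return sum(products) / len(products)


def add(freqs, drift):
    """Apply a per-site frequency change to a list of allele frequencies."""
    return [p + s for p, s in zip(freqs, drift)]


# Made-up ancestral frequencies and two drift vectors that are
# uncorrelated with each other across the four sites.
base = [0.2, 0.5, 0.7, 0.4]
drift_x = [0.1, -0.1, 0.1, -0.1]
drift_y = [0.1, 0.1, -0.1, -0.1]

shared = add(base, drift_x)

# a and c share drift_x relative to b and d: positive f4.
f4_pos = f4(shared, base, shared, base)

# a and d share drift_x relative to b and c: negative f4.
f4_neg = f4(shared, base, base, shared)

# a differs from b only by drift_x, c differs from d only by drift_y;
# the differences are uncorrelated, so f4 is (numerically) zero.
f4_zero = f4(add(base, drift_x), base, add(base, drift_y), base)
```

In real data f4 is never exactly zero, so whether a value is consistent with zero is judged relative to its standard error.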
Hi,
I am trying to figure out the right order in which to set up populations for calculating D statistics with the f4 function. The interpretation of these results will depend on how D is defined in the package. The documentation of the function suggests that the numerator is equivalent to ABBA - BABA. However, in Patterson 2012 the numerator is estimated as BABA - ABBA. Some of our results suggest that the program may in fact be computing BABA - ABBA, as opposed to what is stated in the documentation of f4. Is this the case?
Thank you in advance!