millanek / Dsuite

Fast calculation of Patterson's D (ABBA-BABA) and the f4-ratio statistics across many populations/species

Dtrios do not calculate D and f-ratio for all possible trios #58

Closed RezaFahi closed 1 year ago

RezaFahi commented 2 years ago

I have a VCF file containing 4 populations (one of which is an outgroup). I used the Dsuite program (Dtrios command) to test for introgression. According to the Dsuite paper, "The first program, Dtrios calculates the sums in Equation and outputs genome-wide statistics including the D, its associated p-value, and the f4-ratio statistic, for all trios of populations or species". Excluding the outgroup, I have three populations, and I expected the D statistic to be calculated for all 6 possible trios, but it was calculated for only one trio, without any error or warning.

Can anyone help me find the problem?

bioshimmer commented 1 year ago

At first I had the same problem as you. You can find the answer in closed issue #35.

millanek commented 1 year ago

Dsuite outputs all combinations of populations, not all permutations. With three populations, you have one trio.

If, for some reason, you want all permutations of your three populations, you can specify the different arrangements using the --tree option. Still, there are only three arrangements that are meaningful to calculate: ((P1,P2),P3); ((P1,P3),P2); ((P2,P3),P1)
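To make the counting concrete, here is a minimal Python sketch (not part of Dsuite) that enumerates the combinations, the permutations, and the distinct arrangements for three populations. The labels `P1`, `P2`, `P3` are placeholders, and the grouping rule (two orderings count as the same arrangement when they differ only in the order of the first two populations) is an assumption taken from the list of three arrangements above.

```python
from itertools import combinations, permutations

# Placeholder population labels; the outgroup is excluded.
populations = ["P1", "P2", "P3"]

# Unordered trios: this is what Dtrios reports, one row per combination.
trios = list(combinations(populations, 3))

# Ordered assignments of the three populations to the three positions.
orderings = list(permutations(populations, 3))

# Group orderings that differ only in the order of the first two
# populations: ((X,Y),Z) and ((Y,X),Z) describe the same arrangement.
arrangements = {(frozenset(order[:2]), order[2]) for order in orderings}

print(len(trios), len(orderings), len(arrangements))  # 1 6 3
```

So the 6 trios expected in the original question are the permutations, which collapse to 3 arrangements and to a single combination.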

This is because the rest are just the mirror image: ((P1,P2),P3) is the mirror image of ((P2,P1),P3). The D statistics will be exactly the same, just the sign reversed (i.e. one positive and the other negative).
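The sign reversal follows from the definition D = (ABBA - BABA) / (ABBA + BABA): swapping P1 and P2 turns every ABBA pattern into a BABA pattern and vice versa. A small sketch with made-up counts (the function name `patterson_d` and the numbers are hypothetical, not Dsuite output):

```python
def patterson_d(abba: float, baba: float) -> float:
    """Patterson's D from genome-wide ABBA and BABA sums."""
    total = abba + baba
    if total == 0:
        raise ValueError("no informative sites: ABBA + BABA is zero")
    return (abba - baba) / total

# Hypothetical genome-wide sums for the arrangement ((P1,P2),P3).
abba, baba = 1010.0, 990.0

d_original = patterson_d(abba, baba)

# Swapping P1 and P2 exchanges the roles of ABBA and BABA.
d_swapped = patterson_d(baba, abba)

print(d_original, d_swapped)
```

The two values have the same magnitude and opposite signs, so the mirrored arrangement carries no additional information.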

Milan

jiangzy26 commented 1 year ago

At first I had the same problem as you. You can find the answer in closed issue #35.

Hi, could you share how you dealt with it? I currently get P1=A, P2=B, P3=C, but I want the arrangement P1=B, P2=A, P3=C, with its D statistic and Z-score. Could you share your solution? Thanks!

bioshimmer commented 1 year ago

Hello! Your email has been received! I will reply as soon as possible! Wishing you success in your work!

jiangzy26 commented 1 year ago

My understanding is this: if, for example, P1=A, P2=B, P3=C gives D = 0.01, Z = 10, p = 0, then based on the formula in the Dsuite paper, for P1=B, P2=A, P3=C the D statistic should be -0.01; since Z = D/std_err(D), Z should be -10; and the p-value should be 1 - p_original = 1 - 0 = 1. Thanks
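The arithmetic in this comment can be checked with a short sketch. It rests on two assumptions that are not confirmed anywhere in this thread: that the standard error is unchanged when P1 and P2 are swapped, and that the p-value is a one-sided upper-tail probability of a standard normal. The p-value reported by Dsuite itself may be computed differently, so treat the last step as an illustration of the reasoning rather than a description of the program. The helper `one_sided_p` is a hypothetical name.

```python
import math

def one_sided_p(z: float) -> float:
    """Upper-tail probability P(Z' > z) for a standard normal Z' (assumed p-value definition)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Values from the example in the comment above.
d_original, z_original = 0.01, 10.0

# Recover the standard error from Z = D / std_err(D).
std_err = d_original / z_original

# Swapping P1 and P2 reverses the sign of D; std_err is assumed unchanged.
d_swapped = -d_original
z_swapped = d_swapped / std_err

p_original = one_sided_p(z_original)
p_swapped = one_sided_p(z_swapped)

print(d_swapped, z_swapped, p_original, p_swapped)
```

Under these assumptions the swapped arrangement gives D = -0.01 and Z of about -10, and the two one-sided p-values sum to 1, which matches the 1 - p_original reasoning above.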