millanek / Dsuite

Fast calculation of Patterson's D (ABBA-BABA) and the f4-ratio statistics across many populations/species

unlinked SNPs from target enrichment data for Suite #49

Open ambed0ya opened 2 years ago

ambed0ya commented 2 years ago

Hello,

I am using Dsuite with SNPs extracted from a target enrichment data set. I have run the analysis using both the full set of SNPs (~13,000 SNPs) and "unlinked SNPs" (thinned every 100 bp; 1,041 SNPs). From the Dsuite publication I gather that, for a relatively small number of unlinked SNPs, the proportion of cases where the strongest inferred f-branch signal corresponds to the correct simulated gene flow recipient and donor branches is low, but using the full SNP dataset would violate the assumption of independence among loci. I get very different results from the two datasets after running Dtrios and Fbranch, and would like to know your opinion on whether these analyses should be conducted at all with this type of data, and which dataset would be best.

I was also wondering whether, in that case, it would be best not to infer which branches are involved in gene flow, but instead to show that, after the BH correction, multiple p-values still look significant. I am working on a study investigating ILS and introgression as sources of gene-tree conflict in a recent Andean radiation.

Thank you!!
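
For reference, here is a minimal sketch of the Benjamini-Hochberg (BH) adjustment mentioned above, assuming the per-trio p-values have already been collected into a plain Python list. The function name `benjamini_hochberg` and the example values are hypothetical and are not part of Dsuite; this only illustrates the standard step-up procedure.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input list."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, one per trio (not real results).
example_p = [0.01, 0.04, 0.03, 0.005]
example_adjusted = benjamini_hochberg(example_p)
# Trios whose adjusted p-value falls below the chosen FDR level (here 0.05).
significant = [i for i, q in enumerate(example_adjusted) if q < 0.05]
```

The adjusted values can then be compared against the chosen false discovery rate to count how many trios remain significant.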