mmatschiner / tutorials

Tutorials on phylogenetic and phylogenomic inference

The parameters in the result file #7

Closed serene66 closed 3 years ago

serene66 commented 4 years ago

Hi, I obtained results with this method and have a few questions about the result files. The file samples0__BBAA.txt gave me some useful information, for example:

Dstatistic   Z-score   p-value    f_G          BBAA      ABBA      BABA
0.697219     24.0067   0          0.367154     73672     46832     8354.75
0.68946      22.774    0          0.356332     73300     45948.2   8445.75
0.68315      22.9524   0          0.344278     72624     44039     8290.25
0.0546957    6.01339   9.08E-10   0.0118636    97982.1   13739.1   12314.1
0.0438579    3.87222   5.39E-05   0.00957043   101836    14063.4   12881.6

  1. What do the Z-score and f_G tell us in the result file? I couldn't find any explanation of them in the section on analyzing the results.
  2. Is there a threshold for the D-statistic above which gene flow can be inferred, or is there gene flow whenever the D-statistic is greater than 0? Can I derive any information about the direction of gene flow from the results?
  3. The p-value is based on jackknifing and tests the null hypothesis of D = 0. When there is an obvious gene flow signal (i.e. the D-statistic is close to 1), will the p-value be equal to 0?

Thank you very much. I look forward to your reply.

mmatschiner commented 4 years ago

Hi Serene,

  1. For the Z-score, have a look at the manuscript on Dsuite: https://www.biorxiv.org/content/10.1101/634477v2.full The p-values are calculated from the Z-scores. The f_G refers to Green et al.: https://science.sciencemag.org/content/328/5979/710
  2. The D-statistic makes a couple of assumptions that are more likely to be met in recently diverged groups, so it may be less reliable as a measure of introgression between lineages that diverged many millions of years ago. The assumptions include clock-like evolution and the absence of homoplasies. Regarding the latter, this paper is useful: https://www.pnas.org/content/115/50/12787
  3. This is correct.
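
The relationships between the columns can be checked directly against the numbers posted in the question. The sketch below is not Dsuite's code: it assumes that D is computed as (ABBA - BABA) / (ABBA + BABA) and that the p-value is the one-tailed normal tail probability of the Z-score, an assumption that reproduces the reported values to the printed precision. The function names are made up for illustration. It also suggests why a very large Z-score is reported as p = 0: when the tail is computed as 1 - Phi(Z), it rounds to zero in double precision.

```python
import math

def d_statistic(abba: float, baba: float) -> float:
    """Patterson's D from ABBA and BABA site-pattern counts."""
    return (abba - baba) / (abba + baba)

def p_naive(z: float) -> float:
    """One-tailed p-value computed as 1 - Phi(z); loses precision for large z."""
    return 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_tail(z: float) -> float:
    """The same tail probability via erfc, which keeps very small values."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Values taken from the first and fourth rows of the table in the question.
print(d_statistic(46832, 8354.75))  # compare with the reported D of 0.697219
print(p_tail(6.01339))              # compare with the reported p of 9.08E-10
print(p_naive(24.0067))             # 0.0 in double precision
print(p_tail(24.0067))              # extremely small, but not exactly zero
```

A reported p-value of 0 should therefore be read as "smaller than the program can represent", not as a literal probability of zero.
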
serene66 commented 4 years ago

Hi Michael Matschiner, thank you for providing the references; I will read them carefully. Hybridization can produce new species, and introgression allows the genetic information to remain in the lineage. Gene flow, in the form of hybridization and introgression, therefore leaves a signal that lets us detect hybridization. I have previously run hybridization detection with Phylonet, and I already know that there were ancient hybridization events in this lineage. I would now like to corroborate these ancient hybridization events with the existing interspecific gene flow signals. As I understand your answer, the closer the D-statistic is to 1, the higher the probability of recent hybridization. If it is close to 0, does that mean that there was probably no recent hybridization, or that the signal of ancient hybridization is unreliable? Can I infer gene flow from the ancient hybridization by combining the Phylonet results with a weak D-statistic (0.1)? Are gene flow signals from hybridization 10-30 Ma reliable? Thanks!

mmatschiner commented 4 years ago

The D-statistic will rarely ever be exactly 0 (ILS would have to be absent), but in principle, yes: the higher D is, the more probable it is that there was introgression. D doesn't tell you anything about when the introgression occurred, however. In any case, use the p-values for hypothesis testing. You can argue that both your Phylonet results and the D-statistic support past introgression. Whether or not signals are reliable in a group that is 10-30 Myr old is difficult to say without considering the nature and strength of the signals, as well as the evidence for clock-rate variation and the mutation rate.
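
For context on where the Z-score behind those p-values comes from: it is D divided by a standard error estimated by jackknifing over blocks of the genome. The following is a minimal sketch, assuming a plain delete-one-block jackknife with equally weighted blocks. The per-block counts are invented for illustration, the function names are hypothetical, and Dsuite's own blocking may differ in detail.

```python
import math

def d_from_blocks(blocks):
    """D computed from ABBA and BABA counts summed over all blocks."""
    abba = sum(a for a, _ in blocks)
    baba = sum(b for _, b in blocks)
    return (abba - baba) / (abba + baba)

def jackknife_z(blocks):
    """Return (D, standard error, Z) from a delete-one-block jackknife."""
    n = len(blocks)
    d_full = d_from_blocks(blocks)
    # Recompute D with each block left out in turn.
    leave_one_out = [d_from_blocks(blocks[:i] + blocks[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((d - mean_loo) ** 2 for d in leave_one_out)
    se = math.sqrt(variance)
    return d_full, se, d_full / se

# Hypothetical (ABBA, BABA) counts per genomic block, invented for illustration.
blocks = [(120.0, 80.5), (98.2, 91.0), (130.4, 70.1), (110.0, 85.3), (105.6, 88.8)]
d, se, z = jackknife_z(blocks)
print(f"D = {d:.4f}, SE = {se:.4f}, Z = {z:.2f}")
```

Because the standard error shrinks as more blocks with a consistent signal are added, a small D can still give a large Z-score and a very small p-value, as in the last two rows of the table in the question.
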