Inferring divergence estimation

gphocs-dev / G-PhoCS

G-PhoCS is a software package for inferring ancestral population sizes, population divergence times, and migration rates from individual genome sequences.


Inferring divergence estimation #81

Closed: kmanoharan01 closed this issue 3 years ago

kmanoharan01 commented 3 years ago

Hi,

I am new to G-PhoCS and am trying to understand how to infer divergence times from the mcmc.log file in the example folder. Could you please help?

Do the example parameters work for humans?

Thanks, Mano

igronau commented 3 years ago

I'm not sure I understand your question. The sample control file can be used as a default version for any analysis. It's always good to perturb the parameters to make sure that the inferred values are robust. The output of the program is a trace file, which specifies sampled values for all model parameters. You can use these sampled values to characterize the posterior distribution of that parameter (mean value + credible intervals). The manual should contain some useful tips on that.
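As a rough illustration of the summary described above, the sketch below computes the posterior mean, median, and an equal-tailed 95% credible interval for each parameter column in a trace. It assumes the trace is tab-delimited with a header row and a sample-index column named `Sample`; check your own file and the manual, since that layout is an assumption here. The function name `summarize_trace`, the `burn_in` argument, and the toy values are hypothetical.

```python
import csv
import io
import statistics


def summarize_trace(handle, burn_in=0):
    """Summarize each parameter column of a tab-delimited trace.

    Returns {name: (mean, median, lower, upper)}, where lower/upper are the
    2.5% and 97.5% sample quantiles (an equal-tailed 95% interval).
    Assumes a header row and a sample-index column called "Sample".
    """
    columns = {}
    for i, row in enumerate(csv.DictReader(handle, delimiter="\t")):
        if i < burn_in:
            continue  # drop early samples taken before the chain settled
        for name, value in row.items():
            # Skip the index column, unnamed columns, and blank cells.
            if not name or name == "Sample":
                continue
            if not isinstance(value, str) or not value.strip():
                continue
            columns.setdefault(name, []).append(float(value))

    summary = {}
    for name, values in columns.items():
        # n=40 yields 39 cut points spaced 2.5% apart; needs >= 2 samples.
        cuts = statistics.quantiles(values, n=40)
        summary[name] = (
            statistics.fmean(values),
            statistics.median(values),
            cuts[0],
            cuts[-1],
        )
    return summary


# Toy trace with made-up values (not real G-PhoCS output).
toy = "Sample\ttau_AB\n" + "".join(f"{i}\t{i + 1}\n" for i in range(100))
mean, median, lower, upper = summarize_trace(io.StringIO(toy))["tau_AB"]
print(mean, median, lower, upper)
```

For a real run, you would pass an open file handle instead of the toy string and choose `burn_in` after looking at the trace.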

kmanoharan01 commented 3 years ago

Thank you very much for your reply.

I am new to trace files as well, so I am having a bit of a hard time understanding the output.

May I ask which "input" parameters are important for human samples?

Can G-PhoCS be used to estimate the divergence time between just two populations, or does it require three?

"You can use these sampled values to characterise the posterior distribution of that parameter (mean value + credible intervals)."

Here, what does the mean value represent?

For example, does the mean of tau_AB represent the likely divergence time?

Thanks, Mano

igronau commented 3 years ago

G-PhoCS, like any other Bayesian method, produces a distribution for every parameter of interest. This distribution reflects what the data tell you about that parameter (also taking the prior distribution into account). As with any distribution, the mean and median give you a single number that summarizes the distribution. So if you need to specify one number, that will be it. However, you will typically also want to convey the uncertainty in that value. For this, it's recommended to use the Bayesian 95% CI. In Tracer you get this via the 95% HPD interval. In the summary you show in the snapshot, this interval goes all the way to 0, so I'm not sure how informative it is.
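To make the 95% HPD interval concrete, here is a minimal sketch of one common way to estimate it from samples: sort them and take the narrowest window that covers 95% of them. This is not taken from Tracer's source, so treat it as an approximation of the idea and not a reproduction of that tool. The function name `hpd_interval` and the toy numbers are hypothetical.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Return the shortest interval covering `mass` of the samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must cover
    # Slide a window of k consecutive sorted samples; keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


# Toy samples (made up): a tight cluster plus a few large outliers.
toy_samples = list(range(95)) + [500, 600, 700, 800, 900]
print(hpd_interval(toy_samples))
```

An interval whose lower bound sits near 0, as in the snapshot discussed here, means the samples do not rule out very small values of the parameter.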

kmanoharan01 commented 3 years ago

Thank you very much for your patience and your explanation.

Here is an example of split times (tauAB_1 and tauAB_2) for C (African), A (European), and B (Han Chinese).

Please correct me if I am wrong: in this example, the 95% HPD value for tauAB_1 (0.35) suggests the split time between European and Han Chinese. Similarly, the African vs. Eurasian split time would be 0.56, with 95% confidence.

Is this correct?

Kind regards, Mano

igronau commented 3 years ago

The values that you specify are the upper bounds of the intervals for the corresponding parameters (tau_AB_1 and tau_AB_2). However, the lower bounds are both very low (~0.02), so this is not a very informative inference. You need to examine the traces of the parameter values to see whether they appear to converge, or whether these parameters truly oscillate over this very large range.
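Looking at the trace plot is the first check. As a numeric complement, the sketch below estimates a rough effective sample size (ESS) for one parameter's trace by summing autocorrelations up to the first non-positive lag. This is a simplified estimator chosen for illustration, not the one any particular tool uses, and the function name `effective_sample_size` is hypothetical. A very small ESS relative to the number of samples suggests the chain is drifting or mixing slowly.

```python
import statistics


def effective_sample_size(trace):
    """Rough ESS: n / (1 + 2 * sum of leading positive autocorrelations)."""
    n = len(trace)
    if n < 3:
        raise ValueError("need at least three samples")
    mean = statistics.fmean(trace)
    centered = [x - mean for x in trace]
    var = sum(c * c for c in centered) / n
    if var == 0:
        raise ValueError("trace has zero variance")
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0:
            break  # truncate at the first non-positive autocorrelation
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


# Toy traces (made up): a steadily drifting chain vs. a rapidly alternating one.
drifting = list(range(200))
alternating = [i % 2 for i in range(200)]
print(effective_sample_size(drifting), effective_sample_size(alternating))
```

The loop is quadratic in the worst case, so for long traces you would want to thin the samples or cap the maximum lag.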

kmanoharan01 commented 3 years ago

Thank you.