gphocs-dev / G-PhoCS

G-PhoCS is a software package for inferring ancestral population sizes, population divergence times, and migration rates from individual genome sequences.

How to interpret Total Migration Rate, exactly? #78

Open ericopolo opened 3 years ago

ericopolo commented 3 years ago

Hello,

I am struggling a little to decide whether or not there was a particular migration event, based on the total migration rate (mtot). In the original paper, as well as in other papers in which its authors participate, we read that mtot can be interpreted as the probability of the existence of at least one migration event through that band.

One of the problems I see with this interpretation is that values above 1 are possible for mtot. In Freedman et al. (2014) we see that an mtot of 1.24 implies a "near to 100%" probability of migration through that band. Exactly what calculation is being done there?

My second problem is about making a decision on whether or not there has been migration, from a statistical point of view. Is it possible to view mtot analogously to a p-value? If so, what would be the most reasonable null hypothesis here?

I wasn't sure if this would be the best place to ask this question, but since I didn't find any work that specifically addressed these issues, I figured that those could be other people's doubts, too.

Cheers,

Érico.

igronau commented 3 years ago

First, let me clarify the issue of the total migration rate (Mtot). It's a cumulative rate, so it's positive and not restricted to the range [0,1]. The probability of migration is actually obtained by the transformation: Pmig = 1-exp{-Mtot}. For small values of Mtot (<0.1), you get that Pmig and Mtot are very close. In later works, we make sure to apply this transformation to produce a more interpretable measure.
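As a quick illustration of the transformation above, here is a minimal Python sketch. The function name `migration_probability` is made up for this example and is not part of G-PhoCS; the input values are arbitrary.

```python
import math


def migration_probability(m_tot: float) -> float:
    """Map a total migration rate to Pmig = 1 - exp(-Mtot)."""
    if m_tot < 0:
        raise ValueError("total migration rate must be non-negative")
    # -expm1(-x) equals 1 - exp(-x) but keeps precision for very small x.
    return -math.expm1(-m_tot)


# Arbitrary example rates: for small Mtot the two columns nearly coincide,
# while for larger Mtot the probability stays below 1.
for m_tot in (0.01, 0.05, 0.1, 1.24):
    print(f"Mtot = {m_tot:<5} Pmig = {migration_probability(m_tot):.4f}")
```

Applying the function to every posterior sample of Mtot (rather than only to the mean) would give a posterior distribution of Pmig directly.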

Regarding the issue of significant migration, we don't presently have a rigorous statistical test for this. What I typically use as an indication of significant migration is whether the 95% Bayesian credible interval (CI) excludes 0. Typically, when the CI does overlap 0, the mean estimate of Mtot is low (<2%). If the mean estimate is high and the 95% CI overlaps 0, we often report this as possible migration (with low confidence). Another thing I find useful is to avoid the distinction between significant and insignificant migration bands altogether, and simply treat bands according to some clear criterion on their posterior estimates of Mtot. For instance, if the entire posterior distribution is below 1% and overlaps 0, I will not report it in the main figure, but will report the rates in a supplementary table.

ericopolo commented 3 years ago

Dear Dr. Gronau,

Thank you, that was very, very helpful! About the criterion of the 95% CI overlapping 0: I have indeed seen it used in many papers, but I'm also a little puzzled by it. At least in my data, I very rarely see actual zeros in Mtot estimates, so the lower bound of the 95% CI is often very close to zero, but never exactly zero. Maybe that criterion assumes that values below some threshold are treated as zero?

Érico.

igronau commented 3 years ago

You're probably right that the rate doesn't reach 0, since this is the edge of the feasible space for this parameter, and the MCMC sampler typically doesn't sample values at the edge. However, you can set some arbitrary small value (e.g. 0.0001) as a proxy for Mtot=0.
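The CI-based rule of thumb with a small proxy for zero could be sketched in Python roughly as follows. This is only an illustration: the function name `summarize_band`, the labels, and the default thresholds (`zero_proxy=1e-4` from the value suggested above, `low_mean=0.02` loosely based on the "<2%" remark earlier in the thread) are assumptions, and `samples` stands for the post-burn-in posterior samples of Mtot for one migration band.

```python
import statistics


def summarize_band(samples, zero_proxy=1e-4, low_mean=0.02):
    """Summarize posterior samples of Mtot for one band (hypothetical helper)."""
    # With n=40 the first and last cut points are the 2.5% and 97.5% quantiles,
    # i.e. an equal-tailed 95% credible interval.
    cuts = statistics.quantiles(samples, n=40, method="inclusive")
    lower, upper = cuts[0], cuts[-1]
    mean = statistics.fmean(samples)

    # Treat a lower bound at or below the proxy as "overlapping 0".
    overlaps_zero = lower <= zero_proxy
    if not overlaps_zero:
        label = "supported"
    elif mean >= low_mean:
        label = "possible (low confidence)"
    else:
        label = "not supported"
    return {"mean": mean, "ci95": (lower, upper), "label": label}


# Toy samples, for illustration only (not real G-PhoCS output).
toy_samples = [0.10 + 0.001 * i for i in range(100)]
print(summarize_band(toy_samples))
```

Both thresholds are arbitrary, so it is probably worth reporting them alongside the results and checking how sensitive the labels are to their values.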

ericopolo commented 3 years ago

I think I may have found a slightly more objective solution, and you can certainly help me decide if it really makes sense. I generated random values from a gamma distribution with the same parameters used for the migration prior in my analysis, and for the posterior distribution of each estimated migration band I ran a nonparametric test (Wilcoxon) in which the alternative hypothesis is that the mean of the posterior is greater than that of the prior.

I understand that the arbitrariness here is transferred to the choice of the prior, since priors with high means imply more stringent tests, while priors with small means make it easier for estimated migrations to be considered valid. But since the data will make a major contribution to the posterior distribution, I believe this strategy is more objective than simply choosing a threshold for confidence intervals, provided we are setting reasonable priors. Furthermore, at least in the data I am working on, this criterion selected (and discarded) migration bands in a way that made sense from a biogeographic point of view and that was compatible with other analyses I am running that involve hybridization, such as TreeMix.

What do you think?

ericopolo commented 3 years ago

It is necessary to add that, in my case, I am using an uninformative prior, with shape = 1. I think this would be mandatory (i.e. shape <= 1) if the suggested strategy is to be used, since we need zero to have a relatively high probability in the prior.
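For anyone who wants to try the prior-versus-posterior comparison described above, here is a rough standard-library Python sketch of a one-sided Wilcoxon rank-sum (Mann-Whitney U) test using the normal approximation with tie correction. It is not the exact code used in my analysis: the function name `rank_sum_greater`, the prior parameters, and the stand-in posterior samples are all hypothetical, and the gamma prior is assumed to be parameterized by shape and rate.

```python
import math
import random


def rank_sum_greater(x, y):
    """One-sided rank-sum test, H1: values in x tend to exceed values in y.

    Returns (U statistic for x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the tied ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        t = j - i
        tie_term += t**3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2 - 0.5) / math.sqrt(var_u)  # continuity correction
    return u, 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability


rng = random.Random(1)
shape, rate = 1.0, 10.0  # hypothetical prior parameters
# random.gammavariate takes shape and SCALE, so the rate is inverted here.
prior_draws = [rng.gammavariate(shape, 1.0 / rate) for _ in range(1000)]
# Stand-in for one band's posterior samples; real ones come from the MCMC trace.
posterior_draws = [rng.gammavariate(4.0, 0.1) for _ in range(1000)]
u_stat, p_value = rank_sum_greater(posterior_draws, prior_draws)
print(f"U = {u_stat:.1f}, one-sided p = {p_value:.3g}")
```

One caveat: MCMC samples are autocorrelated, so the p-value from a test that assumes independent observations will tend to be too small unless the trace is thinned, and the prior draws must be on the same scale as the posterior quantity being compared.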

igronau commented 3 years ago

This seems like a reasonable statistical test. Let me know how it goes, and I may use it in future analyses.