nicfel / Mascot

BEAST2 implementation of the Masco approximation of the structured coalescent with forwards/backwards algorithm for node state calculation
GNU General Public License v3.0
9 stars 5 forks source link

Potential math error in exponential coalescent #12

Closed m-a-martin closed 6 months ago

m-a-martin commented 7 months ago

Hi Dr. Müller,

Really sorry if I'm mistaken here, but I think there might be an error in the math in the getNeTime function in mascot.parameterdynamics.ExponentialNe, although it's possible I'm misunderstanding how the parameters are defined.

If NeNull is the population size at the most recent calendar time and getNeTime is supposed to return the population size at time t in the past then I think it should be NeNull Math.exp(-tgrowthRate).

E.g. with a growth rate of 0.25 and a NeNull of 100 you would expect the population size 10 years in the past to be 100exp(-0.25 10) = 8.2.

This is how (what I presume is) the analogous function in the regular exponential coalescent function is written.

Again, if I'm misunderstanding I apologize, but I can't wrap my head around the function as written.

https://github.com/nicfel/Mascot/blob/43100a5df3dae2a9acc8dea8c005c81cd84ec770/src/mascot/parameterdynamics/ExponentialNe.java#L34

m-a-martin commented 6 months ago

Hi Dr. Muller,

I wanted to follow up about this question. To test my intuition, I fit both the Mascot exponential coalescent and the standard exponential coalescent models to the pandemic H1N1 dataset from the beast.community tutorial .

The standard exponential coalescent ends up with parameter estimates very similar to the tutorial, with a mean pop size of ~16 and a growth rate of ~21.5.

Screenshot 2024-05-02 at 9 08 41 PM

Using the MASCOT skyline coalescent with exponential dynamics, however, the pop size parameter is estimated to be much lower, ~2.2.

Screenshot 2024-05-02 at 9 08 34 PM

It's possible I am misunderstanding how the model is defined, but at least based on the code comments and the MASCOT skyline manuscript the population size parameters for both models is the population size at the most recent sampling time, so they should be analogously defined. With only a single deme I would expect them to give similar results. I've attached the XML file used for the MASCOT analysis.

H1N1pdm_2009_mascot.zip

Edit: After looking at this more and looking through the source code for other parametric Ne models built into Mascot, I've realized that my confusion stems from the fact that NeNull is actually Log(Ne). Apologies for not picking up on this. This is clarified in the constant Ne model code, but not for the exponential. It may be worth clarifying in the code or documentation somewhere, IMO, as this is a departure from other BEAST models, at least in my experience. minNe is also not defined on the log scale, which I at least find to be confusing.

rbouckaert commented 6 months ago

@m-a-martin I noticed that the default prior on growth rate for the coalescent with exponential growth is Laplace(mu=0.001, scale=0.5) while for Mascot it is Laplace(mu=0.001, scale=100), which perhaps can have an impact as well. It is always quite tricky comparing similar looking models.

m-a-martin commented 6 months ago

@rbouckaert great catch, I meant to double check that. regardless, after realizing the parameter in MASCOT is log(Ne) I think everything is fine in the code.

nicfel commented 6 months ago

Hi @m-a-martin, Apologies for the late response.

I'll change the logging of the NeNull to logNeNull in the next release.

Best, Nicola

alexeid commented 6 months ago

Maybe the code should have explicit in the comments what the parameterisation is? Especially since it is different from the constant case!