popgenmethods / momi2

Infer demographic history with the Moran model
GNU General Public License v3.0

momi2 exception, growth parameters #29

Closed plibrado closed 4 years ago

plibrado commented 4 years ago

Dear John,

I have been using momi2 for a while, and it is really a great method. However, I am having problems with the growth parameter while trying to model a strong exponential Ne decay in modern horse breeds (due to intense inbreeding). When the growth parameter is allowed to be either very high or very low (e.g. from -1e-2 to 1e-2), momi2 very often raises exceptions after a few "TNC" iterations:

I have lower bounds for all parameters: tau >= 50 years and Ne >= 50. I suspect the real reason is an "Autograd ArrayBox with value nan" (https://github.com/HIPS/autograd/issues/93).

Is there a way to fix or avoid these issues? The same model runs fine if growth parameters are excluded.

Thanks in advance for your response,

Pablo

jackkamm commented 4 years ago

Hi @plibrado -- sorry for the delayed response, I missed this message.

In my experience, exponentially growing population sizes can indeed cause numerical instability. Part of the reason is that they can produce extremely large or tiny population sizes if the growth rate is high enough or acts over a long enough time. In addition, small changes in the growth rate can cause large changes in the population size.

A way to reduce this issue is to parametrize the model differently. For example, instead of parametrizing an epoch by N_0, t, and g (the base size, epoch length, and growth rate, respectively), parametrize it by N_0, t, and N_t (the base size, the epoch length, and the size at the top of the epoch).
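The relation between the two parametrizations can be sanity-checked with a short stand-alone sketch. This is plain Python with made-up illustrative numbers; it does not call momi2's API. The convention assumed here matches the tutorial formula below: g = log(base/top)/length, and going back s generations within the epoch the size is N_0 * exp(-g*s).

```python
from math import exp, log

# Hypothetical epoch: base (recent) size N0, length t, top (ancient) size Nt.
N0, t, Nt = 10_000.0, 500.0, 200_000.0

# Growth rate implied by the (N0, t, Nt) parametrization:
g = log(N0 / Nt) / t  # negative here: the population shrank forward in time

# Going back s generations within the epoch, the size is N0 * exp(-g * s).
# At the top of the epoch (s = t) we should recover Nt exactly.
assert abs(N0 * exp(-g * t) - Nt) < 1e-6
```

Because N0 and Nt are bounded directly, the optimizer can no longer wander into growth rates that imply astronomically large or vanishingly small sizes.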

jackkamm commented 4 years ago

I've added an example on how to do the alternative, more numerically stable parametrization of exponential growth parameters here: https://momi2.readthedocs.io/en/latest/tutorial.html#Using-functions-of-model-parameters-as-demographic-parameters

jackkamm commented 4 years ago

I'm going to close this for now but please reopen if you are having issues with the alternative parametrization I suggested.

silvewheat commented 3 years ago

Hi jackkamm,

I have seen your example, which is:

model.set_size(pop, N="N_end", g=lambda params: np.log(params.N_end/params.N_start)/params.t_start)
model.set_size(pop, t="t_start", g=0)

But when I run this code, an error occurs:

TypeError: set_size() missing 1 required positional argument: 't'

I also have a question about 'end' and 'start'. Is the 'start' time larger (i.e. more ancient) than the 'end' time, as in the picture below? In general the growth rate should be np.log(params.N_end/params.N_start)/(params.t_start - params.t_end), but since t_end is the present, i.e. 0, this reduces to np.log(params.N_end/params.N_start)/params.t_start. Am I right?

[image: diagram of an epoch, with the 'start' time more ancient than the 'end' time]

So, in the following model, should I set the growth rate of the ancestral population of pop1 and pop2 (the red line) like this: np.log(params.N_end/params.N_start)/(params.t_anc - params.t_pop1_pop2)?

[image: model diagram with the ancestral branch of pop1 and pop2 highlighted in red]

jackkamm commented 3 years ago

Hi @silvewheat , sorry for the delay.

You're right, there's an error in the example -- set_size() requires specifying the time parameter. I've fixed it in the tutorial as follows:

model.add_leaf(pop, N="N_end", g=lambda params: np.log(params.N_end/params.N_start)/params.t_start)
model.set_size(pop, t="t_start", g=0)

Regarding your second question, the start time is the more ancient time, whereas the end time is the more recent time. So yes, your understanding and pictures are correct.
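The formula in the question for an internal epoch can be checked with the same kind of stand-alone sketch as before. The names t_pop1_pop2, t_anc, N_start, and N_end are the hypothetical parameters from the question above, with made-up values; momi2 itself is not invoked.

```python
from math import exp, log

# Hypothetical epoch boundaries, in generations before present:
t_pop1_pop2 = 1_000.0   # recent boundary (the epoch's "end")
t_anc = 5_000.0         # ancient boundary (the epoch's "start")
N_end, N_start = 50_000.0, 5_000.0  # sizes at the recent/ancient boundaries

# Growth rate over the internal epoch, as proposed in the question:
g = log(N_end / N_start) / (t_anc - t_pop1_pop2)

# Moving back from the recent boundary by the epoch length recovers N_start.
assert abs(N_end * exp(-g * (t_anc - t_pop1_pop2)) - N_start) < 1e-6
```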

nerdbrained commented 3 years ago

I was just trying to set up a model for inference based on the example code above, using the "g=lambda params: np.log(params.N_end/params.N_start)/params.t_start" expression. I was getting an error in the optimization step (AttributeError: 'ArrayBox' object has no attribute 'log').

I am not completely sure why. I had imported numpy as np and was using "np.log" in the model specification, but for some reason np.log was not interpreted correctly (maybe something was updated recently?).

After a bit of searching I turned up an alternative: 1) run "from autograd.numpy import log"; 2) then use log instead of np.log: "g=lambda params: log(params.N_end/params.N_start)/params.t_start".

Now optimization works fine! I'm not sure why I was getting the original error, but I thought this could be helpful if others run into the same problem.

jackkamm commented 3 years ago

The automatic differentiation only works with functions from autograd.numpy, not from vanilla numpy. So when defining custom functions, it's important to use autograd.numpy instead of numpy inside them.
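The AttributeError above can be reproduced without autograd: when numpy's np.log receives an object type it doesn't recognize (such as autograd's ArrayBox), it falls back to looking for a .log() method on the object, which doesn't exist. A minimal sketch of that failure mode, where the Box class is a stand-in, not autograd's actual ArrayBox (depending on the numpy version, the fallback surfaces as an AttributeError or a TypeError):

```python
import numpy as np

class Box:
    """Stand-in for an opaque wrapper type such as autograd's ArrayBox."""
    def __init__(self, value):
        self.value = value

# np.log on an unrecognized object falls back to calling obj.log(),
# producing the same kind of error reported above.
try:
    np.log(Box(2.0))
except (AttributeError, TypeError) as err:
    print(type(err).__name__)
```

autograd.numpy.log avoids this by dispatching on the ArrayBox wrapper and recording the operation on the tape, which is what makes the gradient computation work.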