Closed connor-french closed 1 year ago
Hi @connor-french :wave:
This is a very cool application, and certainly not something I had in mind when this stuff together! I think to make progress here we're going to need a more manageable example to debug. I see from your notebook that you haven't been able to provoke by creating a standard model.
Perhaps working with a subset of your raster would work? If we can get the error to happen on an example where we can call debug()
and get some output from the demography I think we can zero in on the issue fairly quickly.
This sort of thing is much more likely to happen if the reverse-time migration rates haven't been reweighted by population sizes; it looks like you haven't done that yet? IIRC, this means that the msprime migration rate from i
to j
should be m * n[j] / n[i]
, where n
is the vector of population sizes. If migration doesn't see the population sizes then lineages are much more likely to stay in a location where the population size is going to zero; correctly weighted the lineages will leave such locations.
Do you think that might be the issue?
Thanks @jeromekelleher and @petrelharp! I found a solution, but I'm not 100% sure why it solved the problem.
I was ceiling rounding the deme sizes to the nearest whole number before assigning the values to population initial_size
s in the Demography object, but I did not convert them to integers after rounding. Running an astype(int)
method on the numpy array before parameterizing the Demography model does the trick! I figured this out after trying many subsets of the raster data like Jerome suggested, not being able to isolate the problem, and deciding to visualize the population size matrices in CSV files. I converted the matrices to integers to prevent the large trail of zeros in the CSV file, didn't see anything weird, then tried running the model with integer arrays. No problems after repeated runs!
@petrelharp, I ran into this problem in the midst of trying to rescale the migration rates! Before trying the rescaling, I updated my sampling scheme to sample two individuals from every deme that has at least two individuals, so I could get an idea of how the migration rate scaling impacts the genetic diversity landscape. I ran into this problem with the updated sampling scheme, so rescaling is next on the docket.
Wow, that's weird. No ideas here.
I don't know either - I think it's most likely to be some quirk of the input data and probably not a bug in msprime, so going to close the issue. Feel free to reopen if you disagree though @connor-french!
Howdy y'all,
Unfortunately I need to revisit this issue. What I thought was a weird fix turned out not to work- I just had a lucky streak with no errors popping up. In working on https://github.com/tskit-dev/msprime/discussions/2196#discussion-5527465, I ran into the same error again.
I approached the problem systematically and I think the error is due to populations going extinct and recolonized. When I run the simulation with deme sizes that can go extinct I get the error, but when I set the minimum population size to be 1, the error goes away. Maybe there is a problem with the sampled lineages going extinct before coalescent events happen, or something similar?
Based on my explorations when including populations that go fully extinct, it seems like the probability of failure increases with the number of demes that are simulated- simulations with the full number of demes (117 x 110) fail more frequently than those run with fewer demes, e.g. 9/18 simulations failed with the full number of demes, while 3/200 simulations failed with a 10 x 10 matrix. The deme sizes range from zero to 100.
I was able to create a simple reproducible example that runs fast, ~4 sec on my laptop (Jupyter Notebook and data).
I'm interested in hearing your thoughts!
Okay, let's see - I don't think there's any sign of this being an actual bug in msprime - it's entirely plausible this is just due to a weird demographic situation. However, this is fine place to discuss how to debug that (although maybe this should be a Discussion instead?).
The error message says "very small population sizes"; that seems pretty likely to me, that somehow lineages are ending up in places where population sizes are going to zero. Then the questions are (a) how to identify where/when that happens, to help figuring out how to (b) set up your model so that doesn't happen.
One possible way to debug this is by using the DemographyDebugger? For instance, if you use the lineage_probabilities
method, then you can pretty easily look for times & places where lineages are likely to be but population sizes are low?
In terms of setting it up so this doesn't happen, the first place I'd tend to look is how the model gets set up in the first place - if population sizes are set from a mechanistic, forwards-time demographic model, and the forwards-to-reverse time migration rates are done correctly, then lineages shouldn't be hitting any nonsensical population sizes. But I could imagine lots of ways that could go wrong.
Is this helpful?
Thanks Peter, this was very helpful! I agree that I may be doing something wonky with my migration rate rescaling… I’ll dive into the DemographyDebugger and interrogate where things may be going wrong. I forgot about using the debugger as a tool once I got the problem to a manageable size! I’ll move any further problems/correspondence to the Discussions since this isn’t an msprime issue and more of a model setup problem.
Best, Connor
On Fri, Sep 1, 2023 at 5:16 PM Peter Ralph @.***> wrote:
Okay, let's see - I don't think there's any sign of this being an actual bug in msprime - it's entirely plausible this is just due to a weird demographic situation. However, this is fine place to discuss how to debug that (although maybe this should be a Discussion instead?).
The error message says "very small population sizes"; that seems pretty likely to me, that somehow lineages are ending up in places where population sizes are going to zero. Then the questions are (a) how to identify where/when that happens, to help figuring out how to (b) set up your model so that doesn't happen.
One possible way to debug this is by using the DemographyDebugger? For instance, if you use the lineage_probabilities https://tskit.dev/msprime/docs/latest/api.html#msprime.DemographyDebugger.lineage_probabilities method, then you can pretty easily look for times & places where lineages are likely to be but population sizes are low?
In terms of setting it up so this doesn't happen, the first place I'd tend to look is how the model gets set up in the first place - if population sizes are set from a mechanistic, forwards-time demographic model, and the forwards-to-reverse time migration rates are done correctly, then lineages shouldn't be hitting any nonsensical population sizes. But I could imagine lots of ways that could go wrong.
Is this helpful?
— Reply to this email directly, view it on GitHub https://github.com/tskit-dev/msprime/issues/2195#issuecomment-1703327512, or unsubscribe https://github.com/notifications/unsubscribe-auth/AI7LCFO32DPNOKTKA6VW4J3XYJGBXANCNFSM6AAAAAA3L35LKI . You are receiving this because you were mentioned.Message ID: @.***>
Hi all,
I am running large metapopulation (2D stepping stone, ~100 x 100 deme matrices) models in msprime and have run into a stubborn error:
LibraryError: The simulation model supplied resulted in a parent node having a time value <= to its child. This can occur either as a result of multiple bottlenecks happening at the same time or because of numerical imprecision with very small population sizes.
.I think the error is due to numerical imprecision with very small population sizes (some populations go extinct often or remain at low or zero population size the entire simulation), but I've failed to create a minimal reproducible example that duplicates the error. My attempt to create a minimal reproducible example suggests that small population sizes may not be the problem because it runs fine with population sizes < 5 and I also don't think I'm accidentally assigning multiple bottlenecks happening at the same time. So, I'm not sure what is causing the error.
I've provided links to download a Jupyter Notebook (
github_issue.ipynb
) and a data file (projections_ihe_10km.tif
) with relevant code for reproducing the error and an additional example exploring the potential cause with simulated data. I used the minimal number of steps and minimal amount of code necessary to reproduce the error, but I apologize because it is still a fair bit of code.Thanks for any insight you may have on the problem!
Best, Connor