popsim-consortium / stdpopsim

A library of standard population genetic models
GNU General Public License v3.0
122 stars 86 forks source link

[slim-engine-dev] Set population size to zero if there is no influence on the rest of the simulation. #260

Open grahamgower opened 4 years ago

grahamgower commented 4 years ago

Consider a GenericIM() model. The ancestral population with size NA should not exist in the SLiM simulation after time T. It currently does (without affecting N1 or N2), and this adds needless computational burden.

Another case is when no samples are requested from a population (or requested only at a specific time point), and there are no migrants from this population.

jeromekelleher commented 4 years ago

I guess this could be done generally? Size zero populations shouldn't hurt msprime if there's no individuals in there (and could potentially be used to speed things up a bit).

grahamgower commented 4 years ago

Probably that makes sense when no samples are requested.

I guess the ancestral population in GenericIM() is not an issue for msprime? But how is this handled in the model="dtwf" case?

jeromekelleher commented 4 years ago

It's not an issue in that if there's no ancestral material present in a population, then nothing will happen there. So, we can set the pop sizes to zero and it won't make any difference. Should be fine for DTWF too --- if not, we can fix it.

grahamgower commented 4 years ago

Cool. Currently the SLiM engine inherits this from the demography debugger. So that's probably the place to fix it.

>>> model=stdpopsim.GenericIM(1000,100,100, 500, 0, 0)            
>>> model.debug()
Model =  hudson(reference_size=1)
=============================
Epoch: 0 -- 500.0 generations
=============================
     start     end      growth_rate |     0        1        2    
   -------- --------       -------- | -------- -------- -------- 
0 |   100      100                0 |     0        0        0    
1 |   100      100                0 |     0        0        0    
2 |  1e+03    1e+03               0 |     0        0        0    

Events @ generation 500.0
   - Mass migration: lineages move from 0 to 2 with probability 1
   - Mass migration: lineages move from 1 to 2 with probability 1
===============================
Epoch: 500.0 -- inf generations
===============================
     start     end      growth_rate |     0        1        2    
   -------- --------       -------- | -------- -------- -------- 
0 |   100      100                0 |     0        0        0    
1 |   100      100                0 |     0        0        0    
2 |  1e+03    1e+03               0 |     0        0        0
andrewkern commented 4 years ago

can't the ancestral population remain in the SLIM simulation but just be given a new size at time T? that will keep it a two population simulation. i.e. one of the daughter populations (forward in time) is just the ancestral one with a new size

grahamgower commented 4 years ago

@andrewkern Should we dictate that this is how all models should be constructed? Both approaches are currently supported.

Also see comment by @gtsambos. https://github.com/popgensims/stdpopsim/pull/163#issuecomment-560057772

andrewkern commented 4 years ago

yeah @gtsambos brings up two good issues there-- 1) are the sims the same? 2) is the population labeling important enough to specify 3 popns vs 2. I'm confident that if everything is going okay issue one should not be a worry. but issue two is a bit stickier....

From a computational complexity stand point it makes sense to me to retain only two pops through these simulations rather than build a 3rd....

grahamgower commented 4 years ago

In the three population scenario, the third population should have zero size in the epoch "0 -- 500.0 generations". This could be manually added with an additional entry in population_configurations, or the demography debugger could be modified to output zero in this case. Either of these choices would be sufficient to destroy the population in the SLiM simulation, with the current engine code (and contribute no computational burden).

apragsdale commented 4 years ago

My initial thoughts are that I am against going through and changing all the msprime implementations of the models just for their translation to the slim inputs. It will add a bit to the population_configurations and demographic_events that are only really needed to translate to slim, and it while it doesn't change the computational complexity, I can see it adding complexity to the human implementation and QC of demographic models, especially when those models are complex and involve many populations.

If the goal is having a flexible script that can translate any msprime input into slim input, I think this should be handled internally. It is straightforward to write a function that take the sampling times (and whether sampling is allowed) from each population and the demographic events to determine all times that each population has the possibility of carrying a lineage, and from that information automatically turn those populations on and off. I've been thinking about this a bit for translating demographic history graphs to msprime and forward simulator inputs, and could sketch out a function that does this if that's helpful.

jeromekelleher commented 4 years ago

Yes, agreed we don't want to do this by hand. Perhaps we can derive the information from the demography debugger though?

apragsdale commented 4 years ago

Yes, agreed we don't want to do this by hand. Perhaps we can derive the information from the demography debugger though?

Yes, I think so, as long as we also start with the sampling information (which populations have lineages to start with, or at what times we draw lineages from a population if it's an ancient sample).

Edit: Maybe I'm a bit confused though - is the translation to slim actually based on parsing the text output of the debugger output, instead of the reading through list of demographic events, pop configs, and migration matrix?

jeromekelleher commented 4 years ago

Edit: Maybe I'm a bit confused though - is the translation to slim actually based on parsing the text output of the debugger output, instead of the reading through list of demographic events, pop configs, and migration matrix?

No, we never want to parse that output! We can access the information through attributes though, it's all in there before it gets printed out.

andrewkern commented 4 years ago

i think this function gets us most of the way there, no? https://github.com/tskit-dev/msprime/blob/8a39c401d1c817f4950b7fc486da8238b9fa76ec/msprime/simulations.py#L1547

apragsdale commented 4 years ago

I was thinking more of a function that says whether or not a lineage could be found in a given population at a given time. That function doesn't take into account MassMigration events that move lineages around, right? Is there a function in msprime that does this already? I didn't see one scrolling through the DemographyDebugger class.

jeromekelleher commented 4 years ago

No, I don't think there is. Shouldn't be terribly hard to do though, right?

apragsdale commented 4 years ago

This shouldn't be hard to do at all, no. I can put something together soon that does that taking advantage of methods within the DemographyDebugger class.

grahamgower commented 4 years ago

Great @apragsdale! Do you think #258 could be dealt with similarly?

jeromekelleher commented 4 years ago

Thanks @apragsdale, this sounds great. Perhaps it would make sense to make it a method of DemographyDebugger?

apragsdale commented 4 years ago

Yes, I think that makes sense to make it a method of the DemographyDebugger. Then I suppose I should open a PR on the msprime repo for that? Sorry in advance things will be a bit slower for me this week - traveling and visiting some family and friends.

jeromekelleher commented 4 years ago

Yes, I think it makes sense to do it on the msprime side. There's no hurry with this, it's a more long-term feature --- things will work without it. Enjoy your trip @apragsdale!

grahamgower commented 4 years ago

Was an issue opened for this on the msprime repo? I don't see anything obvious there.

I'd like to close this issue if the fix is going to happen in an external repository.

jeromekelleher commented 4 years ago

No, I don't think we did. @apragsdale, would you mind following this up and creating an issue on msprime please?

apragsdale commented 4 years ago

Yes, really sorry, I had forgotten about this whole thing... I think I wrote up some function in the DemoraphyDebugger to return whether lineages can be found in each deme during epochs defined by times of events from the list of demographic_events. I'll try to find that code and open a PR in the msprime repo now.

apragsdale commented 4 years ago

Ok, took a stab at this.. This feature of the DemographyDebugger is in this PR (https://github.com/tskit-dev/msprime/pull/969#issue-407583630).

Let me know what you think, or if it needs something different (or if there are mistakes).