Open grahamgower opened 4 years ago
I guess this could be done generally? Size zero populations shouldn't hurt msprime if there's no individuals in there (and could potentially be used to speed things up a bit).
Probably that makes sense when no samples are requested.
I guess the ancestral population in GenericIM() is not an issue for msprime? But how is this handled in the model="dtwf" case?
It's not an issue in that if there's no ancestral material present in a population, then nothing will happen there. So, we can set the pop sizes to zero and it won't make any difference. Should be fine for DTWF too --- if not, we can fix it.
Cool. Currently the SLiM engine inherits this from the demography debugger. So that's probably the place to fix it.
>>> model=stdpopsim.GenericIM(1000,100,100, 500, 0, 0)
>>> model.debug()
Model = hudson(reference_size=1)
=============================
Epoch: 0 -- 500.0 generations
=============================
start end growth_rate | 0 1 2
-------- -------- -------- | -------- -------- --------
0 | 100 100 0 | 0 0 0
1 | 100 100 0 | 0 0 0
2 | 1e+03 1e+03 0 | 0 0 0
Events @ generation 500.0
- Mass migration: lineages move from 0 to 2 with probability 1
- Mass migration: lineages move from 1 to 2 with probability 1
===============================
Epoch: 500.0 -- inf generations
===============================
start end growth_rate | 0 1 2
-------- -------- -------- | -------- -------- --------
0 | 100 100 0 | 0 0 0
1 | 100 100 0 | 0 0 0
2 | 1e+03 1e+03 0 | 0 0 0
can't the ancestral population remain in the SLIM simulation but just be given a new size at time T? that will keep it a two population simulation. i.e. one of the daughter populations (forward in time) is just the ancestral one with a new size
@andrewkern Should we dictate that this is how all models should be constructed? Both approaches are currently supported.
Also see comment by @gtsambos. https://github.com/popgensims/stdpopsim/pull/163#issuecomment-560057772
yeah @gtsambos brings up two good issues there-- 1) are the sims the same? 2) is the population labeling important enough to specify 3 popns vs 2. I'm confident that if everything is going okay issue one should not be a worry. but issue two is a bit stickier....
From a computational complexity stand point it makes sense to me to retain only two pops through these simulations rather than build a 3rd....
In the three population scenario, the third population should have zero size in the epoch "0 -- 500.0 generations". This could be manually added with an additional entry in population_configurations
, or the demography debugger could be modified to output zero in this case. Either of these choices would be sufficient to destroy the population in the SLiM simulation, with the current engine code (and contribute no computational burden).
My initial thoughts are that I am against going through and changing all the msprime implementations of the models just for their translation to the slim inputs. It will add a bit to the population_configurations
and demographic_events
that are only really needed to translate to slim, and it while it doesn't change the computational complexity, I can see it adding complexity to the human implementation and QC of demographic models, especially when those models are complex and involve many populations.
If the goal is having a flexible script that can translate any msprime input into slim input, I think this should be handled internally. It is straightforward to write a function that take the sampling times (and whether sampling is allowed) from each population and the demographic events to determine all times that each population has the possibility of carrying a lineage, and from that information automatically turn those populations on and off. I've been thinking about this a bit for translating demographic history graphs to msprime and forward simulator inputs, and could sketch out a function that does this if that's helpful.
Yes, agreed we don't want to do this by hand. Perhaps we can derive the information from the demography debugger though?
Yes, agreed we don't want to do this by hand. Perhaps we can derive the information from the demography debugger though?
Yes, I think so, as long as we also start with the sampling information (which populations have lineages to start with, or at what times we draw lineages from a population if it's an ancient sample).
Edit: Maybe I'm a bit confused though - is the translation to slim actually based on parsing the text output of the debugger output, instead of the reading through list of demographic events, pop configs, and migration matrix?
Edit: Maybe I'm a bit confused though - is the translation to slim actually based on parsing the text output of the debugger output, instead of the reading through list of demographic events, pop configs, and migration matrix?
No, we never want to parse that output! We can access the information through attributes though, it's all in there before it gets printed out.
i think this function gets us most of the way there, no? https://github.com/tskit-dev/msprime/blob/8a39c401d1c817f4950b7fc486da8238b9fa76ec/msprime/simulations.py#L1547
I was thinking more of a function that says whether or not a lineage could be found in a given population at a given time. That function doesn't take into account MassMigration events that move lineages around, right? Is there a function in msprime that does this already? I didn't see one scrolling through the DemographyDebugger
class.
No, I don't think there is. Shouldn't be terribly hard to do though, right?
This shouldn't be hard to do at all, no. I can put something together soon that does that taking advantage of methods within the DemographyDebugger
class.
Great @apragsdale! Do you think #258 could be dealt with similarly?
Thanks @apragsdale, this sounds great. Perhaps it would make sense to make it a method of DemographyDebugger?
Yes, I think that makes sense to make it a method of the DemographyDebugger. Then I suppose I should open a PR on the msprime repo for that? Sorry in advance things will be a bit slower for me this week - traveling and visiting some family and friends.
Yes, I think it makes sense to do it on the msprime side. There's no hurry with this, it's a more long-term feature --- things will work without it. Enjoy your trip @apragsdale!
Was an issue opened for this on the msprime repo? I don't see anything obvious there.
I'd like to close this issue if the fix is going to happen in an external repository.
No, I don't think we did. @apragsdale, would you mind following this up and creating an issue on msprime please?
Yes, really sorry, I had forgotten about this whole thing... I think I wrote up some function in the DemoraphyDebugger to return whether lineages can be found in each deme during epochs defined by times of events from the list of demographic_events
. I'll try to find that code and open a PR in the msprime repo now.
Ok, took a stab at this.. This feature of the DemographyDebugger is in this PR (https://github.com/tskit-dev/msprime/pull/969#issue-407583630).
Let me know what you think, or if it needs something different (or if there are mistakes).
Consider a GenericIM() model. The ancestral population with size NA should not exist in the SLiM simulation after time T. It currently does (without affecting N1 or N2), and this adds needless computational burden.
Another case is when no samples are requested from a population (or requested only at a specific time point), and there are no migrants from this population.