tskit-dev / msprime

Simulate genealogical trees and genomic sequence data using population genetic models
GNU General Public License v3.0

Integration of Selfing and Dormancy #1653

Open TPPSellinger opened 3 years ago

TPPSellinger commented 3 years ago

I have full permission to share our scripts for simulating genomes with piecewise variation in selfing and dormancy. @jeromekelleher, let me know how to proceed. Since it's all rescaling, we built Python functions that create msprime "command lines" (i.e. parameters and models). Stefan Strütt from the Max Planck Institute in Köln ran simulations comparing msprime to SLiM, demonstrating that both approaches are equivalent (the paper should be coming soon). In my lab we are building a forward simulator to do the same for dormancy.

jeromekelleher commented 3 years ago

That's excellent, thanks for sharing @TPPSellinger! I guess the simplest thing would be to wait for the paper, and then we can think about how we might merge the functionality into msprime?

Or, if you'd like to get things in more quickly, perhaps someone could summarise the approach here and the proposed changes?

TPPSellinger commented 3 years ago

I will look into it in depth and try to get a PR ready for when the paper is submitted/accepted. I hope this is okay with you; I don't really know how you would like this to be integrated into msprime. I had in mind something like what is done for variation in population size (the default being no selfing and no dormancy).

petrelharp commented 3 years ago

We could help suggest how this would be implemented if you give even a brief summary of how this works? I'm imagining that selfing and dormancy act as rescalings on recombination rate, effective population size, and generation time? In particular: is it rescaling only the trees (sim_ancestry) or also mutation rate? When you say "variation" do you mean that these are changing in time, like 1000 years of 90% selfing followed by 1000 years of 70% selfing?

TPPSellinger commented 3 years ago

Hi @petrelharp. My mistake for not giving more detail. It's exactly what you said: we change these rates so that they are piecewise constant in time. Selfing rescales Ne and the recombination rate, so we change the recombination rate and Ne through time to mimic the effect of selfing. Dormancy is slightly trickier: depending on the hypothesis (or model), it additionally rescales the mutation rate. Here too everything is piecewise constant, and at generation x we change the scaling according to the new rates.

I hope this helps a little. For the moment everything is piecewise constant, but I can ask Stefan Strütt (he doesn't use GitHub, but I will) if he has the formulas for smooth transitions. Let me know if you want me to post all the rescaling formulas I have here. I can also share the Python script we use (it's not very long and quite straightforward); it will probably help make things clearer.
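For readers following along, here is a minimal sketch of what a piecewise-constant selfing rescaling could look like. It is not the script mentioned above. It assumes the textbook rescaling for partial selfing, with inbreeding coefficient F = s / (2 - s), effective size N / (1 + F), and effective recombination rate r * (1 - F); the formulas actually used in the scripts may differ. The names `Epoch`, `inbreeding_coefficient`, and `rescale_for_selfing` are hypothetical, and dormancy is deliberately left out because its scaling depends on the model.

```python
from dataclasses import dataclass


@dataclass
class Epoch:
    """One piecewise-constant interval (hypothetical helper type)."""
    start_time: float    # generations ago at which this selfing rate starts to apply
    selfing_rate: float  # fraction of offspring produced by selfing, in [0, 1]


def inbreeding_coefficient(selfing_rate):
    # Assumed textbook equilibrium value under partial selfing: F = s / (2 - s).
    if not 0.0 <= selfing_rate <= 1.0:
        raise ValueError("selfing_rate must lie in [0, 1]")
    return selfing_rate / (2.0 - selfing_rate)


def rescale_for_selfing(epochs, census_size, recombination_rate):
    """Return one dict of rescaled parameters per epoch, sorted by start time.

    Assumed rescaling (not taken from the scripts discussed in this thread):
    Ne = N / (1 + F) and r_eff = r * (1 - F).
    """
    schedule = []
    for epoch in sorted(epochs, key=lambda e: e.start_time):
        f = inbreeding_coefficient(epoch.selfing_rate)
        schedule.append(
            {
                "time": epoch.start_time,
                "population_size": census_size / (1.0 + f),
                "recombination_rate": recombination_rate * (1.0 - f),
            }
        )
    return schedule


# The example from the question above, with time counted in generations:
# 90% selfing for the most recent 1000 generations, 70% selfing before that.
schedule = rescale_for_selfing(
    [Epoch(start_time=0, selfing_rate=0.9), Epoch(start_time=1000, selfing_rate=0.7)],
    census_size=10_000,
    recombination_rate=1e-8,
)
for row in schedule:
    print(row)
```

In msprime, the population sizes could be applied with `Demography.add_population_parameters_change`. A recombination rate that changes through time is harder, since `sim_ancestry` takes a single rate (or rate map) per call; one possible workaround, offered only as an assumption, is to chain calls with `end_time` and `initial_state`, one call per epoch.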

petrelharp commented 3 years ago

Thanks! We don't need the formulas now; I'm just trying to think about the API. It's tricky because it affects both ancestry and mutations. Here are some options, brainstorming:

None of these options seem very attractive or easy. Any suggestions?

David-Peede commented 1 year ago

Hey! Just dropping a line because I am very interested in using this functionality! Is there any update or anything I can help with?

petrelharp commented 1 year ago

I happen to have communicated with @TPPSellinger recently on another issue; he's moved to industry so I think there's no update. So - it's open and available to help with, if you're interested!

TPPSellinger commented 1 year ago

Hi! Yes, unfortunately I have moved to industry, but I am trying to finish everything that I started during my postdocs. We recently published a paper using msprime to simulate selfing (https://elifesciences.org/articles/82384). The scripts should be available somewhere in the paper. They should also be on my GitHub. I hope this helps for the moment and makes the integration of selfing into msprime easier in the future.

David-Peede commented 1 year ago

Amazing! Thank you so much! I am currently wrapping up a manuscript that should hopefully be submitted by the end of the month, but in the meantime I will check out the code!