tskit-dev / msprime

Simulate genealogical trees and genomic sequence data using population genetic models
GNU General Public License v3.0

feature request: selfing #789

Open hannesbecher opened 5 years ago

hannesbecher commented 5 years ago

Hi, I was wondering whether selfing might be implemented at some point? When importing a tree sequence generated in SLiM, each (sub)population has a selfing rate in its metadata, but I guess this is ignored when "finishing" the tree sequence backwards in time? Cheers, Hannes

jeromekelleher commented 5 years ago

In general, msprime doesn't know anything about the tree sequences that are provided as input for 'recapitation', and doesn't look at the metadata in any way. We just use the parameters that are provided as input, find the uncoalesced segments and move on as usual. Pyslim does know about SLiM metadata though, so it may be able to do something appropriate there.

For selfing, I don't know whether this is something that can be modelled in the coalescent. Do you have some references for where this is done, or are there other simulators that do this? If it's something that can be done in the coalescent, then ultimately we'd like to see it in msprime. We'd need someone to take the lead in implementing it, but I'd be happy to assist.

hannesbecher commented 5 years ago

Yes, selfing can be modelled under the coalescent. Nordborg and others have published on the coalescent with selfing (https://www.genetics.org/content/146/3/1185.short), as has Möhle. There is also something about it in John Wakeley's 2008 book. It looks like EggLib can simulate selfing: http://mycor.nancy.inra.fr/egglib/py/coalesce.html (don't ask me how this works, though).

jeromekelleher commented 5 years ago

Cool - well, I'm happy to talk about how to go about implementing this, but someone other than me would need to lead the work, I'm afraid.

hannesbecher commented 5 years ago

I could not do it. Is it time to add a "help wanted" label?

jeromekelleher commented 5 years ago

Done - hopefully there'll be someone motivated enough to take this on soon!

TPPSellinger commented 4 years ago

Is this issue still open? @hannesbecher, is my modified version of scrm (https://github.com/TPPSellinger/escrm) good enough for you? It is based on the papers of Nordborg and Möhle. You can define different populations with different selfing rates; however, the selfing rate is constant in time and selfing is assumed to have always been present. In addition, migration is assumed to be independent of selfing, which might not be true from a biological point of view... I will (I hope) implement a selfing rate that changes over time once I have finished my PhD (so in 2021). It will probably be based on scrm, but if possible I will try to implement it in msprime.

hannesbecher commented 4 years ago

Sounds interesting, thank you @TPPSellinger! I was hoping this would be implemented in msprime, though!

TPPSellinger commented 4 years ago

Using the formulas in https://doi.org/10.1371/journal.pgen.1008698, which are based on the Nordborg paper, you could in theory tune the input parameters of msprime to simulate selfing: in the Nordborg paper, selfing simply rescales the coalescence and recombination rates. However, as in Nordborg's island model, this holds only if the selfing rate is constant, selfing has always been present, and individuals are diploid.
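
A minimal sketch of that rescaling, assuming the standard results for partial selfing: inbreeding coefficient F = s / (2 - s), effective population size N / (1 + F), and effective recombination rate r * (1 - F). The function name `selfing_rescaled_parameters` is hypothetical and not part of msprime; the formulas should be checked against the papers above before use.

```python
def selfing_rescaled_parameters(pop_size, recomb_rate, selfing_rate):
    """Rescale diploid population size and recombination rate for selfing.

    Assumes a constant selfing rate that has always been present and
    diploid individuals (the conditions stated in the thread).
    Hypothetical helper, not an msprime function.
    """
    if not 0.0 <= selfing_rate <= 1.0:
        raise ValueError("selfing_rate must be between 0 and 1")

    # Equilibrium inbreeding coefficient under partial selfing.
    inbreeding = selfing_rate / (2.0 - selfing_rate)

    # Selfing speeds up coalescence: N_e = N / (1 + F) = N * (2 - s) / 2.
    effective_size = pop_size * (2.0 - selfing_rate) / 2.0

    # Selfing makes recombination less effective: r_eff = r * (1 - F).
    effective_recomb = recomb_rate * (1.0 - inbreeding)

    return effective_size, effective_recomb


if __name__ == "__main__":
    ne, r_eff = selfing_rescaled_parameters(1000, 1e-8, 0.5)
    print(ne, r_eff)
```

The rescaled values could then be supplied as the population size and recombination rate of an ordinary msprime simulation, subject to the caveats above (constant selfing, diploid individuals).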