Open molpopgen opened 3 years ago
One thing I forgot: allowing `start == end` along with some concept of "forever", allowing the strong balancing selection scenario of Kaplan, Darden, and Hudson?
this is great @molpopgen! i'm totally game to work on implementing these things.
while dealing with dominance is important eventually, i think the first thing to knock out is the selection from standing variation scenario to deal with the issue in #1762
all this should amount to is implementing drift/neutral trajectories which I have already done. I just need to finish up the testing and get a PR together.
another big priority for me is to get the population size changes done and incorporating the rejection sampling of trajectories that i've done in discoal.
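To make the rejection-sampling idea concrete, here is a minimal sketch using only the Python standard library. It is not the discoal or msprime implementation; the function name `neutral_trajectory`, its arguments, and the choice to run Wright-Fisher drift forward from a single copy are all assumptions for illustration. A trajectory is accepted only if the allele reaches the target frequency before being lost.

```python
import random


def neutral_trajectory(two_n, target_freq, rng, max_tries=100_000):
    """Hypothetical helper: forward Wright-Fisher drift from a single copy,
    conditioned (by rejection) on reaching target_freq before loss."""
    target_count = max(1, round(target_freq * two_n))
    for _ in range(max_tries):
        count = 1
        traj = [count / two_n]
        while 0 < count < target_count:
            p = count / two_n
            # Binomial(two_n, p) draw written as a sum of Bernoulli trials.
            count = sum(rng.random() < p for _ in range(two_n))
            traj.append(count / two_n)
        if count >= target_count:
            return traj  # accepted: the allele reached the target frequency
        # Otherwise the allele was lost, so reject and try again.
    raise RuntimeError("no trajectory accepted within max_tries")


rng = random.Random(42)
traj = neutral_trajectory(two_n=100, target_freq=0.1, rng=rng)
print(len(traj), traj[0], traj[-1])
```

The per-generation Bernoulli sum is only sensible for small `two_n`; a real implementation would use a proper binomial sampler and would also need to handle population size changes.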
For naming-- I'd suggest we set up models like "hard_sweep", "recurrent_hard_sweep", "soft_sweep", etc. How does that strike you?
Sounds great to me @molpopgen, thanks a million for putting this together.
Naming is tricky because it is related to what's under the hood, at least a bit:

- `msprime.HudsonStructuredCoalescentWithASingleTrajectory`. This is explicit, but there's a good chance of lots of `kwargs`, all possible subsets of which cause confusion. But the plus side is a single point of failure when it comes to checking input.
- `class msprime.HardSweep(msprime._HudsonStructuredBlahBlahBlah)`. That gets us into the `super()` pickling mess and some other confusion.
- … `__init__` functions, and each class holds an instance of the richer `_HudsonStructuredThing`.

But all of the above is related to what the C would look like under all of this.
No matter what, we want the defaults to reflect common use cases, allowing you to do things like remake a figure from a 2002 paper in 1/2 screen of Python code.
let's do it. want to review some code for me @molpopgen as I pull the bits together?
Yeah, definitely. I'm quite interested, because I had a near-complete diploSHIC workflow in place using this, until #1762 got in the way!
diploSHIC training is gonna fly once this is done.
Yeah, in my little test workflows, it was about 30% faster. There's considerable file I/O + time spent in `allel` that dominates still.
would be great to side step that all using tskit stats....
Yes, the challenge there is that many of the stats are haplotype-y, and many such stats have an ad-hoc flavor to them. The whole set of D-like stats can be done using the existing code, but for many of the others, there's some thinking that needs doing.
This issue is motivated by #1762
The current class for modeling the structured coalescent is `msprime.SweepGenicSelection`. This class handles the following scenario:

- … `start` to `end`.

The `__init__` arguments that matter most are start/end frequencies and `s`.
Moving forwards, there are many more modeling possibilities:

- … `s` and/or dominance vary over time. By default, we probably want this feature to be on, reflecting the most common anticipated use case.

So, at the very least, we want the `__init__` method to account for:

- … `h` kwarg
We also need a name, bearing in mind the assumptions as I understand them:
I've likely missed a few things.
cc @andrewkern @jeromekelleher