Closed andrewkern closed 2 years ago
LGTM. I'd make hi
exclusive just to keep consistent with things like range
, but otherwise looks very sensible.
So, to be concrete, the plan is to make alleles equal to [str(j) for j in range(lo, hi)]
and then if you have allele j, you can only ever mutate to j - 1
or j + 1
?
So, to be concrete, the plan is to make alleles equal to
[str(j) for j in range(lo, hi)]
and then if you have allele j, you can only ever mutate toj - 1
orj + 1
?
yep that's the SMM model in a nutshell
This sounds fine, although we should make sure to tell people not to set lo
and hi
to the actual min and max observed in their data. Alternatively, we could modify the infinite alleles or SLiM model code (not use the mutation matrix model), so that we can set hi=Inf
(or, something large). It would be straightforward.
Alternatively, we could modify the infinite alleles or SLiM model code (not use the mutation matrix model), so that we can set hi=Inf (or, something large). It would be straightforward.
That's not quite the same thing though is it? With infinite alleles you can only every go from j to j + 1, whereas the model above you can go back to j - 1 with equal proba.
That's not quite the same thing though is it? With infinite alleles you can only every go from j to j + 1, whereas the model above you can go back to j - 1 with equal proba.
It's totally not the same thing - infinite alleles maintains a global counter for the next allele, and this would look at the previous state to choose a new one, but it provides a template. (so, maybe the SLiM mutation model code is better, as it actually modifies the previous allele)
Right, gotcha
This sounds fine, although we should make sure to tell people not to set
lo
andhi
to the actual min and max observed in their data. Alternatively, we could modify the infinite alleles or SLiM model code (not use the mutation matrix model), so that we can sethi=Inf
(or, something large). It would be straightforward.
another issue is the ancestral distribution. I think uniformly distributed is probably wrong?
The matrix method @andrewkern outlined above would work, though, right? Probably a good place to start if someone is doing this rather than diving into the C code immediately.
Yeah i think it should work just fine.
Unless the number of alleles is 10_000 or something it may be good enough.
Good enough, but I don't think anyone's doing it right now, and it'd be nice to not clutter up this list of models with out of date ones? Re: the root distribution - it'd be natural to allow a slight left/right bias and use the stationary distribution (I think there's a Slatkin paper on this?).
Good enough, but I don't think anyone's doing it right now, and it'd be nice to not clutter up this list of models with out of date ones?
not sure what you mean? SMM is the most popular microsat model in analyses and simulations AFAIK
I mean if we implement another slightly different but better SMM later. But I suppose that allowing hi=Inf
is backwards-compatible, as is adding left/right jump probs. So, never mind.
i'm hoping to add some simple mutation models that would be appropriate for microsatellite data. i think this will be quite useful to folks who are working on non-model organisms, ecological genetics, and maybe even forensics?
The simplest, most classical model is the Stepwise Mutation Model of Kimura and Ohta (1978). I reckon that's a good place to start, although this nice paper by Calabrese and Durrett lays out a number of other models that would be simple enough to implement.
Here is how I would imagine the implementation to look for the stepwise microsat model. generally these models have a lo and hi parameter which governs the bounds on the number of repeat units of the microsatelllite
what do others think?