Closed bhaller closed 1 year ago
I think it's important to specify JC here because multiple mutations can fall on the same site, which can produce back mutations and multi-allelic sites. To handle this, the simulation engine has to consider nucleotides, but that doesn't mean it has to accurately capture the sequence. As I understand it, mutations are still modeled the same way, but if multiple mutations map to the same discrete genomic position, the simulation engine has to determine the resulting alleles (which can now also be 2 or 3, not just 0 or 1). I think this is what msprime does (@jeromekelleher can confirm or correct), but I'm not sure whether the same applies to SLiM.
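To make the point about repeated hits concrete, here is a minimal sketch in plain Python (not msprime or SLiM code) of Jukes-Cantor mutations stacking at a single discrete site. The names `jc_mutate` and `stack_mutations` are hypothetical and exist only for this illustration; the only assumption is the usual JC rule that a mutation moves to one of the three other nucleotides with equal probability.

```python
import random

NUCLEOTIDES = "ACGT"


def jc_mutate(state, rng):
    """One Jukes-Cantor step: pick one of the three other nucleotides uniformly."""
    return rng.choice([n for n in NUCLEOTIDES if n != state])


def stack_mutations(ancestral, n_hits, rng):
    """Apply n_hits successive mutations along one lineage at a single site.

    Returns the full history of states, starting with the ancestral state.
    """
    states = [ancestral]
    for _ in range(n_hits):
        states.append(jc_mutate(states[-1], rng))
    return states


rng = random.Random(1)
states = stack_mutations("A", 3, rng)

# More than two distinct states means the site is multi-allelic.
n_alleles = len(set(states))
# A state that reappears later in the history is a back (or recurrent) mutation.
has_back_mutation = len(set(states)) < len(states)
print(states, n_alleles, has_back_mutation)
```

With only 0/1 alleles, neither of the last two quantities could be defined, which is why the engine needs some notion of nucleotide state even if it never tracks a real reference sequence.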
That's correct @igronau - we use discrete-site mutations by default now in msprime.
So maybe the claim of "Jukes-Cantor" mutations is correct for msprime, but not for SLiM? Who can be tagged here who would know what stdpopsim does in the SLiM case?
@petrelharp is the best authority there I think.
Hm, let's see. For msprime we use Jukes-Cantor (note that's the default). And, for SLiM we do still have stacked mutations (as we ran into Difficulties switching), but then we do generate_nucleotides and convert_alleles to produce ancestral & derived states, which generates from Jukes-Cantor (gee, I should say this in the docs).
So, yes, I think we are doing Jukes-Cantor all around.
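As a rough illustration of the post-hoc conversion described above, the following plain-Python sketch assigns nucleotides to mutations that were simulated without them. It is not pyslim's implementation; `assign_nucleotides` and the `parents` encoding are hypothetical, and the sketch assumes the ancestral state is drawn uniformly and each derived state is drawn uniformly from the three nucleotides that differ from the state it replaces.

```python
import random

NUCLEOTIDES = "ACGT"


def assign_nucleotides(parents, rng):
    """Assign nucleotide states to the mutations at one site.

    parents[i] is the index of the mutation that mutation i occurred on top
    of, or -1 if it arose directly on the ancestral state. Parents are
    assumed to be listed before their children.
    """
    ancestral = rng.choice(NUCLEOTIDES)
    derived = []
    for p in parents:
        previous = ancestral if p == -1 else derived[p]
        # Jukes-Cantor: any of the three other nucleotides, equally likely.
        derived.append(rng.choice([n for n in NUCLEOTIDES if n != previous]))
    return ancestral, derived


# Four mutations at one site: 1 and 2 both sit on top of 0, and 3 on top of 2.
parents = [-1, 0, 0, 2]
ancestral, derived = assign_nucleotides(parents, random.Random(42))
print(ancestral, derived)
```

Because every mutation only needs a state that differs from the one underneath it, the labels can be added after the simulation has finished, and the result is distributed as if the mutations had been drawn under Jukes-Cantor in the first place.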
Ah, it's a post-conversion! :-O OK, good; this issue can be closed, then.
The manuscript says that "At present, stdpopsim simulates mutations at a constant rate under the Jukes-Cantor model of nucleotide mutations". This surprised me, since as I understand it stdpopsim does not model explicit nucleotides, and the Jukes-Cantor model is explicitly a nucleotide-based mutational model as I understand it. It seems like perhaps it would be more accurate to simply say that stdpopsim simulates mutations occurring at a given rate, and leave it at that. But perhaps I'm misunderstanding what stdpopsim now does under the hood; and perhaps it is different for msprime versus SLiM simulations. Certainly in SLiM one can only use the Jukes-Cantor model, in the sense of using the mmJukesCantor() function, if one is explicitly simulating nucleotides. Thoughts? @igronau @petrelharp