popsim-consortium / adding-species-manuscript

manuscript materials for the adding species paper

is "Jukes-Cantor model" correct? #87

Closed (bhaller closed this issue 1 year ago)

bhaller commented 1 year ago

The manuscript says that "At present, stdpopsim simulates mutations at a constant rate under the Jukes-Cantor model of nucleotide mutations". This surprised me, since as I understand it stdpopsim does not model explicit nucleotides, and the Jukes-Cantor model is explicitly a nucleotide-based mutational model. It seems like it would perhaps be more accurate to simply say that stdpopsim simulates mutations occurring at a given rate, and leave it at that. But perhaps I'm misunderstanding what stdpopsim now does under the hood, and perhaps it is different for msprime versus SLiM simulations. Certainly in SLiM one can only use the Jukes-Cantor model, in the sense of using the mmJukesCantor() function, if one is explicitly simulating nucleotides. Thoughts? @igronau @petrelharp

igronau commented 1 year ago

I think it's important to specify JC here because multiple mutations can occur at the same site, which may cause back mutations and multi-allelic sites. To handle this, the simulation engine has to consider nucleotides, but that doesn't mean it has to accurately capture the sequence. According to my understanding, mutations are still modeled the same way, but if multiple mutations are mapped to the same discrete genomic position, the simulation engine has to determine the resulting alleles (which can now also be 2 or 3, and not just 0 or 1). I think that this is what's done in msprime (@jeromekelleher can confirm or correct), but I'm not sure if the same applies to SLiM.
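
To make the point about back mutations and multi-allelic sites concrete, here is a minimal sketch in plain Python. It is not msprime or stdpopsim code: the names `jc_mutate` and `stack_mutations` are hypothetical, and it assumes the mutations hit one site in succession along a single lineage, with each mutation choosing uniformly among the three other nucleotides.

```python
import random

NUCLEOTIDES = "ACGT"

def jc_mutate(state, rng):
    """Draw a new nucleotide uniformly from the three that differ from `state`."""
    return rng.choice([n for n in NUCLEOTIDES if n != state])

def stack_mutations(ancestral, num_mutations, rng):
    """Apply successive mutations to one discrete site (hypothetical helper).

    Returns the list of states the site passes through, starting with the
    ancestral state.
    """
    states = [ancestral]
    for _ in range(num_mutations):
        states.append(jc_mutate(states[-1], rng))
    return states

rng = random.Random(1)
history = stack_mutations("A", 5, rng)
print(history)
# More than two distinct states means the site is multi-allelic; a state that
# reappears later in the list is a back mutation.
print("distinct alleles:", len(set(history)))
```

With two or more mutations at a site, the state can return to an earlier nucleotide or reach a third or fourth allele, which is why the alleles are no longer just 0 and 1.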

jeromekelleher commented 1 year ago

That's correct @igronau - we use discrete-site mutations by default now in msprime.

bhaller commented 1 year ago

So maybe the claim of "Jukes-Cantor" mutations is correct for msprime, but not for SLiM? Who can be tagged here who would know what stdpopsim does in the SLiM case?

jeromekelleher commented 1 year ago

@petrelharp is the best authority there I think.

petrelharp commented 1 year ago

Hm, let's see. For msprime we use Jukes-Cantor (note that's the default). And, for SLiM we do still have stacked mutations (as we ran into Difficulties switching), but then we do generate_nucleotides and convert_nucleotides to produce ancestral & derived states, which generates from Jukes-Cantor (gee, I should say this in the docs).

So, yes, I think we are doing Jukes-Cantor all around.
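
Roughly, the post-hoc conversion idea can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the actual pyslim or stdpopsim code: `assign_nucleotides` and the `mutation_parents` list are hypothetical, and the sketch assumes the ancestral nucleotide is drawn uniformly and each stacked mutation picks uniformly among the three nucleotides that differ from the state it replaces.

```python
import random

NUCLEOTIDES = "ACGT"

def assign_nucleotides(mutation_parents, rng):
    """Assign nucleotides to one site after the simulation has finished.

    mutation_parents[i] is the index of the mutation that mutation i was
    stacked on at this site, or None if it arose on the ancestral state.
    Parents are assumed to appear before their children in the list.
    """
    ancestral = rng.choice(NUCLEOTIDES)
    derived = []
    for parent in mutation_parents:
        previous = ancestral if parent is None else derived[parent]
        # Jukes-Cantor style choice: any nucleotide except the one replaced.
        derived.append(rng.choice([n for n in NUCLEOTIDES if n != previous]))
    return ancestral, derived

rng = random.Random(42)
# Mutation 1 is stacked on mutation 0, mutation 3 on mutation 1;
# mutation 2 arose independently on the ancestral state.
parents = [None, 0, None, 1]
ancestral, derived = assign_nucleotides(parents, rng)
print(ancestral, derived)
```

The simulation itself never tracks nucleotides here; the states are only filled in afterwards, one site at a time.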

bhaller commented 1 year ago

Ah, it's a post-conversion! :-O OK, good; this issue can be closed, then.