Closed barytonos closed 2 years ago
Hmm, turns out that the problem is the mutation rate. If the mutation rate in msprime.simulate
is set to 0 (as in the original code in the SLiM manual), apparently the code flow never enters the problematic function.
Thanks for the excellent bug report @barytonos!
I've transferred this to the pyslim repo as the tskit function is behaving as expected - it requires bytes inputs, and pyslim needs to do the encoding.
Oh, dear, this is indeed a bug. And, it looks like pyslim's tests didn't catch it because the tests didn't have mutations turned on :facepalm:.
I'll not be able to get to this right away, although pyslim is going to be the top of the priority list in about two weeks. However, there's actually a better way to do this now, since msprime can generate SLiM mutations directly:
import msprime, pyslim
ts = msprime.sim_ancestry(samples=100, population_size=100, sequence_length=1e8, recombination_rate=1e-8)
slim_ts = pyslim.annotate_defaults(ts, model_type="WF", slim_generation=1)
slim_ts = msprime.sim_mutations(slim_ts,
rate=1e-7,
model=msprime.SLiMMutationModel(type=0)
)
(note that I've made the number of samples and Ne smaller so this runs faster). We should update the SLiM manual accordingly, I think. (@bhaller?)
(edited to mutate slim_ts
instead of ts
, which was an error)
Filed https://github.com/MesserLab/SLiM/issues/242. It's not clear to me that there's anything about recipe 17.8/17.9 that needs revision; the intent, in those recipes, is not to generate mutations, just a coalescent history. But we should certainly have an example of using SLiMMutationModel
in the manual; so maybe it ought to be a new recipe 17.11. Let's discuss that over in the new issue. :->
Regarding the issue here, I'm a bit puzzled. If you generate mutations without using SLiMMutationModel
, as the code at the start does, the mutation positions will not be integers and the tree sequence will not be SLiM-compliant, right? So that code ought to just raise an error in pyslim, like "Non-integer mutation positions are not legal in a SLiM tree sequence; please use SLiMMutationModel
", no?
It's not clear to me that there's anything about recipe 17.8/17.9 that needs revision
We ought to move the msprime code in the SLiM manual over to using the new msprime.sim_ancestry( )
function anyhow. msprime.simulate( )
is maintained, but deprecated.
If you generate mutations without using SLiMMutationModel, as the code at the start does, the mutation positions will not be integers and the tree sequence will not be SLiM-compliant, right?
Oh! Good point! And my code above makes the same mistake! (I'm editing it above to add discrete
to the arguments.) But, no - pyslim doesn't check if coordinates are integer-valued. Maybe it should.
...But, no - pyslim doesn't check if coordinates are integer-valued. Maybe it should.
Not sure; I mean, the user will get an error on the SLiM side, so maybe there's no need to push that check back into pyslim as well.
@petrelharp @bhaller - discrete_genome=True
is the default in msprime for sim_mutations
and sim_ancestry
, so no need to provide that argument (if discrete genomes is what you want).
@petrelharp @bhaller -
discrete_genome=True
is the default in msprime forsim_mutations
andsim_ancestry
, so no need to provide that argument (if discrete genomes is what you want).
Oh, I didn't realize you guys had changed over to discrete by default! Wow; out of curiosity, what did you do about backward compatibility with that change? Seems like a big break with the past!
Two new functions sim_ancestry
and sim_mutations
, with sensible defaults (hopefully). The old functions simulate
and mutate
have identical behaviour as before and will be maintained indefinitely.
Oh, right! =)
Sorry, I closed the issue by accident.
@petrelharp The code listing you gave doesn't solve the issue, because I'm intending to use the tree sequence as input to a forward simulation in SLiM. If I dump the tree sequence with mutations included without first running pyslim.annotate_defaults
, SLiM won't accept it ("no SLiM provenance table entry found; this file cannot be read") and if I try to run pyslim.annotate_defaults
following ms_prime.sim_mutations
, I get the issue I raised above.
if I try to run pyslim.annotate_defaults following ms_prime.sim_mutations, I get the issue I raised above.
Notice that in the code above annotate_defaults
comes before sim_mutations
?
if I try to run pyslim.annotate_defaults following ms_prime.sim_mutations, I get the issue I raised above.
Notice that in the code above
annotate_defaults
comes beforesim_mutations
?
Of course, but if I dump the tree after running sim_mutations
on it then SLiM doesn't accept it.
Hah, my mistake: notice in my code above I mutated ts
not slim_ts
. Here's a full example.
test.py
:
import msprime, pyslim
ts = msprime.sim_ancestry(samples=100, population_size=100, sequence_length=1e8, recombination_rate=1e-8)
slim_ts = pyslim.annotate_defaults(ts, model_type="WF", slim_generation=1)
mts = msprime.sim_mutations(slim_ts,
rate=1e-7,
model=msprime.SLiMMutationModel(type=0),
discrete_genome=True
)
mts.dump("input.trees")
and restart.slim
:
initialize() {
initializeMutationRate(1e-7);
// m1 mutation type: neutral
initializeMutationType("m0", 0.5, "f", 0.0);
// g1 genomic element type: uses m1 for all mutations
initializeGenomicElementType("g1", m0, 1.0);
// uniform chromosome of length 100 kb with uniform recombination
initializeGenomicElement(g1, 0, 99999999);
initializeRecombinationRate(1e-8);
}
// create a population of 500 individuals
1 {
sim.readFromPopulationFile("input.trees");
catn("Read in!! Population p0 has " + p0.individualCount + " individuals.");
}
Then, I get:
$ python3 test.py && slim restart.slim
// Initial random seed:
2773097996638
// RunInitializeCallbacks():
initializeMutationRate(1e-07);
initializeMutationType(0, 0.5, "f", 0);
initializeGenomicElementType(1, m0, 1);
initializeGenomicElement(g1, 0, 99999999);
initializeRecombinationRate(1e-08);
// Starting run at generation <start>:
1
#WARNING (SLiMSim::ExecuteMethod_readFromPopulationFile): readFromPopulationFile() should probably not be called from an early() event in a WF model; fitness values will not be recalculated prior to offspring generation unless recalculateFitness() is called.
Read in!! Population p0 has 100 individuals.
I'll edit the example above so it actually works.
I'm now having annotate_defaults( )
throw an error if site positions are not all at integer values:
"Site positions in this tree sequence are not at integer values, "
"but must be for loading into SLiM: generate mutations with "
"sim_mutations(..., discrete_genome=True), not simulate()."
I ran the following code (a slight modification of code in the SLiM manual, p. 429):
The code flow for
pyslim.annotate_defaults
eventually ends up in the functionpack_bytes(data)
intskit/util.py
, where the following code raises an exception:One step above in the stack trace, one sees that `data' is a list of strings:
Indeed, in Python 3 you can't turn a string into a bytearray without specifying an encoding.
The fix should be simple. But I wonder, how come this hasn't been noticed before?