popsim-consortium / stdpopsim

A library of standard population genetic models
GNU General Public License v3.0
122 stars 86 forks source link

Unable to verify if simulation is using specified model #1435

Open sidAvad opened 1 year ago

sidAvad commented 1 year ago

I've been trying to run a coalescent simulation using a standard model from the library - but instead of the default Hudson model, I want the simulation to use the discrete-time-Wright-Fisher model for the most recent 10 generations. So the following command seems to work

python -m stdpopsim --msprime-change-model 10 dtwf HomSap -g HapMapII_GRCh38 -c $chrom -o ${SOURCE_DIR}/OutOfAfrica_4J17_${chrom}.ts -d OutOfAfrica_4J17 YRI:2 CEU:3

Unfortunately, the output doesn't really tell me wether the --msprime-change-model option has actually been implemented. In fact I get the exact same log output when I run it with or without that option. It looks like this

Simulation information: Engine: msprime (1.2.0) Model id: OutOfAfrica_4J17 Model description: 4 population out of Africa Population: number_samples (sampling_time_generations): YRI: 2 (0) CEU: 3 (0) CHB: 0 (0) JPT: 0 (0) Contig Description: Contig origin: chr1:0-248956422 Contig length: 248956422.0 Contig ploidy: 2 Mean recombination rate: 1.1523470111585671e-08 Mean mutation rate: 1.44e-08 Genetic map: HapMapII_GRCh38

Is there any way for me to verify that the dtwf model is actually being used for the last 10 generations ?

nspope commented 1 year ago

Good question! This should be in the provenance associated with the tree sequence. The provenance from msprime is recorded first, and running

import tskit
ts = tskit.load("my_tree_sequence.ts")
ts.provenance(0)

in python will print it out. In this case, part of that provenance record will contain,

"model": [{"duration": 10.0, "__class__": "msprime.ancestry.StandardCoalescent"}, {"duration": null, "__class__": "msprime.ancestry.DiscreteTimeWrightFisher"}]

This indicates that the model being simulated is the standard coalescent ("hudson" model) for 10 generations, followed by DTWF afterwards -- which is the opposite of what you want!

Instead, if you do:

python -m stdpopsim --msprime-model dtwf --msprime-change-model 10 hudson \
   HomSap -g HapMapII_GRCh38 -c 22 -o foo.ts -d OutOfAfrica_4J17 YRI:2 CEU:3

echo '
import tskit
ts = tskit.load("foo.ts")
print(ts.provenance(0))
' | python

then you'll see provenance that contains:

"model": [{"duration": 10.0, "__class__": "msprime.ancestry.DiscreteTimeWrightFisher"}, {"duration": null, "__class__": "msprime.ancestry.StandardCoalescent"}]

which looks like the correct order. (It should also run a lot quicker if DTWF is only used for 10 generations instead of from 10 -> inf)

sidAvad commented 1 year ago

Oh I had that completely backwards - thank you so much !