Closed yzliu01 closed 11 months ago
Hi @yzliu01,
Thanks for reaching out. To specify a demography with size changes, specify a Demography
object like this:
demography = msprime.Demography()
demography.add_population(name="A", initial_size=10_000)
demography.add_population_parameters_change(0, initial_size=1000, growth_rate=growth_r1, population="A")
demography.add_population_parameters_change(100, initial_size=5000, growth_rate=growth_r2, population="A")
demography.add_population_parameters_change(200, initial_size=10000, population="A", growth_rate=0)
You can then use a demographydebugger
:
debug = demography.debug() print(debug)
, to verify whether the specified model corresponds to what you intended.
Alternatively, you could convert the demography
object to a demes.graph
using demography,to_demes()
and use demesdraw
to plot a visual representation (see here).
Then simulate with:
msprime.sim_ancestry(
samples=30,
sequence_length=10e6,
demography=demography,
recombination_rate=1e-8
)
I hope this resolves your issue. If so, I will move this issue to the discussion section.
Hi @GertjanBisschop, Thank you so much for your quick response. Oh, yes it should be in the category of discussion. I had some tests as below with your suggestions. When I output simulation results to a vcf file I could not make it with msprime.sim_ancestry() method. The generated vcf file is empty but the vcf header lines. With the old method, msprime.simulate(), it worked but it is deprecated. I have read through the online manual but did not see such related output instructions. Can you give some tips on the correct vcf output?
demography = msprime.Demography()
demography.add_population(name="A", initial_size=1000)
demography.add_population_parameters_change(time=100, growth_rate=0.03, population="A")
demography.add_population_parameters_change(time=200, growth_rate=-0.04,population="A")
demography.add_population_parameters_change(time=300, growth_rate=0.01, population="A")
demography.add_population_parameters_change(time=500, growth_rate=-0.04, population="A")
demography.add_population_parameters_change(time=600, growth_rate=0, population="A")
demography.debug()
# You can then use a demographydebugger:
debug = demography.debug()
print(debug)
# to verify whether the specified model corresponds to what you intended.
# Alternatively, convert the demography object to a demes.graph using demography,to_demes()and use demesdraw to plot a visual representation.
import demesdraw
import matplotlib.pyplot as plt
#graph = demography.to_demes()
graph = msprime.Demography.to_demes(demography)
fig, ax = plt.subplots() # use plt.rcParams["figure.figsize"]
demesdraw.tubes(graph, ax=ax, seed=1)
plt.show()
# Then simulate with:
ts = msprime.sim_ancestry(
samples=30,
sequence_length=10e6,
demography=demography,
ploidy=2,
recombination_rate=1e-8
)
type(ts)
print(ts)
# output simulated data in vcf format
# diploid
def write_vcf_dip(ts, output):
with open(output, "w") as vcf_file:
ts.write_vcf(vcf_file,ploidy=2)
# issue to output data in vcf (no data obtained except header lines of vcf format)
write_vcf_dip(ts, "one_pop_msprime_sim_output_file.vcf")
# With this, I got an arror: ValueError: Cannot specify ploidy when individuals are present in tables
# only able to output vcf-like data with below old method in my previous test, not compatible with 'demography' parameter
msprime.simulate()
Apologies, I forgot to mention that with the new msprime API we have decoupled the ancestry simulation from simulating mutations.
So, after simulating the ancestry (msprime.sim_ancestry()
), you'll have to sprinkle mutations along the tree sequence by running something like mts = msprime.sim_mutations(ts, rate=1e-8)
. Take a look here for details. There are multiple mutation models to choose from.
Also, as you noticed from the error when generating the vcf. There is no need to specify ploidy here, the ploidy is already specified in the sim_ancestry
step. This is an optional argument with default 2.
Thanks a lot. That's really helpful. For the simulation of multiple chromosomes, what I read is that you suggest independent simulations, which I understand can be combined later, right?
Hi all, I wonder if there is a template to simulate multiple bottlenecks for a population over several time points. I have made a simple script to do this but it seems not correct. Can anyone give any advice? Below is my python script.