tskit-dev / msprime

Simulate genealogical trees and genomic sequence data using population genetic models
GNU General Public License v3.0
173 stars 85 forks source link

Example to simulate multiple bottlenecks for a population over several time points #2213

Closed yzliu01 closed 11 months ago

yzliu01 commented 11 months ago

Hi all, I wonder if there is a template to simulate multiple bottlenecks for a population over several time points. I have made a simple script to do this but it seems not correct. Can anyone give any advice? Below is my python script.


growth_r2=-0.0006
T_growth1 = 300 ### last 300 generations of exponential growth
T_growth2 = 1000
Size_before_growth1 = 1e4
Size_before_growth2 = 1e6
### Calculate the population size after exponential growth.
Size_after_growth1 = Size_before_growth1 / math.exp(-growth_r1 * T_growth1)
Size_after_growth2 = Size_before_growth2 / math.exp(-growth_r2 * T_growth2)
yy = msprime.simulate(
    Ne=1e4,
    recombination_rate=1e-8,
    mutation_rate=1e-8,
    length = 5e6, # 10MB 
    population_configurations = [
        # POP1 Specify population size at time point 0 to be the calculated size after growth
        msprime.PopulationConfiguration(sample_size=30, growth_rate=growth_r1, initial_size=Size_after_growth1),
    ],
    migration_matrix =[
        [0],
    ],
    demographic_events = [
        ### Set population size to 1e4 prior to exponential growth start and set the growth rate prior to this time to 0
        msprime.PopulationParametersChange(time=T_growth1, initial_size=Size_before_growth1, growth_rate=0),
        msprime.PopulationParametersChange(time=T_growth2, initial_size=Size_before_growth2, growth_rate=-0.006)
    ],
    random_seed=random.seed())```

Thanks a lot in advance.
Best,
yz
GertjanBisschop commented 11 months ago

Hi @yzliu01, Thanks for reaching out. To specify a demography with size changes, specify a Demography object like this:

demography = msprime.Demography()
demography.add_population(name="A", initial_size=10_000)
demography.add_population_parameters_change(0, initial_size=1000, growth_rate=growth_r1, population="A")
demography.add_population_parameters_change(100, initial_size=5000, growth_rate=growth_r2, population="A")
demography.add_population_parameters_change(200, initial_size=10000, population="A", growth_rate=0)

You can then use a demographydebugger: debug = demography.debug() print(debug), to verify whether the specified model corresponds to what you intended. Alternatively, you could convert the demography object to a demes.graph using demography,to_demes()and use demesdraw to plot a visual representation (see here).

Then simulate with:

msprime.sim_ancestry(
    samples=30, 
    sequence_length=10e6,
    demography=demography,
    recombination_rate=1e-8
)

I hope this resolves your issue. If so, I will move this issue to the discussion section.

yzliu01 commented 11 months ago

Hi @GertjanBisschop, Thank you so much for your quick response. Oh, yes it should be in the category of discussion. I had some tests as below with your suggestions. When I output simulation results to a vcf file I could not make it with msprime.sim_ancestry() method. The generated vcf file is empty but the vcf header lines. With the old method, msprime.simulate(), it worked but it is deprecated. I have read through the online manual but did not see such related output instructions. Can you give some tips on the correct vcf output?

demography = msprime.Demography()
demography.add_population(name="A", initial_size=1000)
demography.add_population_parameters_change(time=100, growth_rate=0.03, population="A")
demography.add_population_parameters_change(time=200, growth_rate=-0.04,population="A")
demography.add_population_parameters_change(time=300, growth_rate=0.01, population="A")
demography.add_population_parameters_change(time=500, growth_rate=-0.04, population="A")
demography.add_population_parameters_change(time=600, growth_rate=0, population="A")
demography.debug()

# You can then use a demographydebugger:
debug = demography.debug()
print(debug)
# to verify whether the specified model corresponds to what you intended.
# Alternatively, convert the demography object to a demes.graph using demography,to_demes()and use demesdraw to plot a visual representation.

import demesdraw
import matplotlib.pyplot as plt
#graph = demography.to_demes()
graph = msprime.Demography.to_demes(demography)
fig, ax = plt.subplots()  # use plt.rcParams["figure.figsize"]
demesdraw.tubes(graph, ax=ax, seed=1)

plt.show()

# Then simulate with:
ts = msprime.sim_ancestry(
    samples=30, 
    sequence_length=10e6,
    demography=demography,
    ploidy=2,
    recombination_rate=1e-8
)

type(ts)
print(ts)

# output simulated data in vcf format
# diploid
def write_vcf_dip(ts, output):
    with open(output, "w") as vcf_file:
        ts.write_vcf(vcf_file,ploidy=2)
# issue to output data in vcf (no data obtained except header lines of vcf format)
write_vcf_dip(ts, "one_pop_msprime_sim_output_file.vcf")
# With this, I got an arror: ValueError: Cannot specify ploidy when individuals are present in tables 

# only able to output vcf-like data with below old method in my previous test, not compatible with 'demography' parameter
msprime.simulate()
GertjanBisschop commented 11 months ago

Apologies, I forgot to mention that with the new msprime API we have decoupled the ancestry simulation from simulating mutations. So, after simulating the ancestry (msprime.sim_ancestry()), you'll have to sprinkle mutations along the tree sequence by running something like mts = msprime.sim_mutations(ts, rate=1e-8). Take a look here for details. There are multiple mutation models to choose from. Also, as you noticed from the error when generating the vcf. There is no need to specify ploidy here, the ploidy is already specified in the sim_ancestry step. This is an optional argument with default 2.

yzliu01 commented 11 months ago

Thanks a lot. That's really helpful. For the simulation of multiple chromosomes, what I read is that you suggest independent simulations, which I understand can be combined later, right?