popsim-consortium / stdpopsim

A library of standard population genetic models
GNU General Public License v3.0
121 stars 86 forks source link

orangutan model how to add adaptive introgression? #1554

Open h-pawar opened 3 months ago

h-pawar commented 3 months ago

Short description of the problem:

Hello, building on the orangutan “TwoSpecies_2L11” demographic model, I would like to add adaptive introgression (from bornean into sumatran). Mutation arises any time from the species split to 1000 generations ago, passes to sumatran & is beneficial with s=0.1.

Following the tutorial - https://github.com/popsim-consortium/stdpopsim/blob/main/docs/selection_example.py, I

Would you expect this model to take such a long time to run (~6 days)? Or have I specified the introgression pulse incorrectly, does this need a separate call to be added?

Is there a way to check that the mutation has introgressed & correctly undergone selection in the recipient population?

How to reproduce the problem:

#!/usr/bin/python
#Pongo, adaptive introgression : test bornean into sumatran
#adaptive introgression, following https://github.com/popsim-consortium/stdpopsim/blob/main/docs/selection_example.py
import stdpopsim
import tskit
import random
import numpy
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("rep", type=int, help="rep number")
parser.add_argument("seln", type=float, help="selection coeff")
args = parser.parse_args()
repnum = args.rep
selcoeff = args.seln
#------------------------------------------------------------------------------------------------------------------------
species = stdpopsim.get_species("PonAbe")
demography = species.get_demographic_model("TwoSpecies_2L11")
contig = species.get_contig(length=100000,mutation_rate=demography.mutation_rate)
engine = stdpopsim.get_engine("slim")
samples = {"Sumatran": 11, "Bornean": 14}
#------------------------------------------------------------------------------------------------------------------------
#time species split in generations -https://popsim-consortium.github.io/stdpopsim-docs/stable/catalog.html#sec_catalog_PonAbe_genome
T_Abe_Pyg_split = 20157
#migrations occur per generation in ponabe locke model

t_delta = 1000 / demography.generation_time
#time of mutation
T_mut = random.uniform(1000, T_Abe_Pyg_split)

#have mutation segregating in source pop for a few generations
T_mig = random.uniform(T_mut - 100, T_mut - 5)

#but need to ensure mutation is passed before selection acts *
T_sel = random.uniform(t_delta, T_mig)
s = selcoeff

print(f'Parameters of rep:{repnum}, selcoeff:{s}, T_mut:{T_mut}, T_mig:{T_mig}, T_sel:{T_sel}')
#------------------------------------------------------------------------------------------------------------------------
    # Place the drawn mutation in the middle of the contig.
locus_id = "introgressed_locus"
contig.add_single_site(id=locus_id, coordinate=round(contig.length / 2))

extended_events = [
        # Draw mutation in source pop
        stdpopsim.ext.DrawMutation(
            time=T_mut,
            single_site_id=locus_id,
            population="Bornean",
        ),
        # Because the drawn mutation is neutral at the time of introduction,
        # it's likely to be lost due to drift. To avoid this, we condition on
        # the mutation having AF > 0 in source pop. If this condition is false at any
        # time between start_time and end_time, the simulation will be
        # returned to the point where the mutation was introduced.
        # Conditioning should start one generation after T_mut (not at T_mut!),
        # to avoid checking for the mutation before SLiM can introduce it.
            # have mutation segregating in source pop for a few generations
        stdpopsim.ext.ConditionOnAlleleFrequency(
            # Note: if T_mut ~= T_Den_split, then we end up with:
            #       GenerationAfter(T_mut) < T_Den_split,
            #       which will give an error due to "start_time < end_time".
            start_time=stdpopsim.ext.GenerationAfter(T_mut),
            end_time=T_mig,
            single_site_id=locus_id,
            population="Bornean",
            op=">",
            allele_frequency=0,
        ),
        # The source lineage has migrants entering the recipient lineage at T_mig,
        # so condition on AF > 0 in recipient pop
        stdpopsim.ext.ConditionOnAlleleFrequency(
            start_time=stdpopsim.ext.GenerationAfter(T_mig),
            end_time=0,
            single_site_id=locus_id,
            population="Sumatran",
            op=">",
            allele_frequency=0,
        ),
        # The mutation is positively selected in recipient pop at T_sel.
        # Note that this will have no effect, unless/until a mutation with the
        # specified mutation_type_id is found in the population.
        stdpopsim.ext.ChangeMutationFitness(
            start_time=T_sel,
            end_time=0,
            single_site_id=locus_id,
            population="Sumatran",
            selection_coeff=s,
            dominance_coeff=1,
        ),
        # Condition on AF > 0.05 in recipient pop at the end of the simulation.
          # Q - here do not understand why start time should be 0??
        stdpopsim.ext.ConditionOnAlleleFrequency(
            start_time=0,
            end_time=0,
            single_site_id=locus_id,
            population="Sumatran",
            op=">",
            allele_frequency=0.05,
        )
    ]

#Simulate.
ts = engine.simulate(
    demography,
    contig,
    samples,
    extended_events=extended_events,
    slim_burn_in=10,
    )

filename_abe=f"/scratch/admixlab/harvi/slim_out/orang/ai/testai.intoabe.{repnum}.{selcoeff}.trees"
ts.dump(filename_abe)
petrelharp commented 1 week ago

Hello, @h-pawar! Sorry we didn't notice this earlier. I can't dig in deep here right now, but if this is still an issue:

Would you expect this model to take such a long time to run (~6 days)? Or have I specified the introgression pulse incorrectly, does this need a separate call to be added?

Well, how long does the model without adaptive introgression take to run? If it is much shorter than 6 days, probably what you are seeing is that the events you're conditioning on are very unlikely, so it is restarting a lot. I guess the way I would debug this is to start simple - so, remove some of the conditions and see if that helps? I would also do this debugging using a very strong rescaling factor, so it runs in minutes not days.

Is there a way to check that the mutation has introgressed & correctly undergone selection in the recipient population?

The short answer is: no, not an easy way? We do have pretty rigorous checks that these things function as they should, though. You could come up with ad-hoc ways like turn the selection coefficient way up and check that this increases the ending frequency?