population-genetic-modeling-dis / seaINVADERS

0 stars 2 forks source link

seaINVADERS 1.0

by Jason Selwyn, Martin French, and Chris Bird

An R script designed to simulate the most likely number of individuals introduced in a colonization event. This is an updated version of the simulation model published by Selwyn et al. 2017.

Improvements

How to run: 1.) Configure parameters.R file a.) estimate source pop theta (k) using arlequin and/or b.) provide a vector of haplotype frequencies from a source population c.) provide a vector of haplotype frequencies from the destination population and d.) the estimated gene diversity of the destination pop a.) if you have thoroughly sampled the destination pop, then you don't have to do this e.) provide standard demographic parameters a.) don't sweat this too hard, this will control pop growth rate and you can specify a constant to modulate the growth rate b.) the faster the population grows, the less chance genetic drift has to change allele frequencies

2.) Configure the model.sh file for your supercomputer
    a.) you can also run on a workstation, but you'll have to
        mimick what happens in the model.sh file

3.) On a supercomputer, use sbatch to submit the job to your 
    slurm scheduler
    $ sbatch model.sh <parameters file>

Outputs

Assumptions: 1.) No Allee effects 2.) No elevated mortality in the colonizers 3.) No variance in reproductive success 4.) Logistic growth - enforced on recruits per individual 5.) Single introduction 6.) Genetic data is assumed to be haploid (mtDNA) 7.) Locus obeys infinite alleles model 8.) Source population is in mutation-drift equilibrium 9.) Deterministic, monthly reproduction 10.) Colonists are adults 11.) PLD <= 30 days 12.) Reproductive maturity occurs at same age for all

How it works: If source pop is defined by estimate of theta, the Ewan's sampling distribution is used to draw a sample of individuals from the source population. Otherwise, the vector of haplotypes from the source population is assumed to adequately represent the genetic diversity of the whole population and the sample of colonizers is drawn randomly using the multinomial distribution.

The colonizers start reproducing each month. Eggs are produced
by the user-specified proportion of females and females eggs
are tracked.  Egg and larval mortality occurs in the first
month, resulting in a given number of recruits per individual
(RPI).  The RPI is multiplied by a used specified constant to
adjust growth rate. Recruits become juveniles and monthly 
cohorts are subjected to a user-specified mortality rate.
Juveniles become adults after a user-specified number of months.
Adults are also subjected to a different, user-specified 
mortality rate.  Logistic growth is achieved by linearly 
decreasing the RPI as the number of adults approaches the user-
specified carrying capacity (adults). The total population size
can be larger than the carryign capacity because eggs, larvae,
and juveniles are not counted towards carrying capacity.

From a population genetic stand point, reproduction occurs
according to the discrete Wright-Fisher model. Haplotypes are
drawn from an infinite pool of eggs.  Mortality stochastically
removes haplotypes from the population using the binomial 
distribution, with differnt rates of mortality for eggs, larvae,
juveniles, and adults.  

After a user-specified amount of time (usually the time from 
introduction to genetic sampling of the destination population),
the model is stopped and a genetic sample equal in size to that
of the vector of observed haplotypes from the destination 
population.  The haplotype richness and sample-size corrected
gene diversity (see Arlequin manual) are calculated from the 
simulated sample.  If the both the haplotype richness equals
the observed and the gene diversity falls within the 95% CI
of the observed, then it is recorded as a match.

The simulation is repeated as specified by the user, iterating
through bootstrap replicates for each number of colonists being
tested.  The number of colonists with the highest proportion of
matches to the observed population (i.e. highest joint 
likelihood) is deemed the most likely to have produce the
observed genetic diversity given the settings and the implicit
model assumptions.

The user can iterate over multiple source population diversities
and destination population growth rates to test senstivity.

Notes The RData file produced can be exceedingly large. Setting the thin variable to TRUE only keeps the first and last month of the simulation. Otherwise, all months are stored in the RData file. To obtain population size plots for every month, set thin to TRUE (the user might want to set a small number of bootstrap replicates).