thibautjombart / adegenet

adegenet: a R package for the multivariate analysis of genetic markers

glSim with observed allele frequencies and linkage structure #197

Open tjsmyser opened 6 years ago

tjsmyser commented 6 years ago

We are interested in describing rates of migration among genetically similar populations. Because the populations are genetically similar, we expect to falsely identify some individuals as migrants. We would like to run a simulation in which allele frequencies match those observed in these populations. Specifically, we seek to quantify how often a migrant is falsely detected under no migration, and then to increase the simulated migration rate until the simulated data reproduce the observed patterns of genetic diversity. We are genotyping individuals with an HD SNP assay and have a mapped genome. Is there a way to create populations with specified allele frequencies (that also incorporate the linkage structure of SNP loci along chromosomes) using the glSim function? If glSim is not the appropriate tool, we are open to other software packages that would let us run simulations designed to closely reflect the observed data.

thibautjombart commented 6 years ago

Hi there, glSim will do the SNP simulation with LD, but for now you cannot specify the allele frequencies in the populations. I am curious: do the specified allele frequencies already include the LD patterns you are expecting? It would be relatively easy to modify the existing code to allow user-specified allele frequencies. A PR would be welcome.
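To make the idea of user-specified allele frequencies combined with LD concrete, here is a minimal, language-neutral sketch in Python (standard library only). It is not glSim's algorithm and does not use adegenet. The function names `simulate_haplotype` and `simulate_population` and the parameter `r` are hypothetical. It assumes biallelic loci listed in map order and induces LD between neighbouring loci with a latent Gaussian variable that is correlated along the chromosome, which keeps each locus at its requested marginal frequency.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used to map latent values to [0, 1]


def simulate_haplotype(freqs, r, rng):
    """Return one haplotype as a list of 0/1 alleles.

    freqs: frequency of allele 1 at each locus, in map order.
    r: correlation of the latent variable between adjacent loci
       (0 gives independent loci; values near 1 give strong LD).
    """
    scale = (1.0 - r * r) ** 0.5
    haplotype = []
    z = rng.gauss(0.0, 1.0)
    for j, p in enumerate(freqs):
        if j > 0:
            # AR(1) step: z stays standard normal, so marginals are preserved.
            z = r * z + scale * rng.gauss(0.0, 1.0)
        haplotype.append(1 if _STD_NORMAL.cdf(z) < p else 0)
    return haplotype


def simulate_population(freqs, n_ind, r=0.8, seed=None):
    """Return n_ind diploid genotypes coded as 0/1/2 copies of allele 1."""
    rng = random.Random(seed)
    genotypes = []
    for _ in range(n_ind):
        h1 = simulate_haplotype(freqs, r, rng)
        h2 = simulate_haplotype(freqs, r, rng)
        genotypes.append([a + b for a, b in zip(h1, h2)])
    return genotypes


# Illustrative frequencies only; replace with frequencies estimated from data.
pop_a = simulate_population([0.10, 0.50, 0.90, 0.30], n_ind=5, seed=1)
```

A single `r` for every pair of neighbours is a strong simplification. With a mapped genome, `r` could instead vary with the distance between adjacent SNPs and be reset at chromosome boundaries.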

tjsmyser commented 6 years ago

Thank you Thibaut,

To describe our problem a bit more clearly, we have run a DAPC to describe the population genetic structure of invasive wild pigs. We have identified a number of distinct genetic clusters. However, one large genetic cluster is composed of a number of discrete spatial locations that we know, based on movement data, are separated by distances well beyond the dispersal capacity of the animal. We are interested in describing rates of migration among these ‘subpopulations.’ Given that these subpopulations are genetically similar, we would expect the number of identified migrants to be artificially inflated if we were to run a typical migration analysis. Conversely, if the gene pool of a given subpopulation is composed of a mix of both resident and migrant individuals, we would expect the probability of migrant origin for any one individual to be deflated. Thus, we were hoping to develop a simulation protocol, perhaps using glSim, to accurately quantify migration rates among genetically similar subpopulations in which gene flow is ongoing. We would be open to developing a collaboration if this is a question that may be of interest to you.

Sincerely, Tim Smyser
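
The false-migrant question described in this thread can be illustrated with a minimal Python sketch (standard library only). It is an assumption-laden toy rather than the method of any particular migration-analysis package: loci are treated as independent and in Hardy-Weinberg proportions, allele frequencies are taken as known rather than estimated, and an individual is flagged as a "migrant" whenever its genotype is more likely under the other subpopulation's frequencies. The names `simulate_genotypes`, `log_likelihood`, and `false_migrant_rate` are hypothetical, and all frequencies are made up for illustration.

```python
import math
import random


def simulate_genotypes(freqs, n_ind, rng):
    """Diploid 0/1/2 genotypes at independent loci under Hardy-Weinberg."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n_ind)]


def log_likelihood(genotype, freqs):
    """Log-probability of a genotype; freqs must lie strictly between 0 and 1."""
    total = 0.0
    for g, p in zip(genotype, freqs):
        q = 1.0 - p
        total += math.log((q * q, 2.0 * p * q, p * p)[g])
    return total


def false_migrant_rate(freqs_home, freqs_other, n_ind=500, seed=None):
    """Share of true residents that look more likely to come from elsewhere."""
    rng = random.Random(seed)
    residents = simulate_genotypes(freqs_home, n_ind, rng)  # no migration
    flagged = sum(
        log_likelihood(g, freqs_other) > log_likelihood(g, freqs_home)
        for g in residents
    )
    return flagged / n_ind


def clamp(p):
    """Keep a frequency away from 0 and 1 so the log-likelihood is finite."""
    return min(0.95, max(0.05, p))


rng = random.Random(0)
base = [rng.uniform(0.1, 0.9) for _ in range(200)]
similar = [clamp(p + rng.uniform(-0.02, 0.02)) for p in base]
divergent = [clamp(p + rng.uniform(-0.3, 0.3)) for p in base]

rate_similar = false_migrant_rate(base, similar, seed=1)
rate_divergent = false_migrant_rate(base, divergent, seed=1)
```

Comparing `rate_similar` with `rate_divergent` shows how the false-positive rate under no migration depends on how close the subpopulations are. Adding a known fraction of true migrants to `residents` would be the natural next step for calibrating against observed data.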