whitlock / OutFLANK

A procedure to find Fst outliers based on an inferred distribution of neutral Fst

OutlierFlag False #10

Open Mariabio87 opened 6 years ago

Mariabio87 commented 6 years ago

Dear Katie and Michael,

I am trying to run OutFLANK on my SNP dataset (attached). My script runs without errors, but I don't get any outliers from my data: the OutlierFlag column in my table is all FALSE. Here is my script:

sel <- read.table("./subOutliers.txt", header = TRUE)
# keep only columns that have more than one unique value
sel <- sel[vapply(sel, function(x) length(unique(x)) > 1, logical(1L))]
genotype <- sel[, 4:ncol(sel)]

ind <- paste("pop", sel[, 1]) # vector with the name of population
locinames <- as.character(seq(ncol(genotype))) # vector with the name of loci
FstDataFrame <- MakeDiploidFSTMat(genotype, locinames, ind)

OF <- OutFLANK(FstDataFrame, LeftTrimFraction=0.05, 
               RightTrimFraction=0.05, Hmin=0.1, NumberOfSamples=3, qthreshold=0.1)
outliers_OF <- OF$results$LocusName[OF$results$OutlierFlag == TRUE]
length(outliers_OF) # I got 0 outliers here

I would be really thankful if you could help me with this code. There is probably something wrong with my dataset, but I cannot figure out what it is.
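
One way to rule out an input problem before calling MakeDiploidFSTMat is a quick sanity check on the objects passed to it. The sketch below assumes the coding described in the OutFLANK documentation (genotypes as 0, 1, or 2 copies of an allele, with 9 for missing data). `check_outflank_input` is a hypothetical helper, not part of the package, and the toy data are made up.

```r
# Hypothetical helper (not part of OutFLANK): basic checks on the inputs
# to MakeDiploidFSTMat(SNPmat, locusNames, popNames).
# Assumes genotypes are coded 0/1/2, with 9 for missing data.
check_outflank_input <- function(genotype, locinames, ind) {
  g <- as.matrix(genotype)
  list(
    valid_coding  = all(g %in% c(0, 1, 2, 9)),     # NA or any other code fails
    loci_match    = length(locinames) == ncol(g),  # one name per locus (column)
    inds_match    = length(ind) == nrow(g),        # one label per individual (row)
    n_populations = length(unique(ind))            # distinct population labels
  )
}

# Toy data: 4 individuals x 3 loci, two populations
toy_geno <- data.frame(a = c(0, 1, 2, 9), b = c(2, 2, 1, 0), c = c(1, 0, 0, 1))
toy_check <- check_outflank_input(toy_geno,
                                  as.character(seq(ncol(toy_geno))),
                                  paste("pop", c(1, 1, 2, 2)))
str(toy_check)
```

With the real data, the same call would take `genotype`, `locinames`, and `ind` from the script above. If the labels are correct, `n_populations` should agree with the value given to NumberOfSamples.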

Thank you so much in advance. Maria

subOutliers.txt FstDataFrame.txt

whitlock commented 5 years ago

Hi Maria,

Sorry to take so long to get back to you. I'm digging my way through my neglected e-mail!

I just ran your data again. OutFLANK has been shown to have a much more appropriate Type I error rate than other programs, and as a consequence very often the answer is that it finds no outliers. Katie and I showed that other programs have overly high false positive rates, and often show outliers when they shouldn't. Looking at the distribution of results shows that your loci are all consistent with the neutral distribution.

OutFLANKResultsPlotter(OF, withOutliers = TRUE, NoCorr = TRUE, Hmin = 0.1, binwidth = 0.005, Zoom = FALSE, RightZoomFraction = 0.05, titletext = NULL)
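
The point about neutral loci can be illustrated with a small simulation in base R. This is only a sketch, not OutFLANK's code: it assumes neutral Fst values follow a scaled chi-square distribution (the kind of neutral model OutFLANK fits), uses made-up values for the mean Fst and the degrees of freedom, and substitutes a Benjamini-Hochberg adjustment (`p.adjust`) for the q-value calculation.

```r
set.seed(1)
n_loci  <- 5000
fst_bar <- 0.05   # assumed mean neutral Fst (illustrative value only)
df      <- 2      # assumed degrees of freedom (illustrative value only)

# Simulate neutral loci only: Fst = fst_bar * chi-square(df) / df
fst <- fst_bar * rchisq(n_loci, df = df) / df

# Right-tail p-value of each locus under the same neutral model
p_right <- pchisq(df * fst / fst_bar, df = df, lower.tail = FALSE)

# Benjamini-Hochberg adjustment as a stand-in for q-values
q_approx <- p.adjust(p_right, method = "BH")

# Number of loci flagged at the threshold used in the script (0.1)
n_flagged <- sum(q_approx < 0.1)
n_flagged

# The largest simulated Fst, for comparison with the mean
max(fst) / fst_bar
```

When every locus is drawn from the neutral distribution, few or none should be flagged after the multiple-testing adjustment, even though the largest Fst values can be several times the mean.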

Hope this helps, Mike

jcaccavo commented 1 year ago

Hi there,

I know it's been a few years since there was any activity on this issue, but I wanted to follow up because I've come across a similar problem.

The dataset in question derives from whole-genome sequencing data, and has 3,309,311 total SNPs. The .vcf file used as input to OutFLANK can be downloaded from my Dropbox. The R script I used to process this dataset with OutFLANK can also be downloaded from my Dropbox here. In short, I used the following code for the primary analysis:

OutFLANK(FstDataFrame, LeftTrimFraction=0.05, RightTrimFraction=0.05, Hmin=0.1, NumberOfSamples=11, qthreshold=0.05)

I also tried the above code with NumberOfSamples=3.

Both NumberOfSamples values resulted in the same output: 2 outlier SNPs identified out of 3,309,311.

Using pcadapt, with the same dataset, I identified between 11,079 and 556,880 outlier SNPs (depending on the value of k and outlier cutoff method used).

I understand that OutFLANK reduces Type I error rate compared with other programs, but this seems to be an extreme difference, between 2 and 11,079.

I am also working with 3 other datasets from the same group of samples at different coverage levels. The dataset in question was not downsampled to standardize coverage, so coverage varies from 1x to 27x, with n=41 samples. The other datasets have fewer samples (n=14, 25, and 39 for 10x, 5x, and 2x downsampling, respectively). For these datasets I ran the same OutFLANK procedure as for the non-downsampled dataset and got 0 outliers for all 3.

I produced all the possible plots in OutFLANK, but am not sure from their interpretation how to explain this large discrepancy between outlier identification with OutFLANK versus pcadapt.
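
One way to read the output alongside the plots is to tabulate the results data frame directly. The sketch below is a hypothetical helper, not part of OutFLANK. It assumes the results table has the columns `He`, `pvaluesRightTail`, `qvalues`, and `OutlierFlag`, and it is demonstrated on a made-up toy table rather than real output.

```r
# Hypothetical helper: summarise an OutFLANK-style results table.
# Assumes columns He, pvaluesRightTail, qvalues and OutlierFlag.
summarise_of_results <- function(res, hmin = 0.1) {
  # Loci at or above the heterozygosity cutoff (assumed filter rule)
  tested <- !is.na(res$He) & res$He >= hmin
  list(
    n_loci     = nrow(res),
    n_tested   = sum(tested),
    n_flagged  = sum(res$OutlierFlag %in% TRUE),   # NA flags count as not flagged
    min_qvalue = min(res$qvalues[tested], na.rm = TRUE),
    # Share of tested loci with a right-tail p-value below 0.05; a value
    # near 0.05 is what a good neutral fit with few outliers would give
    frac_small_p = mean(res$pvaluesRightTail[tested] < 0.05, na.rm = TRUE)
  )
}

# Made-up toy table, for illustration only
toy_res <- data.frame(
  He               = c(0.30, 0.05, 0.42, 0.25),
  pvaluesRightTail = c(0.60, 0.90, 0.001, 0.20),
  qvalues          = c(0.80, 0.90, 0.004, 0.40),
  OutlierFlag      = c(FALSE, FALSE, TRUE, FALSE)
)
toy_summary <- summarise_of_results(toy_res)
str(toy_summary)
```

On real output, the call would presumably be `summarise_of_results(OF$results, hmin = 0.1)`, where `OF` is the object returned by OutFLANK. A `frac_small_p` far above 0.05 combined with very few flagged loci would suggest looking more closely at how well the neutral distribution was fitted.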

Any insights you might have would be greatly appreciated. I'm happy to provide more information as needed.

Best, Jilda