whitlock / OutFLANK

A procedure to find Fst outliers based on an inferred distribution of neutral Fst

Problems with the number of populations #8

Open katymoo opened 7 years ago

katymoo commented 7 years ago

Dear Michael and Katie,

I am trying to run OutFlank on my SNP dataset and am running into a problem with setting the number of populations. I think there must be something wrong with the way I am specifying the population names, but I can't figure out what I'm doing wrong. I have 10 populations and 132 sampled individuals.

I tried using the following script:

SNPmat <- read.table("SNPmat.txt")
locusNames <- seq(from=1, to=108586, by=1)
popNames <- c(replicate(22,"CAM"), replicate(2,"GAM"), replicate(9,"KAS"), replicate(13,"LOB"),
              replicate(9,"LOP"), replicate(19,"MCR"), replicate(10,"MBD"), replicate(14,"MIN"),
              replicate(22,"NDI"), replicate(12,"TAK"))
FstDataFrame <- MakeDiploidFSTMat(SNPmat, locusNames, popNames)
OutFLANK(FstDataFrame, LeftTrimFraction=0.05, RightTrimFraction=0.05, Hmin=0.1, 10, qthreshold=0.05)

However, I get the following error message:

Error in optim(NumberOfSamples, localNLLAllData, lower = 2, method = "L-BFGS-B") :
  L-BFGS-B needs finite values of 'fn'
In addition: Warning messages:
1: In IncompleteGammaFunction(df/2, df * HighTrimPoint/(2 * Fstbar)) :
  value out of range in 'gammafn'
2: In IncompleteGammaFunction(df/2, df * LowTrimPoint/(2 * Fstbar)) :
  value out of range in 'gammafn'

When I change the number of sampled populations to 132 (which is the number of individuals), the error message goes away. So OutFLANK(FstDataFrame, LeftTrimFraction=0.05,RightTrimFraction=0.05, Hmin=0.1, 132,qthreshold=0.05) runs fine, but then OutFlank is presumably seeing each individual as a sampled population so the results would be meaningless.

My popNames look like this:

[1] "CAM" "CAM" "CAM" "CAM" "CAM" "CAM" "CAM" "CAM" "CAM" "CAM" "CAM" "CAM" [13] "CAM" "CAM" "CAM" "CAM" "CAM" "CAM" "CAM" "CAM" "CAM" "CAM" "GAM" "GAM" [25] "KAS" "KAS" "KAS" "KAS" "KAS" "KAS" "KAS" "KAS" "KAS" "LOB" "LOB" "LOB" [37] "LOB" "LOB" "LOB" "LOB" "LOB" "LOB" "LOB" "LOB" "LOB" "LOB" "LOP" "LOP" [49] "LOP" "LOP" "LOP" "LOP" "LOP" "LOP" "LOP" "MCR" "MCR" "MCR" "MCR" "MCR" [61] "MCR" "MCR" "MCR" "MCR" "MCR" "MCR" "MCR" "MCR" "MCR" "MCR" "MCR" "MCR" [73] "MCR" "MCR" "MBD" "MBD" "MBD" "MBD" "MBD" "MBD" "MBD" "MBD" "MBD" "MBD" [85] "MIN" "MIN" "MIN" "MIN" "MIN" "MIN" "MIN" "MIN" "MIN" "MIN" "MIN" "MIN" [97] "MIN" "MIN" "NDI" "NDI" "NDI" "NDI" "NDI" "NDI" "NDI" "NDI" "NDI" "NDI" [109] "NDI" "NDI" "NDI" "NDI" "NDI" "NDI" "NDI" "NDI" "NDI" "NDI" "NDI" "NDI" [121] "TAK" "TAK" "TAK" "TAK" "TAK" "TAK" "TAK" "TAK" "TAK" "TAK" "TAK" "TAK"
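
The label vector printed above can be cross-checked in base R before it is handed to MakeDiploidFSTMat(). The following is only a minimal sketch: it rebuilds the vector from the per-population sample sizes given in the script and counts individuals and distinct populations. The name pop_sizes is introduced here for illustration and does not come from the original script.

```r
# Per-population sample sizes as given in the script above.
# `pop_sizes` is a hypothetical helper name used only in this sketch.
pop_sizes <- c(CAM = 22, GAM = 2, KAS = 9, LOB = 13, LOP = 9,
               MCR = 19, MBD = 10, MIN = 14, NDI = 22, TAK = 12)

# rep() with a `times` vector repeats each label the requested number of times.
popNames <- rep(names(pop_sizes), times = pop_sizes)

n_individuals <- length(popNames)          # one label per row of SNPmat
n_populations <- length(unique(popNames))  # number of distinct population labels

print(n_individuals)
print(n_populations)

# Per-population counts, shown in the original order rather than alphabetically.
print(table(popNames)[names(pop_sizes)])
```

With these sample sizes the vector has 132 entries and 10 distinct labels, which matches the 132 individuals and 10 populations described in the post.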

I have also tried creating an input file specifying the population names (see the attached file, PopNames.txt).

Then I used the following commands to try and run OutFLANK:

pops <- read.csv("PopNames.txt")
FstDataFrame <- MakeDiploidFSTMat(SNPmat, locusNames, popNames=pops$x)

However I still have the same problem - OutFLANK only runs if I set the number of sampled populations to 132.

I am sure I'm making a silly little mistake but I really can't figure out what it might be. Any help would be greatly appreciated!

Thank you!

Katy

DrK-Lo commented 7 years ago

Hi Katy, I can’t reproduce your result without your SNP data, but I can tell you that the error is not in your pop names. The pop names are only used in the calculation in MakeDiploidFSTMat(). You may be getting the error because you are not specifying NumberOfSamples in the OutFLANK function. In this case you have 9 populations, so you should set NumberOfSamples=9. Note that the algorithm just uses this as a reasonable starting point. You should check the fit with OutFLANKResultsPlotter().

katymoo commented 7 years ago

Hi, many thanks for your response!

I'm still really confused then, maybe something is incorrect in my SNPmat file. I have zipped it up and attached it here.

SNPmat2.txt.tar.gz
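
A genotype matrix can be sanity-checked in base R independently of OutFLANK. The sketch below assumes the input format described in the OutFLANK documentation for MakeDiploidFSTMat(): one row per individual, one column per locus, genotypes coded 0, 1 or 2, and 9 for missing data. It uses a small made-up matrix rather than the attached file, and check_snp_matrix() is a hypothetical helper, not part of OutFLANK. It is a generic input check, not a confirmed explanation of the optim() error.

```r
# Toy stand-in for SNPmat (made-up data, NOT the attached file):
# 6 individuals in rows, 4 loci in columns, 9 marks a missing genotype.
toy_SNPmat <- matrix(c(0, 1, 2, 9,
                       1, 1, 0, 2,
                       2, 0, 0, 1,
                       0, 9, 1, 1,
                       1, 2, 2, 0,
                       0, 0, 1, 2),
                     nrow = 6, byrow = TRUE)
toy_popNames <- rep(c("A", "B"), each = 3)

# check_snp_matrix() is a hypothetical helper written for this sketch.
check_snp_matrix <- function(snp, pop_names, allowed = c(0, 1, 2, 9)) {
  snp <- as.matrix(snp)  # also accepts a data frame from read.table()
  list(
    rows_match_labels = nrow(snp) == length(pop_names),  # one label per individual
    codes_ok          = all(snp %in% allowed),           # NA or other codes give FALSE
    n_loci            = ncol(snp),
    n_missing         = sum(snp == 9, na.rm = TRUE)      # genotypes coded as missing
  )
}

res <- check_snp_matrix(toy_SNPmat, toy_popNames)
str(res)
```

On real data the same call would be check_snp_matrix(SNPmat, pops$x); a FALSE in either of the first two fields would point to a problem in the input rather than in the OutFLANK arguments.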

I am trying to set NumberOfSamples to 10 (there are 10 populations; CM and GC are two separate pops), but I get the following error message:

Error in optim(NumberOfSamples, localNLLAllData, lower = 2, method = "L-BFGS-B")

The analysis only seems to run if I set NumberOfSamples to 132, which is the number of individuals rather than populations. So I'm still not sure what I'm doing wrong.

This is my code:

SNPmat <- read.table("SNPmat2.txt")
locusNames <- seq(from=1, to=108586, by=1)
pops <- read.csv("PopNames.txt")
FstDataFrame <- MakeDiploidFSTMat(SNPmat, locusNames, popNames=pops$x)
OutFLANK(FstDataFrame, LeftTrimFraction=0.05, RightTrimFraction=0.05, Hmin=0.1, 10, qthreshold=0.05)

Many thanks for your help!

EveTC commented 4 years ago

Hi @DrK-Lo,

I am receiving the same error as @katymoo. Did anyone find a solution to this error?

I have made sure to include the number of populations in the NumberOfSamples argument as so:

sw_out <- OutFLANK(FstDataFrame=sw.Fs, LeftTrimFraction=0.05, RightTrimFraction=0.05, Hmin=0.1, NumberOfSamples=36, qthreshold=0.1)

but it outputs this error:

Error in optim(NumberOfSamples, localNLLAllData, lower = 2, method = "L-BFGS-B") :
  L-BFGS-B needs finite values of 'fn'

I also receive the same error message when I set the argument to the number of individuals. Is there a limit to how many populations OutFLANK can handle?

Any help would be greatly appreciated! Thanks, Eve