whitlock / OutFLANK

A procedure to find Fst outliers based on an inferred distribution of neutral Fst

WC_FST_FiniteSample_*ploids_2AllelesB_No*Corr*() for multiple loci #6

Open l-yampolsky opened 8 years ago

l-yampolsky commented 8 years ago

Dear Mike and Katie,

I am trying to run OutFLANK for a large number of loci (obviously). So far I have only used it with my externally calculated values of $FST, $T1, $T2, $FSTNoCorr, $T1NoCorr and $T2NoCorr, but it would now be convenient to use the built-in functions to calculate these. As I understand it, the input for these functions is an array with data for a single locus, with a row for each population and a column for each genotype. I need to do this for 90K SNP loci, and clearly I don't want to create 90K input files. I assume you also have a script, Unix or R, that runs these functions for multiple loci, preferably from a single file where populations and loci are identified by two index columns, but I can't find it anywhere on your GitHub site. Any suggestions?

Thanks!

Lev

DrK-Lo commented 8 years ago

Hi Lev, you can use the MakeDiploidFSTMat() function if you have diploid data. See the user manual for the format of the data. If you have haploid data, you will have to use a different function for the Fst calculation, as described in the manual. We don't have a wrapper for this, but you can look at the code inside the MakeDiploidFSTMat() function to see how we implemented it. Let us know if this helps!
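For the single-file layout asked about above (populations and loci identified by two index columns), a wrapper of this kind could be sketched as follows. This is a minimal illustration in Python, not code from OutFLANK: the column names `locus`, `population`, `allele1` and `allele2` are assumptions, and `stat_fn` is a hypothetical stand-in for whichever per-locus Fst function is appropriate. The same grouping could be done in R.

```python
import csv
import io
from collections import defaultdict


def read_locus_tables(handle):
    """Group a long-format table into one populations-by-alleles count table per locus.

    Assumed (hypothetical) columns: locus, population, allele1, allele2.
    """
    tables = defaultdict(dict)
    for row in csv.DictReader(handle):
        counts = [int(row["allele1"]), int(row["allele2"])]
        tables[row["locus"]][row["population"]] = counts
    # One row per population, in sorted population order, for each locus.
    return {
        locus: [by_pop[pop] for pop in sorted(by_pop)]
        for locus, by_pop in tables.items()
    }


def apply_per_locus(tables, stat_fn):
    """Call stat_fn on each locus table and collect the results by locus name."""
    return {locus: stat_fn(table) for locus, table in tables.items()}


# Tiny in-memory example standing in for the single input file.
example = io.StringIO(
    "locus,population,allele1,allele2\n"
    "snp1,popA,12,8\n"
    "snp1,popB,5,15\n"
    "snp2,popA,20,0\n"
    "snp2,popB,18,2\n"
)
tables = read_locus_tables(example)

# Placeholder statistic (NOT an Fst formula): total allele copies per locus.
totals = apply_per_locus(tables, lambda table: sum(sum(row) for row in table))
print(totals)
```

Replacing the placeholder lambda with a real per-locus function gives one result per locus without writing one input file per locus.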

l-yampolsky commented 8 years ago

Thanks! No, I cannot use MakeDiploidFSTMat() for the data in question, because it's Pool-seq data. So it's 1) haploid and 2) there are no individuals, just allele frequencies. Maybe the easiest way would be to know exactly which versions of the Fst formula are used to calculate the numerator and denominator with and without correction. I would then continue to feed externally calculated Fst values into OutFLANK, but would be certain that I am doing the right thing. I've been doing this in the past, but I worry that I don't have a benchmark to make sure I am not misusing OutFLANK.
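If the goal is to benchmark against functions that expect a populations-by-alleles count table, pooled allele frequencies first have to be turned into counts. The sketch below, in Python for illustration only, shows one way to do that under a loudly stated assumption: each pool's effective number of sampled chromosomes is known and supplied by the user. Whether pool size or read depth should play that role is a modelling choice this thread does not settle, and `freqs_to_counts` is a hypothetical helper, not part of OutFLANK.

```python
def freqs_to_counts(freqs, n_chromosomes):
    """Convert per-population allele frequencies to a two-allele count table.

    freqs:         frequency of allele 1 in each population (0 to 1)
    n_chromosomes: ASSUMED number of sampled chromosomes per population
    Returns one [allele1_count, allele2_count] row per population.
    """
    if len(freqs) != len(n_chromosomes):
        raise ValueError("need one sample size per population")
    table = []
    for p, n in zip(freqs, n_chromosomes):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"frequency out of range: {p}")
        allele1 = round(p * n)  # nearest whole number of allele copies
        table.append([allele1, n - allele1])
    return table


# Two pools at one locus: frequencies 0.6 and 0.25, with 20 and 40 chromosomes.
counts = freqs_to_counts([0.6, 0.25], [20, 40])
print(counts)
```

Rounding discards some information when the frequency times the sample size is not a whole number, so results from such a table are best treated as a sanity check rather than an exact reproduction of a frequency-based calculation.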