whitlock / OutFLANK

A procedure to find Fst outliers based on an inferred distribution of neutral Fst

Error: in if (s2 == 0) { : missing value where TRUE/FALSE needed #20

Open peterinnes opened 5 years ago

peterinnes commented 5 years ago

Dear Katie and Michael,

I'm hitting an error when using the MakeDiploidFSTMat() function and am hoping you can help me out. The error is:

Calculating FSTs, may take a few minutes...
Error in if (s2 == 0) { : missing value where TRUE/FALSE needed

Based on this, the error seems to be coming from the WC_FST_Diploids_2Alleles() function, specifically the following lines:

s2 = sum(sample_sizes*(p_freqs - p_ave)^2)/((n_pops-1)*n_ave)
if(s2==0){return(0); break}

In my case I think s2 has a value of NA, thus the error "missing value where TRUE/FALSE needed". Do you think this is the case? If so, do you have any idea where this NA is coming from/how to fix this error?
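For anyone who wants to check this reasoning, here is a minimal sketch in plain R that does not depend on OutFLANK. It shows that comparing NA with 0 gives NA rather than FALSE, and that `if` then stops with an error. The variable names `cmp`, `msg`, and `safe` are only for illustration, and the NA guard at the end is one possible workaround, not something OutFLANK itself does.

```r
# Stand-in for an s2 value that could not be computed
s2 <- NA_real_

# Comparing NA with anything yields NA, not FALSE
cmp <- s2 == 0

# if() cannot branch on NA, so it raises an error; capture the message
msg <- tryCatch(
  {
    if (cmp) "zero" else "nonzero"
  },
  error = function(e) conditionMessage(e)
)
print(cmp)
print(msg)

# Testing for NA first avoids the error (illustrative guard only)
safe <- if (is.na(s2)) NA_real_ else if (s2 == 0) 0 else s2
print(safe)
```

The same happens when s2 is NaN, because `NaN == 0` also evaluates to NA.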

I've attached my data, locinames, and population list below.

Thanks, Peter

OutFLANK_data.txt.gz

OutFLANK_locinames.txt.gz

OutFLANK_Pop_list.txt

Giov12 commented 5 years ago

Did you ever figure this out? I am now running into this issue myself.

peterinnes commented 5 years ago

No, I haven't figured it out. Kinda gave up on it for the time being. I'll write here otherwise.

hennelly commented 4 years ago

I'm also running into this issue. I'll post if I figure it out, and I would be interested to hear if anyone has come up with a solution!

marimmt777 commented 4 years ago

I'm running outFLANK and got the same issue. I've tried with the function 'gl.outflank' and with 'WC_FST_Diploids_2Alleles', and both resulted in the same problem. Don't know what else to do...

nek001 commented 4 years ago

Hi guys,

I've had this same error. I'm not sure why, but when I filtered my SNP data more stringently it resolved the issue.

Hope this helps! Cheers

lqch commented 3 years ago

I'm facing this issue, too! @nek001 what were the new filtering settings that you applied to your SNP data?

lqch commented 3 years ago

I think I solved it for my dataset by having more stringent missingness filtering.

This is especially likely if you have uneven numbers of individuals per population, e.g. I have two populations with 10 individuals in one and 28 in the other. If there are loci where the genotypes are totally missing in one population, for example in the one with 10 individuals, it is not possible to calculate FST for that locus and this error is thrown. To solve it, make sure your missingness filter (e.g. bcftools filter -e 'F_MISSING > value') is strict enough that no SNP remains where the genotypes of all members of any one population are missing. In my case a site missing in the whole smaller population has a missingness of 10/38 ≈ 0.26, so the threshold has to be below that.
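A global missingness threshold only catches these loci indirectly, so it can help to test for them per population. Below is a rough sketch in base R, assuming a genotype matrix in the layout MakeDiploidFSTMat() expects (individuals in rows, loci in columns, genotypes coded 0/1/2 with 9 for missing). The helper `loci_missing_in_a_pop()` and the toy data are hypothetical and not part of OutFLANK.

```r
# Hypothetical helper, not part of OutFLANK.
# snp_mat: individuals in rows, loci in columns, coded 0/1/2, 9 = missing
# pops:    population label for each row of snp_mat
loci_missing_in_a_pop <- function(snp_mat, pops, missing_code = 9) {
  # Treat both the missing code and real NAs as missing
  is_missing <- is.na(snp_mat) | snp_mat == missing_code
  # Count called genotypes per population (rows) and locus (columns)
  called <- rowsum((!is_missing) * 1, group = pops)
  # Flag loci where at least one population has zero called genotypes
  colSums(called == 0) > 0
}

# Toy data: 4 individuals, 3 loci; locus 3 is entirely missing in pop A
snp_mat <- matrix(
  c(0, 1, 9,
    2, 1, 9,
    1, 9, 0,
    0, 2, 1),
  nrow = 4, byrow = TRUE
)
pops <- c("A", "A", "B", "B")

bad <- loci_missing_in_a_pop(snp_mat, pops)
print(bad)

# Keep only loci that have at least one call in every population
snp_filtered <- snp_mat[, !bad, drop = FALSE]
```

If you filter the matrix this way, the locus names vector needs the same `!bad` subset so the two stay aligned.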

I hope this helps!!

nek001 commented 3 years ago

Thanks Le!

I ended up filtering missingness by population too (what parameters exactly, I can't remember).

However, I was wondering about the implications of this: could filtering out SNPs that are missing in one population but possibly fixed or at high frequency in another remove SNPs under selection, or would these more likely be SNPs associated with demographic processes?

lqch commented 3 years ago

Hey nek001,

Great question! I think they could certainly be due to either selection or demographic processes, but maybe outflank wouldn't be the best way to distinguish between the two... If you had linkage data, you might be able to detect selective sweeps at a locus that was under strong selection, whereas sites fixed due to demography may be more likely at recombination hotspots.

paulocecco commented 2 years ago

Has anyone solved this?

nek001 commented 2 years ago

Hi paulocecco,

Try removing SNPs that are absent from one population (when comparing two populations). I did manage to get it working by doing that; however, you may need to consider the consequences of this. I actually ended up using a different method for my research in the end.

paulocecco commented 2 years ago

The PLINK filter is supposed to do that, but I'm going to do it manually. If I may ask, nek001, what did you do then?

paulocecco commented 2 years ago

I just compared my .map files between the populations, and they share all the same markers. So it's not a problem of marker difference. Any help here?

ivan06513 commented 8 months ago

Hey everyone,

I also had the same error when I used gl.outflank.


Starting gl2gi 
  Processing genlight object with SNP data
Matrix converted.. Prepare genind object...
Completed: gl2gi 
Calculating FSTs, may take a few minutes...
[1] "10000 done of 67280"
[1] "20000 done of 67280"
Error in if (s2 == 0) { : missing value where TRUE/FALSE needed

I solved this error by removing loci that are all NA within a population, like @lqch did, but I used dartR tools to do it. I tried the genind format first, but it didn't work, so I converted to genlight and then filtered again.

Here is what I did:

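library(dartR)
# convert the genind object to genlight
gl2 <- gi2gl(vcf.genind)
# drop loci that are all NA within any population
gl2 <- gl.filter.allna(gl2, by.pop = TRUE)
outflnk <- gl.outflank(gl2, qthreshold = 0.05, plot = FALSE)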

Hope the scripts help those who meet the NA error! Thanks for good suggestions from everyone!

lholivera commented 6 months ago

Hi! I found the error. As peterinnes mentioned, the error occurs in that line of code (Fst Diploids.R line 37). I traced the issue, and the problem lies in the function getFSTs_diploids (Fst Diploids.R line 116). In lines 119 and 120, the code removes elements with no call, and in some cases a whole population has no values. Consequently, the Sample_Mat parameter of the WC_FST_Diploids_2Alleles function ends up with only one population, so n_pops - 1 is 0 and the formula for s2 divides by zero. This is why applying more stringent missingness filtering, as suggested by lqch, may sometimes resolve the problem. However, the ultimate solution is to filter by population, as recommended by nek001, until the code can handle this scenario. Thanks to everyone!
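To make the arithmetic concrete, here is a small sketch of the s2 expression quoted at the top of the thread, evaluated for a locus where only one population is left. The numbers are made up, and the definitions of `n_ave` and `p_ave` are my assumption of the usual sample-size-weighted averages, not copied from the package source.

```r
# Made-up values: only one population has called genotypes at this locus
sample_sizes <- c(28)   # individuals with calls in the remaining population
p_freqs      <- c(0.5)  # allele frequency in that population

n_pops <- length(sample_sizes)
# Assumed definitions: average sample size and weighted mean frequency
n_ave  <- sum(sample_sizes) / n_pops
p_ave  <- sum(sample_sizes * p_freqs) / (n_ave * n_pops)

# Same expression as in the lines quoted by peterinnes
s2 <- sum(sample_sizes * (p_freqs - p_ave)^2) / ((n_pops - 1) * n_ave)

print(s2)       # 0 / 0
print(s2 == 0)  # the condition that if() receives
```

With a single population the numerator is 0 (the only frequency equals the average) and the denominator is 0, so s2 is NaN and `s2 == 0` is NA, which is exactly what `if` refuses to evaluate.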