whitlock / OutFLANK

A procedure to find Fst outliers based on an inferred distribution of neutral Fst

Error in quantile.default(pi0, prob = 0.1) : missing values and NaN's not allowed if 'na.rm' is FALSE #27

Open CelineReisser opened 3 years ago

CelineReisser commented 3 years ago

Hi,

My colleague and I are trying to use OutFLANK on a set of 30,000 loci in 18 samples. We prepared the input and selected the pruned loci, and everything goes well until we reach the OutFLANK function, where we get the following error message:

out_trim <- OutFLANK(FstDataFrame=my_fst[which_pruned,], LeftTrimFraction=0.05, RightTrimFraction=0.05, NumberOfSamples=18, qthreshold=0.05)

Error in quantile.default(pi0, prob = 0.1) : missing values and NaN's not allowed if 'na.rm' is FALSE

We removed all NA genotypes from our VCF and kept only loci with MAF > 0.15.

I am not sure what is happening here. I tried to look at the source code of the different functions used inside the OutFLANK function, but couldn't identify the source of the problem.

Any ideas?

Thank you very much for any help.

Celine

DrK-Lo commented 3 years ago

Hi Celine, This is a Q-value error. I think I've seen it before when the distribution perfectly fits a chi-square distribution. You could check it by simulating a random variable under the chi-squared distribution and plugging it into the q-value function and seeing if it gives the same error.
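
The check suggested above could be sketched roughly as follows. This is only an illustration: the single degree of freedom, the seed, and the number of simulated loci are assumptions rather than values taken from OutFLANK, and the call to qvalue::qvalue() runs only if the Bioconductor package qvalue is installed.

```r
# Simulate a statistic that follows a chi-squared distribution exactly.
set.seed(1)        # arbitrary seed, for reproducibility only
n_loci <- 10000    # assumed number of simulated loci
df_sim <- 1        # assumed degrees of freedom

x <- rchisq(n_loci, df = df_sim)

# Convert the simulated statistics to right-tail p-values.
p_right <- pchisq(x, df = df_sim, lower.tail = FALSE)

# Sanity checks before handing the p-values to a q-value procedure.
stopifnot(!anyNA(p_right), all(p_right >= 0 & p_right <= 1))

# Run qvalue only when the package is available, and capture any error
# instead of stopping, so that the message can be compared with the one
# reported in this issue.
if (requireNamespace("qvalue", quietly = TRUE)) {
  q_out <- tryCatch(qvalue::qvalue(p = p_right), error = function(e) e)
  if (inherits(q_out, "error")) {
    message("qvalue failed: ", conditionMessage(q_out))
  } else {
    message("qvalue succeeded; estimated pi0 = ", signif(q_out$pi0, 3))
  }
}
```

If the call succeeds on simulated data, a perfect chi-squared fit on its own is probably not what triggers the error.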


CelineReisser commented 3 years ago

Hi there,

Thank you for the very quick answer.

I tried, as mentioned, generating a random variable of 100,000 values with the rchisq function and then submitting it to the qvalue function, and it worked... So this might not be the reason?

However, we just found some odd behavior. We have two large datasets: one of 11 million SNPs (with missing values) and one of 6 million SNPs (with no missing values, as we saw that bigsnpr does not handle them properly). We created a subset of each file containing 30,000 SNPs for testing purposes. The OutFLANK function works on the dataset containing missing data, but not on the one without NA... Everything in those files is identical, except that there are no NAs in the latter.

We inspected the R objects created along the pipeline, and they look identical to each other. The Fst calculation goes well for both; only the OutFLANK function fails...

DrK-Lo commented 3 years ago

Do any of the SNPs being input into the OutFLANK function have an NA for FST?


CelineReisser commented 3 years ago

Apparently not. I ran the following command:

table(is.na(my_fst$FST))

FALSE 30000
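
The command above inspects only the FST column. A broader check, sketched here under the assumption that other numeric columns of the data frame may also feed into the calculation, counts NA, NaN, and infinite values in every numeric column. The helper name count_non_finite and the toy data frame (its column names and values) are illustrative only and are not taken from the real dataset.

```r
# Count problematic values (NA, NaN, Inf, -Inf) in every numeric column.
count_non_finite <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  vapply(df[is_num], function(col) sum(!is.finite(col)), integer(1))
}

# Toy data frame; column names and values are illustrative only.
toy_fst <- data.frame(
  LocusName = c("a", "b", "c", "d"),
  He        = c(0.30, NA, 0.25, 0.40),
  FST       = c(0.05, 0.10, 0.02, 0.08),
  FSTNoCorr = c(0.06, NaN, Inf, 0.09)
)

# Per-column counts of non-finite entries.
bad_counts <- count_non_finite(toy_fst)
print(bad_counts)

# Row indices where at least one numeric column is non-finite.
num_cols <- toy_fst[vapply(toy_fst, is.numeric, logical(1))]
bad_rows <- which(rowSums(!is.finite(as.matrix(num_cols))) > 0)
print(bad_rows)
```

On the real data, the same helper could be applied to the subset that is actually passed in, for example count_non_finite(my_fst[which_pruned, ]).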

yvanpapa commented 3 years ago

Hi, I am encountering the same error when using the OutFLANK wrapper implemented in dartR:

gl.outflank(gl_new,plot=T,na.rm=T)->outflank

Error in quantile.default(pi0, prob = 0.1) : missing values and NaN's not allowed if 'na.rm' is FALSE

Is the origin of this error still unknown?

CelineReisser commented 3 years ago

We still don't know on our side.

We have been working around the problem by using other packages for the outlier detection, and I wanted to come back to it in the next few weeks to try to understand it better. But it seems the error is generated by the qvalue package...
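
The wording of the error is that of base R's quantile(), which stops whenever its input contains NA or NaN and na.rm is FALSE. The sketch below reproduces the message with a made-up vector standing in for pi0 (the values are hypothetical). It also shows one generic way a NaN can arise in R, the mean of an empty vector. Whether that is what happens inside OutFLANK or qvalue is an assumption that this thread does not confirm.

```r
# A vector standing in for the internal pi0 estimates (hypothetical values).
pi0_like <- c(0.9, NaN, 0.8)

# quantile() refuses NA/NaN input unless na.rm = TRUE; capture the message.
err_msg <- tryCatch(
  {
    quantile(pi0_like, probs = 0.1)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(err_msg)

# With na.rm = TRUE the NaN is dropped and the call succeeds.
q_ok <- quantile(pi0_like, probs = 0.1, na.rm = TRUE)
print(q_ok)

# One generic source of NaN: averaging over zero elements, as would happen
# if no values were left after filtering (an assumption, not a diagnosis).
empty_mean <- mean(numeric(0))
print(is.nan(empty_mean))
```

If this is the mechanism, the useful question is why the vector contains NaN at all. Passing na.rm = TRUE would only hide the symptom.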

jpfontenelle commented 2 years ago

Heya. Has anyone found a solution to this? I get the same error as the people above.

CelineReisser commented 2 years ago

Hi there, No solution so far.

jpfontenelle commented 2 years ago

I've been playing around with it a bit, and while I still can't figure out a way to pass na.rm=T to the quantile() function that is called internally by OutFLANK, I could "hack" it by adjusting the LeftTrimFraction and RightTrimFraction parameters, mostly by passing values higher than the defaults. It might be worth a try, since the problem appears to be dataset-related. Not ideal, but it is something.
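
The workaround described above could be tried systematically with a small helper like the one below. It is only a sketch: try_trim_fractions, fit_fun, and the candidate trim values are hypothetical names and numbers chosen for illustration, and fake_fit merely stands in for a real OutFLANK call so that the example runs without the package.

```r
# Try a sequence of trim fractions and return the first fit that does not
# raise an error. `fit_fun` is a user-supplied function of one trim fraction.
try_trim_fractions <- function(fit_fun, trims = c(0.05, 0.10, 0.15, 0.20)) {
  for (trim in trims) {
    result <- tryCatch(fit_fun(trim), error = function(e) e)
    if (!inherits(result, "error")) {
      return(list(trim = trim, fit = result))
    }
    message("trim = ", trim, " failed: ", conditionMessage(result))
  }
  stop("no trim fraction in `trims` produced a fit")
}

# Stand-in for the real call: fails for trim fractions below 0.15.
fake_fit <- function(trim) {
  if (trim < 0.15) stop("missing values and NaN's not allowed")
  paste("fit with trim", trim)
}

res <- try_trim_fractions(fake_fit)
print(res$trim)
```

With OutFLANK itself, fit_fun could wrap the call from the original post, for example function(trim) OutFLANK(FstDataFrame = my_fst[which_pruned, ], LeftTrimFraction = trim, RightTrimFraction = trim, NumberOfSamples = 18, qthreshold = 0.05). Because the trim fractions determine which loci are used to fit the neutral distribution, results obtained with larger values should be interpreted with care.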

Afei99357 commented 11 months ago

Has anyone figured out the issue? I am getting the same error as well.