Open dmacguigan opened 1 year ago
Can you check your code? It looks like in your first two lines of code, the right trim fraction is changed from 0.05 to 0.5
From: dmacguigan. Sent: 01 March 2023 17:16. To: whitlock/OutFLANK. Subject: [whitlock/OutFLANK] Different RightTrimFraction settings give same p values (Issue #32)
If I understand correctly, modifying the RightTrimFraction or LeftTrimFraction parameters of the OutFLANK function changes the Fst values used to estimate the Fst distribution. Since the Fst distribution is different, we should expect different p-values.
When I change the LeftTrimFraction parameter, the right-tailed p-values are different.
outflank_results_1 <- OutFLANK(FstDataFrame = outflank_dat, LeftTrimFraction=0.05, RightTrimFraction=0.05, Hmin=0.1, NumberOfSamples=2, qthreshold=0.05)
outflank_results_2 <- OutFLANK(FstDataFrame = outflank_dat, LeftTrimFraction=0.05, RightTrimFraction=0.5, Hmin=0.1, NumberOfSamples=2, qthreshold=0.05)
plot(outflank_results_1$results$pvaluesRightTail, outflank_results_2$results$pvaluesRightTail)
abline(0, 1, col="red", lwd=2)
However, when I change the RightTrimFraction parameter, the resulting p-values are identical.
outflank_results_1 <- OutFLANK(FstDataFrame = outflank_dat, LeftTrimFraction=0.05, RightTrimFraction=0.05, Hmin=0.1, NumberOfSamples=2, qthreshold=0.05)
outflank_results_2 <- OutFLANK(FstDataFrame = outflank_dat, LeftTrimFraction=0.05, RightTrimFraction=0.75, Hmin=0.1, NumberOfSamples=2, qthreshold=0.05)
plot(outflank_results_1$results$pvaluesRightTail, outflank_results_2$results$pvaluesRightTail)
abline(0, 1, col="red", lwd=2)
I have attached my data (outflank_dat) if you would like to try replicating these results: outflank_dat.txt (https://github.com/whitlock/OutFLANK/files/10865685/outflank_dat.txt)
Do you have any thoughts on why this is happening? I would like to adjust my RightTrimFraction parameter because I suspect there are many loci under selection in my dataset. But at the moment, adjusting that parameter has no effect.
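A scatter plot with a 1:1 line can hide very small differences, so a numeric comparison may be a useful complement. The following is a minimal sketch, not part of OutFLANK: the helper name `compare_right_tail_p` and the mock objects are hypothetical, and it only assumes that each result has a `results$pvaluesRightTail` column, as used in the code above.

```r
# Hypothetical helper: numerically compare right-tail p-values from two runs.
# Assumes each object has a `results` data frame with a `pvaluesRightTail` column.
compare_right_tail_p <- function(res_a, res_b, tolerance = 1e-12) {
  p_a <- res_a$results$pvaluesRightTail
  p_b <- res_b$results$pvaluesRightTail
  stopifnot(length(p_a) == length(p_b))
  abs_diff <- abs(p_a - p_b)
  list(
    # TRUE only if the two vectors agree within the tolerance
    all_equal = isTRUE(all.equal(p_a, p_b, tolerance = tolerance)),
    # Largest absolute difference, ignoring loci with missing p-values
    max_abs_diff = max(abs_diff, na.rm = TRUE),
    # Number of loci whose p-values differ by more than the tolerance
    n_different = sum(abs_diff > tolerance, na.rm = TRUE)
  )
}

# Mock result objects standing in for real OutFLANK output (illustrative values only).
mock_a <- list(results = data.frame(pvaluesRightTail = c(0.90, 0.40, 0.01)))
mock_b <- list(results = data.frame(pvaluesRightTail = c(0.90, 0.40, 0.01)))
mock_c <- list(results = data.frame(pvaluesRightTail = c(0.85, 0.40, 0.02)))

cmp_same <- compare_right_tail_p(mock_a, mock_b)
cmp_diff <- compare_right_tail_p(mock_a, mock_c)
```

With real output, the call would be `compare_right_tail_p(outflank_results_1, outflank_results_2)`; an `n_different` of zero would confirm that the two runs are numerically identical rather than merely similar by eye.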
Yeah sorry, that was just a typo in my original post, not reflective of my code. I updated my post with the correct code. Same result.
Thanks. Can you add a histogram of the distribution of FST values, and add the OutFLANKResultsPlotter histograms for each of the models?
Can you also add for each model:
Sure thing. Histogram of corrected Fst
hist(outflank_dat$FST)
First, the model with the default trim values, 0.05 for right and left.
outflank_results <- OutFLANK(FstDataFrame = outflank_dat,
LeftTrimFraction=0.05,
RightTrimFraction=0.05,
Hmin=0.1,
NumberOfSamples=2,
qthreshold=0.05)
> outflank_results$FSTbar
[1] 0.1917848
> outflank_results$FSTNoCorrbar
[1] 0.2038839
> outflank_results$dfInferred
[1] 2
> outflank_results$numberLowFstOutliers
[1] 0
> outflank_results$numberHighFstOutliers
[1] 0
OutFLANKResultsPlotter(outflank_results, withOutliers = TRUE,
NoCorr = TRUE, Hmin = 0.1, binwidth = 0.005, Zoom = FALSE,
RightZoomFraction = 0.05, titletext = NULL)
Now the model with left trim = 0.75. We can see that the inferred Fst distribution has changed.
outflank_results <- OutFLANK(FstDataFrame = outflank_dat,
LeftTrimFraction=0.75,
RightTrimFraction=0.05,
Hmin=0.1,
NumberOfSamples=2,
qthreshold=0.05)
> outflank_results$FSTbar
[1] 0.1917848
> outflank_results$FSTNoCorrbar
[1] 0.2038839
> outflank_results$dfInferred
[1] 3.466949
> outflank_results$numberLowFstOutliers
[1] 0
> outflank_results$numberHighFstOutliers
[1] 0
OutFLANKResultsPlotter(outflank_results, withOutliers = TRUE,
NoCorr = TRUE, Hmin = 0.1, binwidth = 0.005, Zoom = FALSE,
RightZoomFraction = 0.05, titletext = NULL)
And lastly, the model with right trim = 0.75. The inferred Fst distribution does not appear different from the default model, at least by eye.
outflank_results <- OutFLANK(FstDataFrame = outflank_dat,
LeftTrimFraction=0.05,
RightTrimFraction=0.75,
Hmin=0.1,
NumberOfSamples=2,
qthreshold=0.05)
> outflank_results$FSTbar
[1] 0.1917848
> outflank_results$FSTNoCorrbar
[1] 0.2038839
> outflank_results$dfInferred
[1] 2
> outflank_results$numberLowFstOutliers
[1] 0
> outflank_results$numberHighFstOutliers
[1] 0
OutFLANKResultsPlotter(outflank_results, withOutliers = TRUE,
NoCorr = TRUE, Hmin = 0.1, binwidth = 0.005, Zoom = FALSE,
RightZoomFraction = 0.05, titletext = NULL)
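Because each run above overwrites `outflank_results`, it can be easier to keep the three fits under separate names and tabulate the summary fields side by side. This is a sketch only: the helper `summarise_outflank_runs` is hypothetical, and the mock lists simply reuse the values printed in this comment in place of real OutFLANK objects.

```r
# Hypothetical helper: collect the summary fields printed above into one table.
# `runs` is a named list of result objects (or lists with the same field names).
summarise_outflank_runs <- function(runs) {
  fields <- c("FSTbar", "FSTNoCorrbar", "dfInferred",
              "numberLowFstOutliers", "numberHighFstOutliers")
  # One single-row data frame per run, then stack them
  rows <- lapply(runs, function(run) as.data.frame(run[fields]))
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  data.frame(model = names(runs), out)
}

# Mock objects holding the values reported in this thread.
runs <- list(
  default    = list(FSTbar = 0.1917848, FSTNoCorrbar = 0.2038839, dfInferred = 2,
                    numberLowFstOutliers = 0, numberHighFstOutliers = 0),
  left_0.75  = list(FSTbar = 0.1917848, FSTNoCorrbar = 0.2038839, dfInferred = 3.466949,
                    numberLowFstOutliers = 0, numberHighFstOutliers = 0),
  right_0.75 = list(FSTbar = 0.1917848, FSTNoCorrbar = 0.2038839, dfInferred = 2,
                    numberLowFstOutliers = 0, numberHighFstOutliers = 0)
)

run_table <- summarise_outflank_runs(runs)
print(run_table)
```

In the table, only `dfInferred` for the left-trim run differs from the default; the right-trim run matches the default in every reported field.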
Although OutFLANK removes the RightTrimFraction for the initial estimate of the degrees of freedom, it then uses that initial estimate to identify and remove outliers, and re-estimates the degrees of freedom. So, if the outliers that OutFLANK removes in this iterative process are the same loci that you ask it to remove with RightTrimFraction, you can get the same results for different RightTrimFraction values. Here, I think part of the reason this is happening is that you only have 2 populations in the data, so the maximum likelihood estimator is driven by information at the left side of the distribution.
You can test this by running your code through the OutFLANK function line-by-line and seeing if there is any unexpected behavior: https://github.com/whitlock/OutFLANK/blob/master/R/OutFLANK.R
In any case, it looks like you get a good fit to the chi-square distribution with the default parameters.
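The point about iterative re-estimation can be illustrated with a toy example. To be clear, this is not OutFLANK's code or its likelihood: it is a deliberately simplified sketch in which the "neutral" values are exponential quantiles, the fitted parameter is just the mean of the retained values, and the function name `iterative_trim_fit` and the `alpha` cutoff are assumptions made for illustration. It only shows how two different starting right trims can converge to the same final fit once outliers are re-identified from the fitted distribution.

```r
# Toy illustration only (NOT OutFLANK's algorithm): the initial right trim
# sets the starting estimate, after which outliers are re-identified from
# the fitted distribution until the outlier set stops changing.
iterative_trim_fit <- function(x, right_trim, alpha = 0.001, max_iter = 50) {
  # Start by excluding the top `right_trim` fraction of values
  is_outlier <- x > quantile(x, probs = 1 - right_trim, names = FALSE)
  for (i in seq_len(max_iter)) {
    # Simplified "fit": mean of the currently retained values
    scale_hat <- mean(x[!is_outlier])
    # Flag values beyond the (1 - alpha) quantile of the fitted exponential
    threshold <- qexp(1 - alpha, rate = 1 / scale_hat)
    new_outlier <- x > threshold
    if (identical(new_outlier, is_outlier)) break
    is_outlier <- new_outlier
  }
  list(scale = scale_hat, outliers = which(is_outlier), iterations = i)
}

# Deterministic toy data: 200 "neutral" values plus 3 extreme values
neutral <- qexp(ppoints(200))
x <- c(neutral, 30, 35, 40)

fit_small_trim <- iterative_trim_fit(x, right_trim = 0.05)
fit_large_trim <- iterative_trim_fit(x, right_trim = 0.50)
```

In this toy setup both starting trims end with the same three flagged values and the same fitted scale, even though their first-pass estimates differ. To check whether something similar happens in the real function, base R's `debugonce(OutFLANK)` can be called before running `OutFLANK(...)` to step through it line by line.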