Open nlapier2 opened 1 year ago
@nlapier2 Run this example to see how I calculate lfsr in varbvs
:
# For each variable (column of X), compute the least-squares estimate
# of b (bhat), and its variance (shat). If y is NULL, only the
# variance (shat) is computed.
simple_lr <- function (X, y, se) {
X <- scale(X,center = TRUE,scale = FALSE)
xx <- colSums(X^2)
shat <- se/xx
y <- y - mean(y)
bhat <- drop(y %*% X)/xx
return(list(bhat = bhat, shat = shat))
}
library(ashr)
library(mr.ash.alpha)
set.seed(1)
n <- 800
p <- 100
dat <- mr.ash::simulate_regression_data(n,p,s = 20)
X <- scale(dat$X,center = TRUE,scale = TRUE)
y <- drop(scale(dat$y,center = TRUE,scale = TRUE))
X <- svd(X)$u
# Compute the lfsr's from the mr.ash outputs.
res <- mr.ash(X,y,sigma2 = 1,sa2 = c(0,0.5,1,10)^2,update.sigma2 = FALSE,
intercept = TRUE,standardize = FALSE)
res.varbvs <- varbvs::varbvsmix(X,NULL,y,sa = c(0,0.5,1,10)^2,sigma = 1,
w = res$pi,update.sigma = FALSE,
update.sa = FALSE,update.w = FALSE)
# Compute the lfsr's using ashr.
lse <- simple_lr(X,y,se = 1)
g <- list(pi = drop(res$pi),
mean = rep(0,4),
sd = sqrt(res$data$sa2))
g$sd[1] <- 1e-6
class(g) <- "normalmix"
res.ash <- ash(lse$bhat,lse$shat,mixcompdist = "normal",g = g,fixg = TRUE)
lfsr2 <- res.ash$result$lfsr
# Compare the posterior means.
plot(res$beta,res.ash$result$PosteriorMean,pch = 20,
xlab = "mr.ash",ylab = "ashr")
abline(a = 0,b = 1,lty = "dotted",col = "magenta")
# Compare the lfsr calculations.
plot(res.varbvs$lfsr,res.ash$result$lfsr,pch = 20,
xlab = "mr.ash",ylab = "ashr")
abline(a = 0,b = 1,lty = "dotted",col = "magenta")
The dots in these scatterplots should lie along the diagonal.
Note that all the relevant code is in the ashr
package; start with the calc_lfsr
function which will call (indirectly) the appropriate functions in normalmix
.
The problem may be due to faulty calculation of the full posterior; see Issue #3.
For this example, I get the same result with varbvs::varbvsmix
, so at least in this example the get.full.posterior
calculations are correct. This was my code:
res.varbvs <- varbvs::varbvsmix(X,NULL,y,sa = c(0,0.5,1,10)^2,sigma = 1,
w = res$pi,update.sigma = FALSE,
update.sa = FALSE,update.w = FALSE)
Good news, though: I've already implemented the lfsr calculations in varbvs
; I've updated the code above.
The one catch is that I think mr.ash
and varbvsmix
have slightly different priors when the residual variance sigma
is not 1; to correct this, I think you have to scale the prior standard deviations by sigma
in varbvsmix
.
I can see immediately one problem:
sd = full.post$s2[,2:gridlen]
should be
sd = sqrt(full.post$s2[,2:gridlen])
because s2 is the variance (the 2 stands for squared)
Also, in general, we don't want to assume that the first component is a point mass on 0.
But as @pcarbo mentions maybe we can use existing code?
@stephens999 thanks -- I will fix that.
@pcarbo Thanks! However, your code doesn't run for me -- "dat <- mr.ash::simulate_regression_data(n,p,s = 20)" will not run because the package is called mr.ash.alpha, and if I change it to "dat <- mr.ash.alpha::simulate_regression_data(n,p,s = 20)", it says simulate_regression_data is not an exported object from mr.ash.alpha.
I'm thinking we might want to have mr.ash.alpha (and mr.ash) return the fitted prior (g) and the summary statistics (bhat and standard errors) that can be used together to compute any posterior quantity of interest, using the ashr or ebnm interface
@stephens999 I've created a new issue (Issue #8 ) to document your suggestion.
@pcarbo Thanks -- didn't realize there was a separate mr.ash repo! What's the difference?
Running the code you posted and then computing the LFSR using the (fixed) function in the OP gives the same LFSR as res.varbvs$lfsr
and res.ash$result$lfsr
above:
full.post = get.full.posterior(res)
lfsr3 = calc_lfsr(res$beta, full.post)
cor(lfsr3, res.ash$result$lfsr)
mr.ash
is intended to be a revamp of the original package, with improved interface. It is a work-in-progress.
Glad to see that your code gives the same result as mine. That is reassuring.
I think it would be helpful to make it easy for the user to calculate the local false sign rate (LFSR). Right now the paper recommends computing the PIPs if one wants to do variable selection, but the LFSR should be preferable, in part because it is not sensitive to the prior variance grid.
Here is some code I wrote to calculate LFSR -- not 100% sure it's correct yet, so please check:
First I get the probability of 0 effect, i.e. 1-PIP. Then I compute the probability of false sign for each mixture component. Then I compute a weighted average of these probability with weights equal to the mixture component weights. Then I sum the probability of 0 effect and incorrect sign.
This code be called like so: