stephenslab / mr.ash.alpha

An R package implementing Mr.ASH method
Other
9 stars 0 forks source link

Computing LFSR #6

Open nlapier2 opened 11 months ago

nlapier2 commented 11 months ago

I think it would be helpful to make it easy for the user to calculate the local false sign rate (LFSR). Right now the paper recommends computing the PIPs if one wants to do variable selection, but the LFSR should be preferable, in part because it is not sensitive to the prior variance grid.

Here is some code I wrote to calculate LFSR -- not 100% sure it's correct yet, so please check:

calc_lfsr = function(beta, full.post) {
  prob0 = full.post$phi[,1]
  gridlen = dim(full.post$phi)[2]
  neg = beta < 0
  prob_fs = pnorm(0, mean = full.post$m[,2:gridlen], sd = sqrt(full.post$s2[,2:gridlen]))
  prob_fs[neg, ] = 1 - prob_fs[neg, ]
  prob_opp = rowSums(prob_fs * full.post$phi[,2:gridlen])
  lfsr = as.numeric(prob0 + prob_opp)
  return(lfsr)
}

First I get the probability of 0 effect, i.e. 1-PIP. Then I compute the probability of false sign for each mixture component. Then I compute a weighted average of these probability with weights equal to the mixture component weights. Then I sum the probability of 0 effect and incorrect sign.

This code be called like so:

res.mr.ash = mr.ash(X, Y)
full.post = get.full.posterior(res.mr.ash)
calc_lfsr(res.mr.ash$beta, full.post)
pcarbo commented 11 months ago

@nlapier2 Run this example to see how I calculate lfsr in varbvs:

# For each variable (column of X), compute the least-squares estimate
# of b (bhat), and its variance (shat). If y is NULL, only the
# variance (shat) is computed.
simple_lr <- function (X, y, se) {
  X    <- scale(X,center = TRUE,scale = FALSE)
  xx   <- colSums(X^2)
  shat <- se/xx
  y    <- y - mean(y)
  bhat <- drop(y %*% X)/xx
  return(list(bhat = bhat, shat = shat))
}

library(ashr)
library(mr.ash.alpha)
set.seed(1)
n <- 800
p <- 100
dat <- mr.ash::simulate_regression_data(n,p,s = 20)
X <- scale(dat$X,center = TRUE,scale = TRUE)
y <- drop(scale(dat$y,center = TRUE,scale = TRUE))
X <- svd(X)$u

# Compute the lfsr's from the mr.ash outputs.
res <- mr.ash(X,y,sigma2 = 1,sa2 = c(0,0.5,1,10)^2,update.sigma2 = FALSE,
              intercept = TRUE,standardize = FALSE)
res.varbvs <- varbvs::varbvsmix(X,NULL,y,sa = c(0,0.5,1,10)^2,sigma = 1,
                                w = res$pi,update.sigma = FALSE,
                                update.sa = FALSE,update.w = FALSE)

# Compute the lfsr's using ashr.
lse <- simple_lr(X,y,se = 1)
g <- list(pi   = drop(res$pi),
          mean = rep(0,4),
          sd   = sqrt(res$data$sa2))
g$sd[1] <- 1e-6
class(g) <- "normalmix"
res.ash <- ash(lse$bhat,lse$shat,mixcompdist = "normal",g = g,fixg = TRUE)
lfsr2   <- res.ash$result$lfsr

# Compare the posterior means.
plot(res$beta,res.ash$result$PosteriorMean,pch = 20,
     xlab = "mr.ash",ylab = "ashr")
abline(a = 0,b = 1,lty = "dotted",col = "magenta")

# Compare the lfsr calculations.
plot(res.varbvs$lfsr,res.ash$result$lfsr,pch = 20,
     xlab = "mr.ash",ylab = "ashr")
abline(a = 0,b = 1,lty = "dotted",col = "magenta")

The dots in these scatterplots should lie along the diagonal.

pcarbo commented 11 months ago

Note that all the relevant code is in the ashr package; start with the calc_lfsr function which will call (indirectly) the appropriate functions in normalmix.

pcarbo commented 11 months ago

The problem may be due to faulty calculation of the full posterior; see Issue #3.

pcarbo commented 11 months ago

For this example, I get the same result with varbvs::varbvsmix, so at least in this example the get.full.posterior calculations are correct. This was my code:

res.varbvs <- varbvs::varbvsmix(X,NULL,y,sa = c(0,0.5,1,10)^2,sigma = 1,
                            w = res$pi,update.sigma = FALSE,
                update.sa = FALSE,update.w = FALSE)
pcarbo commented 11 months ago

Good news, though: I've already implemented the lfsr calculations in varbvs; I've updated the code above.

pcarbo commented 11 months ago

The one catch is that I think mr.ash and varbvsmix have slightly different priors when the residual variance sigma is not 1; to correct this, I think you have to scale the prior standard deviations by sigma in varbvsmix.

stephens999 commented 11 months ago

I can see immediately one problem:

sd = full.post$s2[,2:gridlen]

should be

sd = sqrt(full.post$s2[,2:gridlen])

because s2 is the variance (the 2 stands for squared)

Also, in general, we don't want to assume that the first component is a point mass on 0.

But as @pcarbo mentions maybe we can use existing code?

nlapier2 commented 11 months ago

@stephens999 thanks -- I will fix that.

@pcarbo Thanks! However, your code doesn't run for me -- "dat <- mr.ash::simulate_regression_data(n,p,s = 20)" will not run because the package is called mr.ash.alpha, and if I change it to "dat <- mr.ash.alpha::simulate_regression_data(n,p,s = 20)", it says simulate_regression_data is not an exported object from mr.ash.alpha.

stephens999 commented 11 months ago

I'm thinking we might want to have mr.ash.alpha (and mr.ash) return the fitted prior (g) and the summary statistics (bhat and standard errors) that can be used together to compute any posterior quantity of interest, using the ashr or ebnm interface

pcarbo commented 11 months ago

@nlapier2 You need to also install mr.ash to run my exaample.

pcarbo commented 11 months ago

@stephens999 I've created a new issue (Issue #8 ) to document your suggestion.

nlapier2 commented 11 months ago

@pcarbo Thanks -- didn't realize there was a separate mr.ash repo! What's the difference?

Running the code you posted and then computing the LFSR using the (fixed) function in the OP gives the same LFSR as res.varbvs$lfsr and res.ash$result$lfsr above:

full.post = get.full.posterior(res)
lfsr3 = calc_lfsr(res$beta, full.post)
cor(lfsr3, res.ash$result$lfsr)
pcarbo commented 11 months ago

mr.ash is intended to be a revamp of the original package, with improved interface. It is a work-in-progress.

Glad to see that your code gives the same result as mine. That is reassuring.