stephens999 / ashr

An R package for adaptive shrinkage
GNU General Public License v3.0

Underflow problem for t-likelihood #139

Closed william-denault closed 1 year ago

william-denault commented 1 year ago

Hello,

I am trying to use the approach proposed by Lu and Stephens to solve the Empirical Bayes for t-distributed estimates.

While the implemented method seems to work correctly for small sample sizes (n = 4 to 10), it appears to break down for larger sample sizes (n > 15) due to an underflow problem in the `stats` functions.

Please find an example of the problem below. Do you have an easy fix? (For example, a higher-precision quantile estimate for the Gaussian distribution.)

```r
effective.effect <- function(betahat, se, df) {
  p <- stats::pt(betahat / se, df)
  stats::qnorm(p, sd = se)
}

coef <- c()
coef_cor <- c()
n <- 18
for (i in 1:100) {
  x <- rnorm(n)
  y <- 10 * x + rnorm(n)
  fit <- lm(y ~ 0 + x)
  coef <- c(coef, summary(fit)$coefficients[1, 1])
  coef_cor <- c(coef_cor,
                effective.effect(betahat = summary(fit)$coefficients[1, 1],
                                 se = summary(fit)$coefficients[1, 2],
                                 df = n))
}
coef_cor

plot(coef, coef_cor)
abline(a = 0, b = 1)
```
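For intuition, the failure mode can be reproduced without the regression loop: once the upper-tail mass of the t-statistic drops below machine epsilon, `pt()` returns a value that rounds to exactly 1, and `qnorm(1, ...)` is `Inf`. A minimal sketch (the numbers here are illustrative, not taken from the loop above):

```r
p_tail <- 1e-18     # an upper-tail mass far below .Machine$double.eps
p <- 1 - p_tail     # rounds to exactly 1 in double precision
p == 1              # TRUE
qnorm(p, sd = 0.25) # Inf: the quantile blows up for the same reason
```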

pcarbo commented 1 year ago

@william-denault Do you need a set.seed in your example?

aksarkar commented 1 year ago

I think the example code does not compute the p-value correctly:

```r
> summary(fit)$coefficients[1,1] / summary(fit)$coefficients[1,2]
[1] 28.03315
> pt(summary(fit)$coefficients[1,1] / summary(fit)$coefficients[1,2], n - 1)
[1] 1
> pt(summary(fit)$coefficients[1,1] / summary(fit)$coefficients[1,2], n - 1, lower.tail=FALSE)
[1] 5.660379e-16
```

I am also unsure the example correctly implements the ad hoc procedure outlined in Lu and Stephens 2019, since I think it should return a modified standard error, not a modified effect size.
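For reference, a two-sided p-value for the t-statistic printed above can be computed from the upper tail, where it remains representable (a minimal sketch using the values shown in this comment):

```r
# t-statistic and degrees of freedom from the printed output above
t_stat <- 28.03315
df <- 17  # n - 1 with n = 18

# computing from the upper tail keeps the probability away from 1,
# so doubling it gives a representable two-sided p-value
p_two_sided <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)
```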

william-denault commented 1 year ago

Hi @pcarbo and @aksarkar

Thank you for your feedback. You don't need a set.seed for the example; it should contain some cases that fail.

The effective.effect function I put in the example is actually what is implemented in the ashr package to handle the t-likelihood: https://github.com/stephens999/ashr/blob/4cad5c48d1d2929243d00b596e94a9dd8baaa4de/R/ash.R#L731. However, I agree with you that the procedure Lu and Stephens outline should return modified standard errors.

You can see a similar problem with their ad hoc procedure below (using the code from the Lu and Stephens draft).

```r
library(ashr)

set.seed(1)
pval2se <- function(bhat, p) {
  z <- qnorm(1 - p / 2)
  s <- abs(bhat / z)
  return(s)
}

sdval <- c()
sdcor <- c()
n <- 18
for (i in 1:100) {
  x <- rnorm(n)
  y <- 10 * x + rnorm(n)
  fit <- lm(y ~ 0 + x)
  sdval <- c(sdval, summary(fit)$coefficients[1, 2])
  sdcor <- c(sdcor, pval2se(summary(fit)$coefficients[1, 1],
                            summary(fit)$coefficients[1, 4]))
}

plot(sdval, sdcor)
abline(a = 0, b = 1)

pval2se(summary(fit)$coefficients[1, 1], summary(fit)$coefficients[1, 4])
```

aksarkar commented 1 year ago

I think the numerical issue can be avoided by using `qnorm(..., lower.tail=FALSE)`:

```r
set.seed(1)
n <- 18
x <- rnorm(n)
y <- 10 * x + rnorm(n)
fit <- lm(y ~ 0 + x)
qnorm(1 - summary(fit)$coefficients[1,4] / 2)
## [1] Inf
qnorm(summary(fit)$coefficients[1,4] / 2, lower.tail=FALSE)
## [1] 8.926141
```
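Following that suggestion, a numerically stable variant of the effective.effect helper quoted earlier in the thread might look like this (a sketch under the same interface, not the ashr implementation; `effective.effect.stable` is a hypothetical name):

```r
# Hypothetical sketch: evaluate the upper tail of |t| on the log
# scale so the probability never rounds to 1, map it back to a
# Gaussian quantile on the same tail, and restore the sign.
effective.effect.stable <- function(betahat, se, df) {
  log_p <- pt(abs(betahat) / se, df, lower.tail = FALSE, log.p = TRUE)
  sign(betahat) * qnorm(log_p, sd = se, lower.tail = FALSE, log.p = TRUE)
}
```

In exact arithmetic this agrees with `qnorm(pt(betahat/se, df), sd = se)` for either sign of `betahat`, but the log-scale tail computation stays finite for large t-statistics.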
william-denault commented 1 year ago

Thank you, Abhishek.

This works nicely.