osorensen / hdme

R-package containing penalized regression methods for High-Dimensional Measurement Error problems (errors-in-variables)
GNU General Public License v3.0
8 stars 3 forks source link

Poisson regression failing #34

Closed osorensen closed 2 years ago

osorensen commented 2 years ago

Why does this happen (based on user feedback)?

library(hdme)
set.seed(123)
n <- 1000
p <- 10
X <- matrix(rnorm(n * p), nrow = n)
sigmaUU <- diag(x = 0.2, nrow = p, ncol = p)
W <- X + rnorm(n, sd = diag(sigmaUU))
pois<-function(x) exp(x)
#response
y2<-rpois(n,exp(X%*%c(rep(5, 5), rep(0, p-5))))
fit2<-corrected_lasso(W,y2,sigmaUU,family = "poisson")
#> Warning in max(which(u > (sv - radius)/(1:length(u)))): no non-missing arguments
#> to max; returning -Inf
#> Error in while (s <= maxIR & diff > tol) {: missing value where TRUE/FALSE needed

Created on 2022-07-01 by the reprex package (v2.0.1)

osorensen commented 2 years ago

Digging a bit further, there seems to be an actual error in corrected_lasso() with family = "poisson".

The iterative scheme in the Statistica Sinica paper is as follows:

image

This is implemented in the following lines:

https://github.com/osorensen/hdme/blob/f396c99db8b7d801a37a33a045ff9850f722c0b7/R/corrected_lasso_glm.R#L34-L44

For family = "binomial", the formulas are

image

and this seems to be correctly implemented.

However, for family = "poisson", the formulas are

image

and this does not seem correctly implemented, because I simply use the iterative scheme for family = "binomial" while inserting the mean function for Poisson regression.

osorensen commented 2 years ago

Commit 27befee2cb350dfa402abb74668bba1ef711eec7 seems to fix the bug. Here is a test script. Note however that it still seems sensitive to numerical overflow. The log-sum-exp trick might do the job here.

library(hdme)
set.seed(123)
n <- 100
p <- 6
q <- 2

coefs <- vapply(1:100, function(i){
  X <- matrix(rnorm(n * p), nrow = n)
  sigmaUU <- diag(x = 0.2, nrow = p, ncol = p)
  W <- X + rnorm(n, sd = sqrt(diag(sigmaUU)))
  y <- rpois(n, exp(X %*% c(rep(.3, q), rep(0, p-q))))
  fit <- corrected_lasso(W, y, sigmaUU, family = "poisson")
  fit$betaCorr[, ncol(fit$betaCorr)]
}, FUN.VALUE = numeric(p))

apply(coefs, 1, function(x) sum(x != 0))
#> [1] 50 37  2  3  6  2
apply(coefs, 1, function(x) mean(x))
#> [1]  0.283163346  0.248876892 -0.010746986  0.012038814 -0.020030758
#> [6] -0.007784253

Created on 2022-07-03 by the reprex package (v2.0.1)