Closed by osorensen 2 years ago
Digging a bit further, there seems to be an actual error in corrected_lasso() with family = "poisson".
The iterative scheme in the Statistica Sinica paper is as follows:
This is implemented in the following lines:
For family = "binomial", the formulas are
and this seems to be correctly implemented.
However, for family = "poisson", the formulas are
and this does not seem to be correctly implemented, because I simply use the iterative scheme for family = "binomial" while inserting the mean function for Poisson regression.
Commit 27befee2cb350dfa402abb74668bba1ef711eec7 seems to fix the bug. Here is a test script. Note, however, that the fixed code still seems sensitive to numerical overflow. The log-sum-exp trick might do the job here.
library(hdme)
set.seed(123)
n <- 100
p <- 6
q <- 2
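coefs <- vapply(1:100, function(i){
  # True covariates X and measurement error covariance sigmaUU
  X <- matrix(rnorm(n * p), nrow = n)
  sigmaUU <- diag(x = 0.2, nrow = p, ncol = p)
  # Observed covariates W = X + measurement error
  W <- X + rnorm(n, sd = sqrt(diag(sigmaUU)))
  # Poisson response; only the first q coefficients are nonzero (0.3)
  y <- rpois(n, exp(X %*% c(rep(.3, q), rep(0, p-q))))
  fit <- corrected_lasso(W, y, sigmaUU, family = "poisson")
  # Keep the estimate in the last column of betaCorr
  fit$betaCorr[, ncol(fit$betaCorr)]
}, FUN.VALUE = numeric(p))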
apply(coefs, 1, function(x) sum(x != 0))
#> [1] 50 37 2 3 6 2
apply(coefs, 1, function(x) mean(x))
#> [1] 0.283163346 0.248876892 -0.010746986 0.012038814 -0.020030758
#> [6] -0.007784253
Created on 2022-07-03 by the reprex package (v2.0.1)
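On the overflow point, here is a minimal sketch of the log-sum-exp trick mentioned above. The helper log_sum_exp() is hypothetical and not part of hdme, and where exactly it would slot into corrected_lasso() is left open. The sketch only shows the idea: subtract the maximum before exponentiating, so that exp() never sees a large argument.

```r
# Hypothetical helper (not part of hdme): computes log(sum(exp(x)))
# without overflowing when elements of x are large.
log_sum_exp <- function(x) {
  m <- max(x)
  # If the maximum is not finite (e.g. all -Inf, or some Inf), return it as is
  if (!is.finite(m)) return(m)
  # exp(x - m) is at most 1, so the sum cannot overflow
  m + log(sum(exp(x - m)))
}

# Large linear predictors, as can occur with an exponential mean function
eta <- c(1000, 1001, 1002)

log(sum(exp(eta)))  # naive version: exp() overflows and the result is Inf
log_sum_exp(eta)    # stable version: a finite value slightly above max(eta)
```

The same shift-by-the-maximum idea applies whenever a ratio or a log of sums of exponentials is needed, so a quantity can stay on the log scale until the final step.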
Why does this happen (based on user feedback)?
Created on 2022-07-01 by the reprex package (v2.0.1)