**Hoeze** opened 1 year ago
@jzluo I debugged a bit and found that the issue is the determinant becoming negative, which produces NaNs in the penalty. I tried to fix that in #18.
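For reference, one common way to avoid the NaN is to compute the Firth penalty `0.5 * log|X' W X|` with `np.linalg.slogdet` instead of `log(det(...))`, so a determinant that underflows to a tiny or negative float never reaches `np.log`. This is only a sketch of the idea, not the actual fix in #18; `firth_penalty` and its arguments are hypothetical names:

```python
import numpy as np

def firth_penalty(X, p):
    """Hypothetical sketch: Firth penalty 0.5*log|X' W X|.

    np.linalg.slogdet returns the sign and log|det| separately,
    so a determinant that underflows through np.linalg.det does
    not silently turn the log into NaN.
    """
    W = p * (1 - p)                      # variance weights of the fitted probs
    info = (X * W[:, None]).T @ X        # Fisher information X' W X
    sign, logdet = np.linalg.slogdet(info)
    if sign <= 0:
        # information matrix not positive definite: fail loudly
        # instead of propagating NaN into the penalized likelihood
        raise np.linalg.LinAlgError("Fisher information is not positive definite")
    return 0.5 * logdet
```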
Another thing I found:
With `FirthLogisticRegression(skip_ci=True, wald=True).fit(X, y)` the fitting works.
Otherwise, the regression runs in an infinite loop during the CI/p-value calculation if I do not set `max_halfsteps > 0`.
Still, the NR steps do not converge even after 200 iterations.
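To illustrate why `max_halfsteps > 0` matters here, a Newton-Raphson update with *bounded* step-halving terminates even when no halved step improves the likelihood, whereas an unbounded halving loop can spin forever. This is a minimal sketch with assumed names (`loglik`, `damped_newton_step`), not firthlogist's actual implementation:

```python
import numpy as np

def damped_newton_step(beta, delta, loglik, max_halfsteps=25):
    """Hypothetical sketch: one Newton-Raphson update with
    bounded step-halving.

    `loglik` maps a coefficient vector to the (penalized)
    log-likelihood. The step is halved at most `max_halfsteps`
    times, so a step that never improves the likelihood still
    terminates instead of looping forever.
    """
    ll_old = loglik(beta)
    step = np.asarray(delta, dtype=float)
    for _ in range(max_halfsteps + 1):
        candidate = beta + step
        if loglik(candidate) > ll_old:
            return candidate
        step /= 2.0
    # no improving step found within the budget: return the
    # smallest candidate rather than iterating indefinitely
    return beta + step
```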
Thanks so much! I will take a closer look tonight.
Sorry, things have been a bit hectic for me and it may be longer before I can sit down and dig into this.
Just as an initial sanity check, I tried your sample data with `logistf` and `brglm2`. Fitting fails in `logistf`; here are the results from `brglm2`:
```
Call:
glm(formula = y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 +
    X10 + X11 + X12 + X13 + X14 + X15 + X16 + X17 + X18 + X19 +
    X20 + X21 + X22 + X23 + X24 + X25 + X26 + X27 + X28 + X29 +
    X30 + X31 + X32 + X33 + X34, family = binomial(logit), data = data,
    method = "brglmFit")

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6170  -0.4648  -0.3425  -0.2405   3.3041

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.379e+00  1.144e-01 -38.281  < 2e-16 ***
X1          -3.372e-01  1.347e-01  -2.504 0.012296 *
X2           1.067e-02  1.486e-03   7.180 6.95e-13 ***
X3           6.447e-04  2.326e-03   0.277 0.781616
X4           8.806e-03  5.879e-03   1.498 0.134193
X5           1.978e-03  6.121e-03   0.323 0.746594
X6          -9.555e-03  5.908e-03  -1.617 0.105788
X7           9.907e-03  4.557e-03   2.174 0.029729 *
X8          -2.614e-03  1.972e-03  -1.325 0.185152
X9           5.465e-03  5.678e-03   0.962 0.335837
X10          4.911e-03  5.117e-03   0.960 0.337250
X11          2.094e-03  5.129e-03   0.408 0.683103
X12         -5.939e-03  2.064e-03  -2.877 0.004015 **
X13         -4.947e-03  4.903e-03  -1.009 0.312996
X14          6.875e-03  3.670e-03   1.873 0.061015 .
X15         -1.744e-02  4.920e-03  -3.544 0.000394 ***
X16         -6.035e-03  5.629e-03  -1.072 0.283699
X17         -1.795e-02  2.999e-03  -5.986 2.15e-09 ***
X18         -2.030e-03  5.223e-03  -0.389 0.697615
X19         -7.145e-03  3.056e-03  -2.338 0.019371 *
X20         -3.240e-03  4.584e-03  -0.707 0.479723
X21          2.543e-03  3.137e-03   0.811 0.417493
X22         -3.022e-03  3.157e-03  -0.957 0.338335
X23          2.704e-03  3.184e-03   0.849 0.395848
X24          7.830e-03  3.291e-03   2.379 0.017356 *
X25         -3.968e-03  3.175e-03  -1.250 0.211373
X26         -6.351e-03  3.194e-03  -1.989 0.046745 *
X27          5.167e-03  3.549e-03   1.456 0.145381
X28          1.475e-03  3.346e-03   0.441 0.659453
X29          5.652e-05  3.306e-03   0.017 0.986360
X30          2.458e-03  3.278e-03   0.750 0.453359
X31         -5.930e-03  3.310e-03  -1.792 0.073172 .
X32         -1.083e-03  3.294e-03  -0.329 0.742266
X33          4.086e-03  3.219e-03   1.269 0.204388
X34          1.131e+05  1.224e+03  92.404  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 98119  on 161633  degrees of freedom
Residual deviance: 88017  on 161599  degrees of freedom
AIC: 88087

Type of estimator: AS_mixed (mixed bias-reducing adjusted score equations)
Number of Fisher Scoring iterations: 3
```
Can I ask what kind of data this is?
Unfortunately, I cannot tell you what kind of data this is. But the fitting succeeds if you standard-normalize all features in X.
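For completeness, the workaround amounts to z-scoring each column of `X` before fitting so all features are on a comparable scale (note the huge `X34` coefficient in the `brglm2` output, which hints at badly mismatched scales). A minimal sketch, with the guard for constant columns as my own addition:

```python
import numpy as np

def standardize(X):
    """Hypothetical helper: z-score each column of X.

    Brings all features to mean 0 and standard deviation 1, which
    keeps X' W X better conditioned during Newton-Raphson.
    """
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # guard: leave constant columns untouched
    return (X - mu) / sd
```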
Without normalizing, and using the default statsmodels logistic regression, you need L-BFGS-B to get a good fit; Newton-Raphson also fails to converge there.
So an L-BFGS-B implementation would probably help a lot here. It would also make firthlogist scale better to large numbers of variables.
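The suggestion above can be sketched with `scipy.optimize.minimize`: maximize the Firth-penalized log-likelihood directly with `method="L-BFGS-B"` instead of Newton-Raphson. This is an illustration only, not firthlogist's API; the function name and the decision to rely on finite-difference gradients are my assumptions:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def fit_firth_lbfgsb(X, y):
    """Hypothetical sketch: Firth logistic regression via L-BFGS-B."""
    n, k = X.shape

    def neg_penalized_ll(beta):
        eta = X @ beta
        p = expit(eta)
        # Bernoulli log-likelihood in a numerically stable form
        ll = np.sum(y * eta - np.logaddexp(0.0, eta))
        # Firth penalty 0.5*log|X' W X| via slogdet to avoid NaN
        W = p * (1 - p)
        sign, logdet = np.linalg.slogdet((X * W[:, None]).T @ X)
        if sign <= 0:
            return np.inf  # penalty undefined here; reject this beta
        return -(ll + 0.5 * logdet)

    res = minimize(neg_penalized_ll, np.zeros(k), method="L-BFGS-B")
    return res.x
```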
Hi @jzluo, I found a sample dataset on which fitting fails. Consider the example data I attached here: sample.zip
This fails after the second round of optimization because all coefficients become `NaN`. Detailed logs + stack trace: