josef-pkt opened this issue 1 year ago
@josef-pkt is there any quick hack to get this working for Firth regression? I'm trying to run Firth regression but I need some l-bfgs optimizer because Newton-Raphson does not converge: https://github.com/jzluo/firthlogist/issues/17 (The same solution helped with sm.Logit already.)
I have not tried it out yet. It's a bit tricky, but my guess is that it should work as a hack that modifies the model instance.
E.g. the penalty as in https://github.com/jzluo/firthlogist/blob/master/firthlogist/firthlogist.py#L318 would need to call `model.hessian(params)` to get the (observed) information matrix (GLM also has EIM).
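As a rough sketch of such a hack (illustrative only: the class name and `_model` attribute are hypothetical, and I'm assuming `PenalizedMixin` *subtracts* the penalty from the loglike, so the Firth term enters with a flipped sign):

```python
import numpy as np

class FirthPenaltySketch:
    """Hypothetical Firth-type penalty adding 1/2 log|I(beta)|.

    In statsmodels this would subclass statsmodels.base._penalties.Penalty;
    it is kept standalone here so the sketch is self-contained.
    """

    def __init__(self, model=None):
        # the unpenalized model is attached after creation (see discussion
        # in this thread about wiring the penalty to the model)
        self._model = model

    def func(self, params):
        # observed information = negative hessian of the unpenalized model
        info = -np.asarray(self._model.hessian(params))
        sign, logdet = np.linalg.slogdet(info)
        # assuming PenalizedMixin subtracts the penalty from the loglike,
        # return -1/2 log|I| so the net effect is +1/2 log|I|
        return -0.5 * logdet
```

Whether the sign convention and hook points match statsmodels' actual `PenalizedMixin` would need to be checked against the implementation.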
I'm not sure about some details, e.g. taking derivatives: is `statsmodels.base._penalties.Penalty.deriv` a required method, or are there numerical-derivative default methods in the model or mixin? AFAICS, all the current penalty classes have an analytical `deriv`.
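If an analytical `deriv` turned out not to be required, a generic finite-difference fallback could look like the following (a standalone sketch, not a statsmodels API; statsmodels also has its own helpers in `statsmodels.tools.numdiff`):

```python
import numpy as np

def deriv_numdiff(func, params, eps=1e-6):
    """Central-difference gradient of a scalar penalty function.

    Generic fallback sketch for a penalty class lacking an analytical
    `deriv`; hypothetical helper, not statsmodels code.
    """
    params = np.asarray(params, dtype=float)
    grad = np.empty_like(params)
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = eps
        grad[i] = (func(params + step) - func(params - step)) / (2 * eps)
    return grad
```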
Aside: `exp` (in Logit, Poisson) easily overflows if exog x has large values. Trying with rescaled exog often helps in those cases. (x > 80 or x > 100 already causes optimization problems in some examples that I had used for Poisson unit tests.)
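A quick illustration of the overflow and the rescaling workaround (plain numpy, nothing model-specific):

```python
import numpy as np

# float64 exp overflows to inf around x ~ 709, but optimization
# typically degrades much earlier, because exp(80) ~ 5.5e34 already
# dwarfs the other terms in the loglike and its gradient.
with np.errstate(over="ignore"):
    big = np.exp(np.float64(800.0))
print(big)  # inf

x = np.array([1.0, 50.0, 100.0])
x_scaled = (x - x.mean()) / x.std()  # standardize exog
print(np.exp(x_scaled))              # all finite
```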
Update, some details on the current implementation: the penalized hessian is computed as `hess = super(PenalizedMixin, self).hessian(params, **kwds)` in `PenalizedMixin`. There is also a `hessian_numdiff` method, but it's not used by generic calls; I guess I had added it for checking or unit tests.

To access the unpenalized hessian:
If subclasses or the model class want to access the unpenalized loglike, score, or hessian, then the super call needs to jump to the class inherited above the mixin class. Maybe we should add a `loglike_base` (or `loglike_unpenalized`) method (and similar for `score_obs`, `hessian`) to the `PenalizedMixin` class so that we don't need to jump super classes.
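A minimal sketch of the MRO jump and the proposed `loglike_base` helper (toy classes with hypothetical names, not statsmodels code):

```python
class Base:
    """Stand-in for the unpenalized model class above the mixin."""
    def loglike(self, params):
        return -0.5 * sum(p * p for p in params)  # toy loglike

class PenalizedMixin:
    def loglike(self, params):
        # penalized loglike: super() jumps past the mixin in the MRO
        return super().loglike(params) - self.penalty(params)

    def loglike_base(self, params):
        # proposed convenience: unpenalized loglike, so callers don't
        # need to know which class sits above the mixin
        return super().loglike(params)

    def penalty(self, params):
        return 0.1 * sum(abs(p) for p in params)  # toy penalty

class PenalizedModel(PenalizedMixin, Base):
    pass
```

Here `PenalizedModel(...).loglike_base(params)` reaches `Base.loglike` directly, while `loglike` returns the penalized value.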
I never looked at the numerical details or properties of the Firth penalty. Doesn't the penalty logdet break if the information matrix is (near) singular? This might need some additional protection for a (near) singular information matrix or hessian.
I (kind of) ran into this issue: I got a negative determinant, so the log is not possible: https://github.com/jzluo/firthlogist/pull/18 My solution was to just set the penalty to 0 in such cases. No idea if that's a good idea...
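One way to implement that guard is `np.linalg.slogdet`, which returns the sign of the determinant separately, so a non-positive or near-zero determinant can be detected without ever evaluating an invalid log (a sketch of the workaround above; zeroing the penalty is a pragmatic fallback, not an established fix):

```python
import numpy as np

def firth_penalty(info, tol=1e-10):
    """0.5 * log|I| with a guard for (near) singular information.

    Returns 0.0 when the determinant is non-positive or smaller
    than `tol`, mirroring the set-penalty-to-zero workaround
    discussed above; illustrative helper, not a library API.
    """
    sign, logabsdet = np.linalg.slogdet(info)
    if sign <= 0 or logabsdet < np.log(tol):
        return 0.0
    return 0.5 * logabsdet
```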
(semi-random idea while browsing SAS docs for logistic and firth penalization)
related to Firth penalization #3561
Reference: Heinze, G., and Schemper, M. (2002). "A Solution to the Problem of Separation in Logistic Regression." Statistics in Medicine 21:2409–2419.
The penalized loglikelihood is log L*(beta) = log L(beta) + 1/2 log |I(beta)|, where I(beta) is the information matrix evaluated at beta.

That means we need the penalization class to have access to the model so we can reuse the hessian (OIM or EIM). This might currently be tricky or impossible because we define the penalization instance before creating a penalized model instance.
One possibility may be to assign a `_model` attribute to the penalty instance after creating the model. This makes it a bit circular: the penalized model instance holds the penalty instance, which holds the penalized model instance as an attribute. I guess the circularity cannot be avoided unless we use extra args in each method of the penalty class. Note: we need the hessian/EIM of the original model and not the penalized hessian of the penalized model.
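The attach-after-creation wiring could look like this (stub classes with illustrative names, just to show the circular reference):

```python
class PenaltyStub:
    """Stand-in for a penalty class; `_model` is attached later."""
    def __init__(self):
        self._model = None  # to be set after the model is created

class PenalizedModelStub:
    """Stand-in for a penalized model holding its penalty instance."""
    def __init__(self, penal):
        self.penal = penal

pen = PenaltyStub()
model = PenalizedModelStub(penal=pen)  # penalty must exist first...
pen._model = model                     # ...then the circle is closed

# circular: the model holds the penalty, the penalty holds the model
assert model.penal._model is model
```

Inside the penalty's methods, `self._model.hessian(params)` would then reach the unpenalized hessian, as long as the call does not go through the penalized model's own (penalized) `hessian`.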