statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License

ENH: penalized mixin: custom penalization with model properties #8594

Open josef-pkt opened 1 year ago

josef-pkt commented 1 year ago

(semi-random idea while browsing the SAS docs for logistic regression and Firth penalization)

related to Firth penalization #3561

reference Heinze, G., and Schemper, M. (2002). “A Solution to the Problem of Separation in Logistic Regression.” Statistics in Medicine 21:2409–2419.

The penalized log-likelihood is log L*(beta) = log L(beta) + 1/2 log |I(beta)|, where I(beta) is the information matrix evaluated at beta.

That means the penalization class needs access to the model so we can reuse the hessian (OIM or EIM). This is currently tricky or impossible because we define the penalization instance before creating the penalized model instance.

One possibility may be to assign a _model attribute to the penalty instance after creating the model. This makes it a bit circular: the penalized model instance holds the penalty instance, which in turn holds the penalized model instance as an attribute. I guess the circularity cannot be avoided unless we use extra args in each method of the penalty class.

Note that we need the hessian/EIM of the original model and not the penalized hessian of the penalized model. A sketch of such a penalty class follows.
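For concreteness, a minimal sketch of what such a penalty class could look like. FirthPenalty, its _model handling, and the numerical-derivative fallbacks are assumptions for illustration, not existing statsmodels API; to sidestep both the circularity and the penalized-hessian problem, the sketch holds a reference to a separate, unpenalized model:

```python
import numpy as np

from statsmodels.base._penalties import Penalty
from statsmodels.tools.numdiff import approx_fprime, approx_hess


class FirthPenalty(Penalty):
    """Hypothetical Firth penalty: func(params) = -1/2 log|I(params)|.

    PenalizedMixin subtracts pen_weight * func(params) from the
    log-likelihood, so this sign convention yields
    log L + 1/2 log|I| as in Heinze & Schemper (2002).
    """

    def __init__(self, model, weights=1.0):
        super().__init__(weights)
        # reference to an *unpenalized* model (e.g. a plain Logit on the
        # same data), so we get the OIM of the original model and avoid
        # recursing into the penalized hessian
        self._model = model

    def func(self, params):
        # observed information matrix = negative hessian of the loglike;
        # slogdet avoids overflow in the determinant itself
        sign, logdet = np.linalg.slogdet(-self._model.hessian(params))
        return -0.5 * logdet

    def deriv(self, params):
        # no analytical derivative here; use a numerical fallback
        return approx_fprime(params, self.func, centered=True)

    def deriv2(self, params):
        return approx_hess(params, self.func)
```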

Hoeze commented 1 year ago

@josef-pkt is there any quick hack to get this working for Firth regression? I'm trying to run Firth regression, but I need an L-BFGS optimizer because Newton-Raphson does not converge: https://github.com/jzluo/firthlogist/issues/17 (The same solution already helped with sm.Logit.)

josef-pkt commented 1 year ago

I have not tried it out yet. It's a bit tricky, but my guess is that it should work as a hack that modifies the model instance.

e.g. the penalty as in https://github.com/jzluo/firthlogist/blob/master/firthlogist/firthlogist.py#L318 would need to call model.hessian(params) to get the (observed) information matrix (GLM also has the EIM). A rough usage sketch follows.
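A hedged sketch of that hack, assuming the FirthPenalty class sketched earlier in this thread and that the model accepts the penal and pen_weight keywords the way the existing penalized models do:

```python
import numpy as np
import statsmodels.api as sm

from statsmodels.base._penalized import PenalizedMixin


class PenalizedLogit(PenalizedMixin, sm.Logit):
    """Logit with the penalty hooks provided by PenalizedMixin."""


# small synthetic example
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 2)))
y = (X @ [0.5, 1.0, -1.0] + rng.logistic(size=200) > 0).astype(float)

# the penalty holds a plain, unpenalized Logit for the information matrix
penalty = FirthPenalty(sm.Logit(y, X))

# assumption: pen_weight=1 keeps the Firth penalty unscaled
mod = PenalizedLogit(y, X, penal=penalty, pen_weight=1.0)
res = mod.fit(method="lbfgs")  # avoids the Newton convergence problem
print(res.params)
```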

I'm not sure about some details, e.g. taking derivatives: is statsmodels.base._penalties.Penalty.deriv a required method, or are there numerical-derivative default methods in the model or mixin? AFAICS, all the current penalty classes have an analytical deriv.

Aside: exp (in Logit, Poisson) easily overflows if exog x has large values. Rescaling exog often helps in those cases. (x > 80 or x > 100 already causes optimization problems in some examples I had used for Poisson unit tests.)
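For illustration, a minimal rescaling sketch (the array names are made up):

```python
import numpy as np

# exog with a constant and a column of large values
X = np.column_stack([np.ones(50), np.linspace(0.0, 500.0, 50)])

# rescale the non-constant columns so exp(X @ params) stays in range
X_scaled = X.copy()
X_scaled[:, 1:] /= np.abs(X_scaled[:, 1:]).max(axis=0)
```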

Update: some details on the current implementation.

To access the unpenalized hessian: if subclasses or the model class want to access the unpenalized loglike, score, or hessian, the super call needs to jump to the class inherited above the mixin class. Maybe we should add loglike_base (or loglike_unpenalized) methods (and similar for score_obs and hessian) to the PenalizedMixin class so that we don't need to jump super classes; a sketch follows.
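A minimal sketch of those proposed *_base methods (the subclass name is hypothetical; the super(PenalizedMixin, self) call is the MRO jump described above):

```python
from statsmodels.base._penalized import PenalizedMixin


class PenalizedMixinBase(PenalizedMixin):
    """Hypothetical mixin extension adding unpenalized *_base accessors."""

    def loglike_base(self, params, **kwds):
        # super(PenalizedMixin, self) skips past the mixin in the MRO,
        # so this returns the unpenalized log-likelihood of the model
        return super(PenalizedMixin, self).loglike(params, **kwds)

    def score_obs_base(self, params, **kwds):
        return super(PenalizedMixin, self).score_obs(params, **kwds)

    def hessian_base(self, params, **kwds):
        return super(PenalizedMixin, self).hessian(params, **kwds)
```

With e.g. `class PenalizedLogit(PenalizedMixinBase, Logit)`, the *_base methods resolve directly to the Logit implementations, so callers get the unpenalized values without knowing the class hierarchy.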

josef-pkt commented 1 year ago

I never looked at the numerical details or properties of the Firth penalty.

Doesn't the penalty logdet break if the information matrix is (near) singular? This might need some additional protection for a (near) singular information matrix or hessian.

Hoeze commented 1 year ago

> I never looked at the numerical details or properties of the Firth penalty.
>
> Doesn't the penalty logdet break if the information matrix is (near) singular? This might need some additional protection for a (near) singular information matrix or hessian.

I (kind of) ran into this issue: I got a negative determinant, so taking the log was not possible: https://github.com/jzluo/firthlogist/pull/18 My solution was to just set the penalty to 0 in such cases. No idea if that's a good idea...
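For reference, a hedged sketch of such a guard using np.linalg.slogdet, which returns the sign separately instead of taking the log of a negative determinant; the function name is made up, and the zero fallback mirrors the ad-hoc fix in the linked PR:

```python
import numpy as np


def firth_logdet(information):
    """Log-determinant of the information matrix with a singularity guard."""
    # slogdet returns (sign, log|det|); sign <= 0 means the matrix is
    # not positive definite and the Firth penalty is undefined
    sign, logdet = np.linalg.slogdet(information)
    if sign <= 0 or not np.isfinite(logdet):
        # ad-hoc fallback from jzluo/firthlogist#18: drop the penalty
        return 0.0
    return logdet
```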