py-econometrics / pyfixest

Fast High-Dimensional Fixed Effects Regression in Python following fixest-syntax
https://py-econometrics.github.io/pyfixest/
MIT License

Logistic Regression #668

Open s3alfisc opened 2 weeks ago

s3alfisc commented 2 weeks ago

Would be cool to support logistic regression, so we could implement the unconditional logit estimator described in Stammann et al.:

image.

The function could inherit or just borrow much from the Fepois class, as that also implements an iteratively reweighted least squares (IWLS) estimator with a demeaning step in every iteration.

Alternatively, we could also set up a GLM base class from which both Fepois and Felogit could inherit, but that is probably something for a second refactoring PR and maybe overkill.

Maybe of interest for @leostimpfle or @Jayhyung ? =)

leostimpfle commented 2 weeks ago

Definitely of interest to me and I also like the idea of a separate GLM class. However, I'll first focus on updating the predict method and am unlikely to have time before that.

s3alfisc commented 2 weeks ago

Cool! I might try my hand at a very basic implementation over the weekend =)

apoorvalal commented 4 days ago

Conditional v Unconditional choice should be informed by whether you want to support the computation of partial effects, right? CL has a really clean, simple solution that will likely scale a lot faster than the unconditional logit from the OP [they have an R package too btw].

s3alfisc commented 3 days ago

I was thinking about implementing the alpaca-style unconditional logit (UCL), mostly because Stammann, Heiss & McFadden seem to argue that it is computationally more feasible than the conditional logit (CL) estimator (despite being only T-consistent); see the quote from the summary of their paper below.

Additionally, this is what fixest seems to implement (from @lrberge's paper on GLMs):

image

The other main advantage (if you want to call it one) is that it's an IWLS algorithm similar to ppmlhdfe, so no need to maximize a likelihood directly + a lot of the Fepois code logic could be reused.
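To make the IWLS-with-demeaning idea concrete, here is a rough sketch of what one such loop could look like for a logit with a single fixed effect. This is only an illustration under simplifying assumptions, not the Fepois code or a proposed pyfixest API: the names `felogit_iwls` and `weighted_demean` are made up, the one-way demeaning is done in closed form (several fixed effects would need an iterative demeaner), and groups with no variation in the outcome are simply dropped because their fixed effects are not identified.

```python
import numpy as np


def weighted_demean(M, groups, w):
    """Subtract the w-weighted group mean from every column of M (one-way FE only)."""
    wsum = np.bincount(groups, weights=w)
    out = np.empty_like(M, dtype=float)
    for j in range(M.shape[1]):
        means = np.bincount(groups, weights=w * M[:, j]) / wsum
        out[:, j] = M[:, j] - means[groups]
    return out


def felogit_iwls(y, X, groups, tol=1e-8, maxiter=100):
    """Hypothetical helper: unconditional FE logit via IWLS on demeaned data."""
    # Re-encode group labels as 0..G-1 and drop groups where y never varies.
    _, groups = np.unique(groups, return_inverse=True)
    ybar = np.bincount(groups, weights=y) / np.bincount(groups)
    keep = (ybar[groups] > 0) & (ybar[groups] < 1)
    y, X = y[keep], X[keep]
    _, groups = np.unique(groups[keep], return_inverse=True)

    beta = np.zeros(X.shape[1])
    eta = np.zeros(y.shape[0])  # linear predictor X @ beta + alpha[group]
    for _ in range(maxiter):
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = mu * (1.0 - mu)          # IWLS weights for the logit link
        z = eta + (y - mu) / w       # working dependent variable
        # Weighted demeaning of z and X sweeps out the fixed effects.
        D = weighted_demean(np.column_stack([z, X]), groups, w)
        z_t, X_t = D[:, 0], D[:, 1:]
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X_t * sw[:, None], z_t * sw, rcond=None)[0]
        # Recover the full linear predictor (including the FEs) from the residual.
        eta = z - (z_t - X_t @ beta_new)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, eta, keep


# Small simulated panel: 200 groups, 40 observations each.
rng = np.random.default_rng(0)
G, T = 200, 40
groups = np.repeat(np.arange(G), T)
alpha = rng.normal(0, 0.5, G)
X = rng.normal(size=(G * T, 2)) + 0.5 * alpha[groups, None]
true_beta = np.array([1.0, -0.5])
p = 1.0 / (1.0 + np.exp(-(X @ true_beta + alpha[groups])))
y = (rng.uniform(size=G * T) < p).astype(float)

beta_hat, eta_hat, keep = felogit_iwls(y, X, groups)
print(beta_hat)
```

The estimates should land near `true_beta`, up to sampling noise and the incidental parameter bias that shrinks as T grows. The update `eta = z - residual` is what avoids ever estimating the fixed effects explicitly.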

Generally, I think the main advantage of having a logit class is that we could use it to compute propensity scores without having to add outside dependencies (which in turn would then allow us to implement doubly robust estimators).
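As a rough illustration of where the propensity scores would go, here is a minimal sketch of the standard augmented IPW (AIPW) estimate of an average treatment effect. The function name `aipw_ate` and all argument names are hypothetical; in practice `e` would come from a fitted logit and `m1`, `m0` from outcome regressions, and here they are just plugged in by hand.

```python
import numpy as np


def aipw_ate(y, d, e, m1, m0):
    """Hypothetical helper: doubly robust (AIPW) estimate of the ATE.

    y: outcome, d: 0/1 treatment, e: propensity scores P(d=1|x),
    m1, m0: predicted outcomes under treatment and control.
    """
    psi = m1 - m0 + d * (y - m1) / e - (1 - d) * (y - m0) / (1 - e)
    return psi.mean()


# Toy data with a true effect of 2 and a randomized treatment (e = 0.5).
rng = np.random.default_rng(1)
x = rng.normal(size=500)
d = (rng.uniform(size=500) < 0.5).astype(float)
y = 2 * d + x
tau_hat = aipw_ate(y, d, np.full(500, 0.5), m1=x + 2, m0=x)
print(tau_hat)
```

The estimator stays consistent if either the propensity model or the outcome model is correctly specified, which is why an in-house logit would be enough to get started.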

I'm actually not sure how many users would be interested in using it directly to compute partial effects, though I think at least in teaching, some might want to? @aeturrell or @gbekes might have opinions on this? =)

![image](https://github.com/user-attachments/assets/3127a8b8-2202-4f19-99c6-d879323b77cb)

gbekes commented 3 days ago

Tagging @vincentarelbundock for marginal effects