statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License

ENH: many fixed effects in GLM, absorb in computation #7478

Open josef-pkt opened 3 years ago

josef-pkt commented 3 years ago

(I don't see a specific GLM issue for this)

#2568 is the general issue for linear models, with some discussion of GLM-type models.

motivation: #7469 stats.stackexchange question mentioning feglm and feglm.nb in alpaca package

Stammann (2018) looks good, from skimming some parts. It uses alternating iterations with demeaning to update within IRLS.
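To make the idea concrete, here is a minimal sketch of IRLS with the fixed effects absorbed by weighted demeaning, for a Poisson model with a single grouping factor. It is not the algorithm from the paper or from alpaca, only an illustration of the general pattern. The function names `demean_by_group` and `fit_poisson_absorb` and the simulated data are hypothetical. With one factor the weighted within-group demeaning is exact; several factors would need alternating demeaning inside each IRLS step.

```python
import numpy as np


def demean_by_group(a, groups, weights, n_groups):
    """Subtract the weighted group mean from each column of `a`."""
    a2d = a.reshape(len(a), -1)
    wsum = np.bincount(groups, weights=weights, minlength=n_groups)
    out = np.empty_like(a2d, dtype=float)
    for j in range(a2d.shape[1]):
        wmean = np.bincount(groups, weights=weights * a2d[:, j],
                            minlength=n_groups) / wsum
        out[:, j] = a2d[:, j] - wmean[groups]
    return out.reshape(a.shape)


def fit_poisson_absorb(y, X, groups, maxiter=100, tol=1e-10):
    """Poisson (log link) IRLS; group effects are absorbed, not estimated."""
    n_groups = groups.max() + 1
    mu = (y + y.mean()) / 2.0          # common GLM starting value
    eta = np.log(mu)
    for _ in range(maxiter):
        w = mu                          # IRLS working weights for Poisson/log
        z = eta + (y - mu) / mu         # working response
        # Frisch-Waugh-Lovell: weighted demeaning removes the group dummies
        z_t = demean_by_group(z, groups, w, n_groups)
        X_t = demean_by_group(X, groups, w, n_groups)
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X_t * sw[:, None], z_t * sw, rcond=None)
        # residuals of the demeaned WLS equal those of the full dummy WLS,
        # so the linear predictor can be updated without the fixed effects
        eta_new = z - (z_t - X_t @ beta)
        converged = np.max(np.abs(eta_new - eta)) < tol
        eta = eta_new
        mu = np.exp(eta)
        if converged:
            break
    return beta, mu


# Simulated example (all values are made up for illustration)
rng = np.random.default_rng(0)
n, n_groups = 2000, 50
groups = rng.integers(0, n_groups, n)
alpha = rng.normal(0, 0.5, n_groups)
X = rng.normal(size=(n, 2))
beta_true = np.array([0.5, -0.3])
y = rng.poisson(np.exp(X @ beta_true + alpha[groups]))

beta_hat, mu_hat = fit_poisson_absorb(y, X, groups)
print(beta_hat)
```

At convergence the score equations of the full dummy-variable model hold: `X.T @ (y - mu_hat)` and the within-group sums of `y - mu_hat` are both numerically zero. Groups where `y` is zero for all observations would need to be dropped first, which this sketch does not handle.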

Czarnowske, Daniel, and Amrei Stammann. 2020. “Fixed Effects Binary Choice Models: Estimation and Inference with Long Panels.” ArXiv:1904.04217 [Econ], October. http://arxiv.org/abs/1904.04217.

Stammann, Amrei. 2018. “Fast and Feasible Estimation of Generalized Linear Models with High-Dimensional k-Way Fixed Effects.” ArXiv:1707.01815 [Stat], July. http://arxiv.org/abs/1707.01815.

Stammann, Amrei, Florian Heiß, and Daniel McFadden. 2016. “Estimating Fixed Effects Logit Models with Large Panel Data.” 145837. VfS Annual Conference 2016 (Augsburg): Demographic Change. Verein für Socialpolitik / German Economic Association. https://ideas.repec.org/p/zbw/vfsc16/145837.html.

aeturrell commented 2 years ago

Just want to flag that there's a package with some code to do this, fastreg, but it uses JAX, so I'm not sure whether it would be possible to use it for this issue.

josef-pkt commented 1 year ago

maybe similar for QuantReg https://stackoverflow.com/questions/74531455/statsmodels-quantreg-including-fixed-effects

RLM also uses IRLS, so it might follow a common pattern with GLM and QuantReg.

aeturrell commented 1 year ago

Keeping a list of Stata to Python equivalents here: https://aeturrell.github.io/coding-for-economists/coming-from-stata.html The implicit advice I currently give is to switch to linearmodels' array API for absorbing regressions, but it would be amazing to replicate the full functionality of reghdfe and fixest in statsmodels' formula API. It would be much simpler for users if there were just one tool with a familiar syntax.

josef-pkt commented 1 year ago

The linearmodels package is too large. The priority for statsmodels, or for me, is more on the parts that are not covered by linearmodels, e.g. handling fixed effects for GLM and discrete models. However, if that is based on IRLS, then we need a WLS/OLS version first.

Also, most likely we need the "clustered" version (for arbitrary multiway categorical variables) as complement to a (balanced) panel data version.
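As a rough illustration of what a WLS version for arbitrary multiway categorical variables could look like, the sketch below absorbs two crossed factors by alternating weighted demeaning (method of alternating projections) and then runs WLS on the demeaned data. The helper name `weighted_demean_multiway` and the simulated data are hypothetical, and this is not an existing statsmodels function. The result is compared with an explicit dummy-variable WLS, which is only feasible here because the example is small.

```python
import numpy as np


def weighted_demean_multiway(a, factors, weights, tol=1e-12, maxiter=1000):
    """Alternately remove weighted group means for each factor until stable."""
    shape = np.shape(a)
    out = np.array(a, dtype=float).reshape(shape[0], -1)
    for _ in range(maxiter):
        prev = out.copy()
        for f in factors:
            wsum = np.bincount(f, weights=weights)
            for j in range(out.shape[1]):
                wmean = np.bincount(f, weights=weights * out[:, j]) / wsum
                out[:, j] -= wmean[f]
        if np.max(np.abs(out - prev)) < tol:
            break
    return out.reshape(shape)


# Simulated, unbalanced two-way example (all values are made up)
rng = np.random.default_rng(1)
n = 1500
f1 = rng.integers(0, 20, n)
f2 = rng.integers(0, 15, n)
w = rng.uniform(0.5, 2.0, n)
X = rng.normal(size=(n, 2))
y = (X @ np.array([1.0, -2.0]) + rng.normal(size=20)[f1]
     + rng.normal(size=15)[f2] + rng.normal(size=n))

# WLS on the demeaned data
sw = np.sqrt(w)
X_t = weighted_demean_multiway(X, [f1, f2], w)
y_t = weighted_demean_multiway(y, [f1, f2], w)
beta_absorb, *_ = np.linalg.lstsq(X_t * sw[:, None], y_t * sw, rcond=None)

# Reference: WLS with explicit dummies (rank deficient, but the
# coefficients on X are still identified)
D = np.column_stack([X, np.eye(20)[f1], np.eye(15)[f2]])
beta_dummy = np.linalg.lstsq(D * sw[:, None], y * sw, rcond=None)[0][:2]

print(beta_absorb, beta_dummy)
```

The same demeaning step, with the IRLS working weights in place of `w`, is what a GLM version would call in each iteration. Plain alternating projections can converge slowly for poorly connected factors, so a real implementation would likely need acceleration.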

I recently saw that some members of the University of Chicago are working on the two-sided labor market case for reghdfe settings. That application was my first reference for these large fixed effect cases (millions of individuals, so larger than what we will easily be able to support in statsmodels with "generic" models). I might not have kept the GitHub link to it.

Thanks for the link and for your coming-from-stata list.