feglm(family='poisson') seems to hang in cpp_demean() but femlm(family='poisson') does not?

vincLohm commented 5 months ago

Firstly I cannot thank you enough for writing such an amazingly fast high-dimensional FE pkg !

And sorry, I'm not sure where to post this question. I feel like it is too specific to this package to be asked on stackexchange. I have a fairly large dataset (16.9M rows) indexed by i, j, t (with mostly zeros in the response variable). For certain specifications, feglm(family='poisson') seems to hang on demeaning, while the same specification with femlm(family='poisson') returns pretty quickly. I'm confused as to what the difference between the two implementations is; I searched the documentation quite thoroughly but could not find anything on it.

Whenever I can get both functions to return I get identical estimates, including the FE.

Feel free to delete this post if it's not appropriate, but maybe you could point me to where I can find the answer in the documentation.

lrberge commented 4 months ago

Thanks for the words! :-)

I'm sorry for the inconvenience. Quick question: is this with the current version of the package (>=0.12.0)? Did you experience similar issues with old versions? I'm asking because I have changed the demeaning algorithm in 0.12.0 and hope I didn't introduce a regression.

That said: feglm and femlm, although they provide the same results, are completely different algorithms! The algorithm of feglm is not detailed in the documentation but is a 'classic' GLM algorithm, as described in https://arxiv.org/abs/1903.01690

The algorithm of femlm is a direct maximum likelihood estimation, as described in https://github.com/lrberge/fixest/blob/master/_DOCS/FENmlm_paper.pdf

The two algorithms are very different and don't have the same convergence properties.
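
For intuition, here is a toy base-R sketch of the two routes on a Poisson model without any fixed effects. It is not fixest's implementation: `irls_poisson`, `negloglik` and `grad` are hypothetical helpers written for this illustration, and the simulated data are an assumption. The 'classic' GLM route solves a weighted least-squares problem at every iteration (with fixed effects, that is the step where a weighted demeaning would presumably be needed), while the direct route hands the log-likelihood to a generic optimizer.

```r
# Toy Poisson data with no fixed effects (illustrative assumption)
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- rpois(n, exp(0.5 + 0.3 * x))
X <- cbind(1, x)

# Route 1: 'classic' GLM via iteratively reweighted least squares (IRLS)
irls_poisson <- function(X, y, tol = 1e-10, max_iter = 50) {
  beta <- c(log(mean(y)), rep(0, ncol(X) - 1))
  for (iter in seq_len(max_iter)) {
    eta <- drop(X %*% beta)
    mu <- exp(eta)
    z <- eta + (y - mu) / mu  # working response for the log link
    # Weighted least squares with weights mu: solve (X'WX) b = X'Wz
    beta_new <- drop(solve(crossprod(X, mu * X), crossprod(X, mu * z)))
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  beta
}

# Route 2: direct maximum likelihood with a generic optimizer
negloglik <- function(beta) {
  eta <- drop(X %*% beta)
  sum(exp(eta) - y * eta)  # Poisson negative log-likelihood, up to a constant
}
grad <- function(beta) {
  drop(crossprod(X, exp(drop(X %*% beta)) - y))
}

beta_irls <- irls_poisson(X, y)
beta_ml <- optim(c(0, 0), negloglik, grad, method = "BFGS",
                 control = list(reltol = 1e-12))$par
beta_glm <- unname(coef(glm(y ~ x, family = poisson())))

# The three sets of estimates should agree up to numerical tolerance
print(rbind(irls = beta_irls, direct_ml = beta_ml, glm = beta_glm))
```

Both routes maximize the same likelihood, so they land on the same estimates when they converge; what differs is the work done per iteration and how quickly each one gets there.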

To (possibly) make convergence faster, you may play with the argument fixef.algo of the feglm function, see its documentation here.
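
As a rough sketch of how one might compare the two estimators on a small simulated panel before running the full data: the data-generating process below is made up, and the tuning values are arbitrary. The arguments `fixef.tol` and `verbose` are taken from my reading of the feglm signature; `fixef.algo` itself is, as far as I can tell, only available from 0.12.0 onwards, so it is left out here.

```r
# Assumes the fixest package is installed; the block is skipped otherwise
if (requireNamespace("fixest", quietly = TRUE)) {
  library(fixest)

  # Small simulated panel with two fixed-effect dimensions (illustrative)
  set.seed(42)
  n <- 5000
  d <- data.frame(i = sample(50, n, TRUE),
                  j = sample(40, n, TRUE),
                  x = rnorm(n))
  d$y <- rpois(n, exp(0.2 * d$x + d$i / 50 - d$j / 40))

  # GLM route; a tighter fixef.tol makes each demeaning step more precise,
  # a looser one trades precision for speed. Raising verbose above 0
  # reports progress while the estimation runs.
  est_glm <- feglm(y ~ x | i + j, data = d, family = "poisson",
                   fixef.tol = 1e-8, verbose = 0)

  # Direct maximum likelihood route on the same specification
  est_mlm <- femlm(y ~ x | i + j, data = d, family = "poisson")

  # Largest absolute gap between the two sets of slope estimates
  max_diff <- max(abs(coef(est_glm) - coef(est_mlm)))
  print(max_diff)
}
```

If the gap is not negligible on a subsample, that is a hint that one of the two runs stopped before converging, in which case tightening the tolerances or raising the iteration limits is worth a try.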

vincLohm commented 4 months ago

Sorry, I didn't see this reply ! Thanks for answering.

Ah I'm actually using quite an old version. 0.11.1. I see that you introduced the tweaked demeaning algorithm in 0.12, so that's not the problem. Ok thank you for the references. I guess there's no clear guidance then on when one should expect the two different algorithms to converge to the same solution?

lrberge / fixest

feglm(family='poisson') seems to hang in cpp_demean() but femlm(family='poisson') does not? #510