lrberge / fixest

Fixed-effects estimations
https://lrberge.github.io/fixest/

cannot reproduce xtlogit results #489

Closed. zwright2482 closed this issue 2 months ago.

zwright2482 commented 2 months ago

First off, thank you for creating such a useful package. Fixest has been tremendous!

I am not getting consistent results when I run a fixed-effects logistic regression with feglm and with xtlogit in Stata. Am I missing something that would explain why the outputs differ?

Below are the code and output for each, using the uploaded data file:

1) FEGLM (R): import the data into R as test.data, then run:

feglm(y ~ x1 + x2 + x3 + x4 | ID, data = test.data, panel.id = ~ID + time, fixef.rm = "both", cluster = ~ID + time, family = "logit")

[Screenshot: feglm output]

2) XTLOGIT (Stata): import the data into Stata, then run:

xtset id time
xtlogit y x1 x2 x3 x4, fe

[Screenshot: xtlogit output]

test.data.csv

grantmcdermott commented 2 months ago

Given the large number of FE groups relative to observations in your simulated data, my guess is that it's a convergence issue. (In my experience, Stata is a good deal more susceptible than R to convergence failures in small-n logit cases.) What happens if you run a straight logit in Stata with id as an indicator (factor) variable?

logit y x1 x2 x3 x4 i.id

zwright2482 commented 2 months ago

Good thought, and thank you for your response. See the image below for the output of a straight logit in Stata using the code you suggested:

[Screenshot: Stata logit output with i.id]

These results are identical to what I got from my feglm code above. They also match what I get from a straight logit in R:

glm(y ~ x1 + x2 + x3 + x4 + as.factor(ID), family = "binomial", data = test.data)

I did some more digging, and I was able to reproduce the xtlogit results in R using clogit (from the survival package) with this code:

library(survival)
clogit(y ~ x1 + x2 + x3 + x4 + strata(ID), data = test.data)

This makes me think that xtlogit is running a conditional logistic regression while feglm is equivalent to a straight logit. Is that how you see it as well?
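To make the distinction concrete, here is a toy sketch in base R on simulated data (the data, sizes and variable names are hypothetical, not the test.data file). It assumes a two-period panel, because there the conditional logit reduces to a plain logit, without intercept, of the period-2 outcome on the change in x, among units whose outcome changes. The dummy-variable logit fitted with glm plays the role of feglm or `logit y ... i.id`. In this two-period case the dummy-variable slope comes out exactly twice the conditional one, a known consequence of the incidental parameters problem; with more periods per unit the gap narrows but does not vanish in short panels.

```r
# Toy two-period panel (hypothetical simulated data, base R only).
set.seed(1)
n_id  <- 500
panel <- data.frame(id = rep(seq_len(n_id), each = 2), time = rep(1:2, times = n_id))
alpha   <- rep(rnorm(n_id), each = 2)                 # unit fixed effects
panel$x <- rnorm(nrow(panel)) + alpha                 # regressor correlated with the FE
panel$y <- rbinom(nrow(panel), 1, plogis(alpha + panel$x))  # true slope = 1

# Wide vectors: rows are ordered by id, so position k corresponds to id k.
y1 <- panel$y[panel$time == 1]; y2 <- panel$y[panel$time == 2]
x1 <- panel$x[panel$time == 1]; x2 <- panel$x[panel$time == 2]
switcher <- y1 != y2   # only units whose outcome changes are informative

ctrl <- glm.control(epsilon = 1e-12, maxit = 100)  # tight convergence

# Conditional logit for T = 2: logit of y2 on (x2 - x1), no intercept, switchers only.
d  <- y2[switcher]
dx <- (x2 - x1)[switcher]
fit_cond <- glm(d ~ 0 + dx, family = binomial, control = ctrl)
b_cond   <- unname(coef(fit_cond)["dx"])

# Dummy-variable ("straight") logit on the same units, one dummy per id,
# mimicking feglm after it drops fixed effects with a perfect fit.
sw <- panel[panel$id %in% which(switcher), ]
fit_dummy <- glm(y ~ x + factor(id), family = binomial, data = sw, control = ctrl)
b_dummy   <- unname(coef(fit_dummy)["x"])

print(c(conditional = b_cond, dummy = b_dummy, ratio = b_dummy / b_cond))
```

The factor of two is specific to two-period panels; it illustrates why the two estimators target different quantities and is not a prediction for the data in this issue.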

The data are in a panel structure, so I assumed that a straight logit wouldn't sufficiently capture the within effects over time, which is why I included panel.id = ~ID + time in the feglm code.

Any thoughts/suggestions on this?

lrberge commented 2 months ago

Hi: you're right, the models are simply different: fixest does not implement the conditional logit. The argument panel.id is only used for introducing leads/lags of variables or for requesting a HAC VCOV. If you want to use the conditional logit, you will need another package, such as the one you found (survival).
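For reference, here is a minimal sketch of what panel.id does enable in fixest: lag and lead operators such as l() and f() inside the formula. The simulated data frame and its columns are hypothetical. The point is that supplying panel.id only tells fixest how to build lags within each ID over time; it does not change the estimator, so the model below is still the dummy-variable (unconditional) logit.

```r
library(fixest)

# Hypothetical simulated panel: 200 IDs observed over 5 periods.
set.seed(2)
n_id <- 200; n_t <- 5
df <- data.frame(ID = rep(seq_len(n_id), each = n_t), time = rep(seq_len(n_t), times = n_id))
df$x1 <- rnorm(nrow(df))
df$y  <- rbinom(nrow(df), 1, plogis(rep(rnorm(n_id), each = n_t) + 0.5 * df$x1))

# l(x1, 1) is the one-period lag of x1 within each ID; the first period of each
# ID has no lag, so those rows are dropped as missing. IDs whose y never varies
# are also dropped because their fixed effect fits perfectly.
est <- feglm(y ~ x1 + l(x1, 1) | ID, data = df,
             panel.id = ~ID + time, family = binomial(link = "logit"))
summary(est)
```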