Question about validity of Margins Results post PPMLHDFE estimation

cckimm commented 1 year ago

Hello Sergio,

I have a question regarding the validity of the margins command after ppmlhdfe estimation when the model includes fixed effects. From my online searches, it seems that running margins after a Poisson estimation with fixed effects can give invalid results, because margins doesn't incorporate the fixed effects. According to one comment I found on the Stata Forum, margins after xtpoisson and xtlogit with fixed effects produces meaningless results, because the marginal effects depend on the fixed effects and these are not estimated when you use these commands.
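To make the concern concrete, here is a minimal sketch of why the marginal effect involves the fixed effects. The notation ($y_{it}$, $x_{it}$, $\beta$, $\alpha_i$) is assumed for illustration and does not come from the ppmlhdfe documentation. With an exponential conditional mean, the derivative with respect to regressor $k$ is the coefficient times the fitted mean, and the fitted mean contains $\alpha_i$:

```latex
\begin{aligned}
E[y_{it} \mid x_{it}, \alpha_i] &= \exp(x_{it}'\beta + \alpha_i), \\
\frac{\partial E[y_{it} \mid x_{it}, \alpha_i]}{\partial x_{it,k}}
  &= \beta_k \exp(x_{it}'\beta + \alpha_i).
\end{aligned}
```

So a command that recovers $\beta$ without ever estimating $\alpha_i$ has no level at which to evaluate this derivative, which is the issue raised in the forum comment.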

I followed the instructions to calculate marginal effect after ppmlhdfe from here: https://github.com/sergiocorreia/ppmlhdfe/blob/master/guides/undocumented.md#esttab-and-margins-options

I was wondering whether the same problems apply to the margins command after ppmlhdfe estimation. Also, if there is a problem, is there any way to incorporate the fixed effects in the margins command after ppmlhdfe estimation?

Thank you very much in advance!

sergiocorreia commented 1 year ago

That's a great question, and I wish I had more time to dive into the margins command to extend ppmlhdfe's support for margins options (there are a lot of margins options and ppmlhdfe only supports a few of them).

On your specific question, a simple way to verify if the margins command is indeed ignoring the FEs or not is to compare it against poisson with dummies:

sysuse auto, clear

ppmlhdfe price weight length, a(turn) d keepsingletons
margins, dydx(weight)

qui poisson price weight length i.turn, vce(robust)
margins, dydx(weight)

Here, you can see that both give the same results. However, I haven't tested the at() option much, and it is the one that might be giving folks problems in the online discussions of xtlogit/xtpoisson. Thus, I encourage you to be careful when using at(), as shown below:

sysuse auto, clear

* Here margins works but gives you the results as if vce() was not robust
qui ppmlhdfe price c.weight#c.weight length, a(turn) d keepsingletons
margins, dydx(weight) at(weight=2000)
qui poisson price c.weight#c.weight length i.turn, vce(robust)
margins, dydx(weight) at(weight=2000)

* Here margins gives you different point estimates
qui ppmlhdfe price c.weight#c.weight, a(turn) d keepsingletons
margins, dydx(weight) at(weight=2000)
qui poisson price c.weight#c.weight i.turn, vce(robust)
margins, dydx(weight) at(weight=2000)

I would need to better understand the details of margins and its underlying assumptions in order to make it work better for ppmlhdfe, which I'm not able to do (at least in the near future), but any suggestions or findings would be more than welcome!
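For anyone who wants to repeat the first comparison without reading the two margins tables by eye, here is a minimal sketch that stores the results and reports their relative difference. The matrix names (me_hdfe, V_hdfe, me_pois, V_pois) are made up for illustration. It relies only on margins leaving its estimates in r(b) and r(V) and on the built-in matrix function mreldif().

```stata
sysuse auto, clear

* ppmlhdfe with absorbed fixed effects (d stores the sum of the fixed effects)
quietly ppmlhdfe price weight length, a(turn) d keepsingletons
quietly margins, dydx(weight)
matrix me_hdfe = r(b)
matrix V_hdfe  = r(V)

* Same model with explicit dummies
quietly poisson price weight length i.turn, vce(robust)
quietly margins, dydx(weight)
matrix me_pois = r(b)
matrix V_pois  = r(V)

* Values near zero mean the two commands agree
display "rel. diff., point estimate: " mreldif(me_hdfe, me_pois)
display "rel. diff., variance:       " mreldif(V_hdfe, V_pois)
```

The same pattern can be reused with the at() specifications above to see whether it is the point estimate, the variance, or both that diverge.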

cckimm commented 1 year ago

Thank you, Sergio, for your response.

I have one follow-up question.

Would the results from a ppmlhdfe regression always be the same as those from a poisson model with dummy variables for the fixed effects? Even if the dependent count variable may be zero-inflated?

I would assume that if I directly add many dummy variables for the fixed effects in the poisson command, the estimation will take very long. However, I am curious whether, at the end of the estimation, I will always get the same result as from ppmlhdfe.

Thank you.

ozak commented 1 year ago

@sergiocorreia I have a similar question. I was testing whether the ppmlhdfe and poisson commands generate the same results. In the example above they are identical, but with my data they are quite different. Here's the output of your example (to show it is not an installation issue)

and here it is for my data using basically a similar command

where I am running the commands

ppmlhdfe N_groups_0 sharepopregsec if sample==1, absorb(idGID_0) cluster(idGID_0) d keepsingletons
margins, dydx(sharepopregsec)

poisson N_groups_0 sharepopregsec i.idGID_0 if sample==1, cluster(idGID_0)
margins, dydx(sharepopregsec)

Notice the different samples. Yet I get the same result if I drop keepsingletons.

Moreover, if I do not have FEs I do get the exact same result. Also, the estimated coefficients (not MEs) are actually identical, so this seems to be an MEs + FEs issue. To get the same MEs with FEs, I had to restrict the poisson sample to the one used by ppmlhdfe, i.e.,

ppmlhdfe N_groups_0 sharepopregsec if sample==1, absorb(idGID_0) cluster(idGID_0) d keepsingletons
margins, dydx(sharepopregsec)

poisson N_groups_0 sharepopregsec i.idGID_0 if e(sample)==1, cluster(idGID_0)
margins, dydx(sharepopregsec)

in which case the MEs are identical

I thought the option keepsingletons would ensure the sample remains the same as the one using poisson, but it seems that is not the case (which I think is the case in reghdfe vs reg). Is there a way to ensure ppmlhdfe uses the full sample? What is being dropped that is not a singleton? Perhaps more importantly, which one is the correct one? Any ideas?

Thanks!

sergiocorreia commented 1 year ago

I can't tell for sure without looking at the data, but one possibility is that the dropped observations correspond to "separated observations", particularly those separated by the fixed effects.

This separation primer should clarify the issue a bit.
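One way to check this possibility on the data is sketched below, reusing the variable names from the commands earlier in the thread. The helper variables (in_hdfe, in_pois, max_y) are made up for illustration. The check rests on one assumption, labeled in the comments: a fixed-effect group whose outcome is always zero is one common source of separation by the fixed effects, so other forms of separation would not be caught here.

```stata
* Flag the estimation sample used by each command
quietly ppmlhdfe N_groups_0 sharepopregsec if sample==1, absorb(idGID_0) cluster(idGID_0) keepsingletons
generate byte in_hdfe = e(sample)

quietly poisson N_groups_0 sharepopregsec i.idGID_0 if sample==1, cluster(idGID_0)
generate byte in_pois = e(sample)

* Largest outcome within each fixed-effect group, over the poisson sample
bysort idGID_0: egen double max_y = max(cond(in_pois, N_groups_0, .))

* Observations kept by poisson but dropped by ppmlhdfe
tabulate in_hdfe if in_pois

* Assumption: all-zero groups are the separated ones; count the dropped
* observations that fall in such groups
count if in_pois & !in_hdfe & max_y == 0
```

If the final count matches the number of observations that ppmlhdfe dropped, the sample difference is fully explained by groups with no positive outcomes.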

ozak commented 1 year ago

Thanks! This is super useful and I imagined it was something like that. I just thought the keepsingletons option was including them somehow.

ozak commented 1 year ago

Is the separation issue not affecting other estimators? I remember that for reghdfe you showed that the estimate is the same (with some difference in SEs). Here the coefficient estimates are the same but the MEs and SEs are different. Does this suggest that one should perhaps restrict the estimation sample to non-separated observations in general?

ozak commented 1 year ago

@sergiocorreia following up on my previous question. I imagine the separation issue is also important in, e.g., negative binomial models. Should we expect a similar convergence-to-the-wrong-estimate issue in nbreg?

I see that in my data reghdfe (full sample) and ppmlhdfe (dropping separated obs) generate similar MEs, while poisson and nbreg give very different MEs on the full sample but very similar ones once separated obs are dropped. The estimated coefficients and SEs differ only slightly, so the differences in MEs seem to come from the sample means used to compute them. It is not clear to me which is the correct comparison. Since my data contain separated observations, the estimation is only correct with ppmlhdfe, yet the estimated coefficient with poisson is basically identical. Should one still estimate poisson and nbreg (and get MEs) only on the restricted sample, since this is where the actual variation for estimation comes from?
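The pattern described here (near-identical coefficients, different MEs) is what the arithmetic of an average marginal effect would predict. The following is only a sketch and rests on three assumptions: the regressor enters linearly, the mean is exponential as in Poisson, and the separated observations kept by poisson have fitted means that are numerically close to zero. The symbols $S$, $N_S$, $N_0$ and $\hat\mu_i$ are notation introduced for illustration.

```latex
\begin{aligned}
\widehat{\mathrm{AME}}_k(S)
  &= \hat\beta_k \cdot \frac{1}{N_S} \sum_{i \in S} \hat\mu_i,
  \qquad \hat\mu_i = \exp(x_i'\hat\beta + \hat\alpha_{g(i)}), \\
\widehat{\mathrm{AME}}_k(\text{full})
  &\approx \frac{N_S - N_0}{N_S}\,
     \widehat{\mathrm{AME}}_k(\text{non-separated})
  \quad \text{when } \hat\mu_i \approx 0 \text{ for the } N_0 \text{ separated observations.}
\end{aligned}
```

Under these assumptions the two samples answer different questions (the average effect over all observations versus over the observations that identify $\hat\beta$), so the formula explains the gap but does not by itself say which sample is the right one to report.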

Question about validity of Margins Results post PPMLHDFE estimation #18