lrberge / fixest

Fixed-effects estimations
https://lrberge.github.io/fixest/

fepois() does not fully replicate ppmlhdfe in Stata #507

Open mikedenly opened 4 weeks ago

mikedenly commented 4 weeks ago

Dear Laurent (@lrberge) et al., I am writing to kindly request your assistance because I am unable to get fixest::fepois() to fully replicate ppmlhdfe in Stata. Per this thread, I understand that fepois() is supposed to produce exactly the same output as ppmlhdfe. Initially, I thought the problem was with marginaleffects, given the vastly different marginal-effects confidence intervals between fixest::fepois() and ppmlhdfe, so I brought it to @vincentarelbundock's attention in this issue. However, Vincent pointed out that the regular, non-marginalized standard errors are not exactly the same either, which, he suggests, could produce different variance-covariance matrices and, in turn, the vastly different marginal-effects confidence intervals (the marginal-effects step is sketched after the output below). Accordingly, I thought I would post the issue here with the same reproducible code for the regular estimation only. Do you have any thoughts on the matter? In any case, thank you so much for your incredible work on this package -- it has really changed R, and I teach your work to all of my students.

Here is the fepois() code:

# libraries
library(fixest)
library(AER)

# load fatalities data from AER package
data(Fatalities)

# run regular model with state & year fixed effects
regular_model <- fixest::fepois(fatal ~ beertax | state + year, data=Fatalities, cluster=~state)
summary(regular_model)

Poisson estimation, Dep. Var.: fatal
Observations: 336
Fixed-effects: state: 48,  year: 7
Standard-errors: Clustered (state) 
         Estimate Std. Error  z value Pr(>|z|)    
beertax -0.347274   0.174639 -1.98852 0.046754 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log-Likelihood: -2,095.5   Adj. Pseudo R2: 0.981999
           BIC:  4,511.0     Squared Cor.: 0.992513
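
For reference, the marginal-effects step that produced the diverging confidence intervals looks roughly like the sketch below (I am assuming marginaleffects::avg_slopes() is the relevant call here, rather than copying the exact code from the linked issue):

# average marginal effect of beertax on the response scale, inheriting the
# state-clustered vcov from the fixest model (sketch, not the exact call
# from the marginaleffects issue)
library(marginaleffects)
avg_slopes(regular_model, variables = "beertax")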

Here is the Stata code:

. * import the data
. import delimited using  "fatalities.csv"
(34 vars, 336 obs)

. * run the model
. ppmlhdfe fatal beertax, absorb(state year) vce(cl state) d
Iteration 1:   deviance = 7.5532e+03  eps = .         iters = 3    tol = 1.0e-04  min(eta) =  -1.43  P   
Iteration 2:   deviance = 1.6438e+03  eps = 3.59e+00  iters = 2    tol = 1.0e-04  min(eta) =  -1.97      
Iteration 3:   deviance = 1.4156e+03  eps = 1.61e-01  iters = 2    tol = 1.0e-04  min(eta) =  -2.18      
Iteration 4:   deviance = 1.4138e+03  eps = 1.25e-03  iters = 2    tol = 1.0e-04  min(eta) =  -2.21      
Iteration 5:   deviance = 1.4138e+03  eps = 1.64e-07  iters = 2    tol = 1.0e-04  min(eta) =  -2.21      
Iteration 6:   deviance = 1.4138e+03  eps = 3.08e-15  iters = 1    tol = 1.0e-05  min(eta) =  -2.21   S  
Iteration 7:   deviance = 1.4138e+03  eps = 1.47e-16  iters = 1    tol = 1.0e-08  min(eta) =  -2.21   S O
------------------------------------------------------------------------------------------------------------
(legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
Converged in 7 iterations and 13 HDFE sub-iterations (tol = 1.0e-08)

HDFE PPML regression                              No. of obs      =        336
Absorbing 2 HDFE groups                           Residual df     =         47
Statistics robust to heteroskedasticity           Wald chi2(1)    =       4.04
Deviance             =  1413.784422               Prob > chi2     =     0.0445
Log pseudolikelihood = -2095.542592               Pseudo R2       =     0.9825

Number of clusters (state)  =         48
                                 (Std. Err. adjusted for 48 clusters in state)
------------------------------------------------------------------------------
             |               Robust
       fatal |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     beertax |  -.3472736   .1728049    -2.01   0.044    -.6859649   -.0085822
       _cons |   7.396313   .0928461    79.66   0.000     7.214338    7.578288
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
       state |        48          48           0    *|
        year |         7           1           6     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

And here is sessionInfo():

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C LC_TIME=English_United States.utf8

time zone: America/Chicago
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rio_1.0.1 marginaleffects_0.16.0 AER_1.2-12 survival_3.5-7 sandwich_3.1-0
[6] lmtest_0.9-40 zoo_1.8-12 car_3.1-2 carData_3.0-5 broom_1.0.5
[11] lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2
[16] readr_2.1.4 tidyr_1.3.0 tibble_3.2.1 ggplot2_3.5.0 tidyverse_2.0.0
[21] fixest_0.12.0

loaded via a namespace (and not attached):
[1] gtable_0.3.5 insight_0.19.11 lattice_0.21-9 tzdb_0.4.0 numDeriv_2016.8-1.1
[6] vctrs_0.6.5 tools_4.3.2 generics_0.1.3 fansi_1.0.5 pkgconfig_2.0.3
[11] R.oo_1.25.0 Matrix_1.6-5 data.table_1.14.8 checkmate_2.3.1 stringmagic_1.1.2
[16] lifecycle_1.0.4 compiler_4.3.2 munsell_0.5.1 Formula_1.2-5 pillar_1.9.0
[21] R.utils_2.12.3 abind_1.4-5 nlme_3.1-163 tidyselect_1.2.1 mvtnorm_1.2-4
[26] stringi_1.8.2 splines_4.3.2 grid_4.3.2 colorspace_2.1-0 cli_3.6.1
[31] magrittr_2.0.3 utf8_1.2.4 withr_3.0.0 dreamerr_1.4.0 scales_1.3.0
[36] backports_1.4.1 timechange_0.2.0 estimability_1.4.1 emmeans_1.10.0 R.methodsS3_1.8.2
[41] hms_1.1.3 coda_0.19-4.1 rlang_1.1.2 Rcpp_1.0.11 xtable_1.8-4
[46] glue_1.6.2 rstudioapi_0.15.0 R6_2.5.1

s3alfisc commented 3 weeks ago

Hi @mikedenly ,

I think there are a few reasons for the differences in standard errors. First, fixest and ppmlhdfe might not implement exactly the same algorithms for solving the iterated least squares problem: for example, the demeaning algorithms for the fixed effects differ, and ppmlhdfe additionally implements a set of accelerations that fixest::fepois() may or may not include. Most importantly, the two algorithms rely on different convergence thresholds. Lastly, I think you might observe differences in standard errors because the matrix inversion solve(crossprod(X)) is used twice in the computation of the SEs (versus just once for the coefficients), so standard errors should carry a larger "numerical error" than the point estimates. I have experienced this a lot with my unit tests for pyfixest, where matching fixest::fepois() coefficients at a small tolerance was relatively easy, but matching the clustered SEs, not so much! 😄
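
To make the "used twice" point concrete, here is a hand-rolled sketch of a clustered sandwich variance for a toy Poisson GLM (purely illustrative -- neither fixest nor ppmlhdfe literally computes it this way): the inverted cross-product enters the IRLS coefficient update once, but it sits on both sides of the sandwich, so any numerical error in it is amplified in the SEs.

# illustrative sketch: the "bread" (X'WX)^{-1} appears once in the IRLS
# coefficient update but twice in the clustered sandwich below
set.seed(1)
n  <- 500
X  <- cbind(1, rnorm(n))                       # toy design matrix
y  <- rpois(n, exp(0.5 + 0.3 * X[, 2]))        # toy Poisson outcome
cl <- sample(1:20, n, replace = TRUE)          # hypothetical cluster variable

fit     <- glm.fit(X, y, family = poisson())
bread   <- solve(crossprod(X * sqrt(fit$weights)))  # (X'WX)^{-1}
scores  <- X * (y - fit$fitted.values)              # per-observation scores
meat    <- crossprod(rowsum(scores, cl))            # cluster-summed outer product
vcov_cl <- bread %*% meat %*% bread                 # the inverse is used twice here
sqrt(diag(vcov_cl))                                 # clustered SEs (no small-sample correction)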

You might be able to test this theory by decreasing the tolerance of the convergence algorithms (both for the fixed-effects demeaning and for the IWLS iterations); if that is the culprit, the difference between the standard errors should shrink as the tolerances tighten.
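
On the fixest side, something along these lines should tighten both criteria (fixef.tol and glm.tol are, as far as I recall, the relevant fepois()/feglm() arguments; ppmlhdfe should have an analogous tolerance() option):

# tighten both the fixed-effects demeaning and the IRLS convergence criteria
# (both tolerances are looser by default)
tight_model <- fixest::fepois(
  fatal ~ beertax | state + year,
  data      = Fatalities,
  cluster   = ~state,
  fixef.tol = 1e-12,
  glm.tol   = 1e-12
)
summary(tight_model)  # compare the SE here with the ppmlhdfe value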

Beyond that, there are other reasons why standard errors might differ: for example, the two packages might use different small-sample / degrees-of-freedom corrections for clustered standard errors. There are a lot of details on what fixest does here: link.
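
For instance, in fixest the small-sample correction can be switched at summary() time via ssc(), without re-estimating (a sketch; which combination, if any, exactly matches ppmlhdfe is part of the open question here):

# try a few small-sample-correction choices on the same fitted model
summary(regular_model, ssc = ssc(adj = TRUE,  cluster.adj = TRUE))   # essentially the default
summary(regular_model, ssc = ssc(adj = FALSE, cluster.adj = FALSE))  # no corrections
summary(regular_model, ssc = ssc(fixef.K = "full"))                  # count all FE coefficients in K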

Having said that, if I had to bet, I'd say what you observe is a combination of slightly different algorithms and numerical error.

mikedenly commented 3 weeks ago

Hi @s3alfisc, thank you so much for your very thoughtful answer! The link you provided on standard errors also confirms that fixest calculates standard errors similarly to reghdfe in Stata, and, according to here, that should not be an issue between fixest::fepois() and ppmlhdfe in Stata either. I am thus a bit perplexed about how to proceed, especially because I would like to calculate marginal effects after the regular estimation, and my closed issue on @vincentarelbundock's incredible marginaleffects package shows that the differences you point out can lead to vast differences between the Stata and R output -- or, in your case, between fixest and pyfixest in Python as well. To avoid potential mistakes, I sent my paper to the journal with interpretations based on exponentiated coefficients, because those match across software programs -- even though a marginal effect is, all else equal, much more interpretable. However, that all-else-equal assumption seems like a bit of a stretch for fixed-effects Poisson estimation, given everything you describe.
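
(For reference, the exponentiated-coefficient interpretation I went with is just something like the following sketch:)

# incidence-rate-ratio-style interpretation: exponentiate the coefficient
# (the point estimate matches across programs; a confidence interval would
#  still inherit the small SE discrepancy)
exp(coef(regular_model)["beertax"])
exp(confint(regular_model)["beertax", ])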

lrberge commented 1 week ago

Hi @mikedenly, I'm glad you find this software useful! :-)

On the issue you're raising, I'm afraid that if there is a deviation from the standard, it may be on ppmlhdfe's side. The SEs in fixest replicate the SEs you can obtain from glm() (modulo the small-sample correction), as shown below:

# same data as above
library(fixest)
library(sandwich)
library(AER)
data(Fatalities)

# run regular model with state & year fixed effects, and the same model via glm()
est_fixest = fepois(fatal ~ beertax | state + year, data = Fatalities, cluster = ~state)
est_glm = glm(fatal ~ beertax + as.factor(state) + as.factor(year), 
              data = Fatalities, family = poisson())

# point estimates are identical up to numerical precision
coef(est_fixest) - coef(est_glm)["beertax"]
#>       beertax 
#> -8.881784e-15

# clustered SEs: fixest (counting all FE coefficients) vs. sandwich's HC1 on glm()
se_fixest = se(est_fixest, ssc = ssc(fixef.K = "full"))
se_glm = se(vcovCL(est_glm, cluster = ~state, type = "HC1"))["beertax"]

se_fixest - se_glm
#>       beertax 
#> -3.339575e-10

@s3alfisc, your remarks are interesting. Did your benchmarks differ systematically? Under what conditions did a divergence become apparent (sample size, fixed effects, number of variables)? What tolerance levels did you use to make the difference disappear? Could you share the problematic setups?

s3alfisc commented 5 days ago

I actually went back and did some checks - overall, pyfixest gets closer than I recalled:

import numpy as np
import pandas as pd

# rpy2 bridge to call fixest from Python
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.packages import importr

import pyfixest as pf
from pyfixest.utils.utils import ssc

pandas2ri.activate()

fixest = importr("fixest")
stats = importr("stats")

# simulated Poisson example data shipped with pyfixest
data = pf.get_data(N = 100, model = "Fepois")

# use the same (deactivated) small-sample corrections on both sides
adj = False
cluster_adj = False
fml = "Y ~ X1 + X2 | f1"
vcov = "hetero"

py_mod = pf.fepois(
    fml, data=data, vcov=vcov, ssc=ssc(adj=adj, cluster_adj=cluster_adj)
)
r_mod = fixest.fepois(
    ro.Formula(fml),
    data=data,
    vcov=vcov,
    ssc=fixest.ssc(adj, "none", cluster_adj, "min", "min", False),
)

py_mod_vcov = py_mod._vcov
r_mod_vcov = stats.vcov(r_mod)

py_mod.coef() - stats.coef(r_mod)
# Coefficient
# X1   -3.490416e-10
# X2    2.342791e-11
# Name: Estimate, dtype: float64

py_mod_vcov - r_mod_vcov

# array([[-7.14779327e-07,  8.56358303e-09],
#       [ 8.56358303e-09, -3.96472177e-08]])

So quite close after all 😅 (but not exactly identical).