Double check Poisson mfx

grantmcdermott / etwfe

Extended two-way fixed effects

https://grantmcdermott.com/etwfe/

Other

50 stars 11 forks source link

Double check Poisson mfx #4

Closed grantmcdermott closed 1 year ago

grantmcdermott commented 1 year ago

C.f. With the jwdid Poisson model at the very bottom of this post.

The main results are exactly the same.

library(etwfe)

data("mpdta", package = "did")
mpdta$emp = exp(mpdta$lemp)

# poisson model
pmod = etwfe(emp ~ lpop, tvar=year, gvar=first.treat, gref=0, data=mpdta, 
             vcov=~countyreal, family = "poisson", fe = "none")
#> The variables '.Dtreat:first.treat::2006:year::2004', '.Dtreat:first.treat::2006:year::2005' and eight others have been removed because of collinearity (see $collin.var).

fixest::etable(pmod, digits = 7, signif.code = NA)
#>                                                                        pmod
#> Dependent Var.:                                                         emp
#>                                                                            
#> Constant                                               2.484455 (0.0774941)
#> lpop                                                   1.039625 (0.0168640)
#> lpop x first.treat = 2004                            -0.0132716 (0.0404687)
#> lpop x first.treat = 2006                            -0.0811734 (0.0330063)
#> lpop x first.treat = 2007                             0.0162899 (0.0250988)
#> lpop x year = 2004                                   -0.0066118 (0.0045127)
#> lpop x year = 2005                                   -0.0077402 (0.0059153)
#> lpop x year = 2006                                   -0.0090772 (0.0065360)
#> lpop x year = 2007                                   -0.0035862 (0.0054845)
#> first.treat = 2004                                    0.2361076 (0.2079180)
#> first.treat = 2006                                    0.5927050 (0.1770658)
#> first.treat = 2007                                   -0.1440501 (0.1223770)
#> year = 2004                                          -0.0105493 (0.0209058)
#> year = 2005                                           0.0112893 (0.0270397)
#> year = 2006                                           0.0453898 (0.0291563)
#> year = 2007                                           0.0542584 (0.0280050)
#> .Dtreat x first.treat = 2004 x year = 2004           -0.0309557 (0.0176208)
#> .Dtreat x first.treat = 2004 x year = 2005           -0.0662240 (0.0255833)
#> .Dtreat x first.treat = 2004 x year = 2006           -0.1329713 (0.0237253)
#> .Dtreat x first.treat = 2004 x year = 2007           -0.1170228 (0.0225006)
#> .Dtreat x first.treat = 2006 x year = 2006           -0.0090494 (0.0244213)
#> .Dtreat x first.treat = 2006 x year = 2007           -0.0681486 (0.0251113)
#> .Dtreat x first.treat = 2007 x year = 2007           -0.0399916 (0.0168249)
#> .Dtreat x lpop_dm x first.treat = 2004 x year = 2004  0.0118798 (0.0071285)
#> .Dtreat x lpop_dm x first.treat = 2004 x year = 2005  0.0209935 (0.0119250)
#> .Dtreat x lpop_dm x first.treat = 2004 x year = 2006  0.0409103 (0.0095714)
#> .Dtreat x lpop_dm x first.treat = 2004 x year = 2007  0.0250348 (0.0112873)
#> .Dtreat x lpop_dm x first.treat = 2006 x year = 2006  0.0375837 (0.0136183)
#> .Dtreat x lpop_dm x first.treat = 2006 x year = 2007  0.0453804 (0.0164515)
#> .Dtreat x lpop_dm x first.treat = 2007 x year = 2007 -0.0109675 (0.0072256)
#> ________________________________________             ______________________
#> S.E.: Clustered                                              by: countyreal
#> Observations                                                          2,500
#> Squared Cor.                                                        0.95698
#> Pseudo R2                                                           0.95262
#> BIC                                                               287,872.5

But the marginal effects are different.

emfx(pmod, type = "event")
#>      Term    Contrast event  Effect Std. Error  z value   Pr(>|z|)   2.5 %  97.5 %
#> 1 .Dtreat mean(dY/dX)     0 -22.849      16.17 -1.41348 0.15751423  -54.53   8.834
#> 2 .Dtreat mean(dY/dX)     1   3.654      41.81  0.08739 0.93035790  -78.30  85.607
#> 3 .Dtreat mean(dY/dX)     2 -71.521      22.06 -3.24264 0.00118430 -114.75 -28.291
#> 4 .Dtreat mean(dY/dX)     3 -97.798      25.45 -3.84321 0.00012144 -147.67 -47.923
#> 
#> Model type:  etwfe 
#> Prediction type:  response

@vincentarelbundock Do you have any ideas? (Sorry for tagging and running, but wanted to get this down quickly before I have to head out!)

vincentarelbundock commented 1 year ago

Frankly, I don't really understand what is going on, and I don't have time for a deep dive in the code and blog post now. Maybe you can give me a numbered list of the post-processing steps that we want to apply to pmod?

grantmcdermott commented 1 year ago

Totally. Again, sorry for tagging and running earlier. No rush on this; I should keep digging on my own.

Here's what emfx(pmod, type = "event") is doing underneath the hood:

Data: It grabs the dataset that was used for the original pmod estimation. It then subsets this dataset to rows where ".Dreat==1" (i.e. treatment was switched on) and adds an "event" column (i.e., year minus first year of treatment).
Marginal effects: The dataset from step 1 is passed to marginaleffects(), highlighting the ".Dtreat" variable by "event".

The manual version of emfx(pmod, type = "event") is thus:

library(marginaleffects)

mfx = marginaleffects(
  pmod,
  variable = ".Dtreat",
  by = "event",
  newdata = within(
    subset(fixest:::fetch_data(pmod), .Dtreat==1),
    event <- year - first.treat
    )
  )

summary(mfx)
#>      Term    Contrast event  Effect Std. Error  z value   Pr(>|z|)   2.5 %  97.5 %
#> 1 .Dtreat mean(dY/dX)     0 -22.849      16.17 -1.41348 0.15751423  -54.53   8.834
#> 2 .Dtreat mean(dY/dX)     1   3.654      41.81  0.08739 0.93035790  -78.30  85.607
#> 3 .Dtreat mean(dY/dX)     2 -71.521      22.06 -3.24264 0.00118430 -114.75 -28.291
#> 4 .Dtreat mean(dY/dX)     3 -97.798      25.45 -3.84321 0.00012144 -147.67 -47.923
#> 
#> Model type:  etwfe 
#> Prediction type:  response

Now I should clarify that I believe that this is the same thing that jwdid -> estat is doing in Stata. At least, we get the exact same results across R and Stata with these steps for linear models. (The README provides an example of this.) But as you can see below, we get different results from the equivalent Poisson model in Stata. Again, note that the actual Poisson regressions are identical (etwfe == jwdid). It's only with the emfx / estat post-estimation where things diverge.

use https://friosavila.github.io/playingwithstata/drdid/mpdta.dta, clear
gen emp=exp(lemp)
qui jwdid  emp lpop, ivar(countyreal) tvar(year) gvar(first_treat) method(poisson)

estat event
[omitting what you dont want to see]

---------------------------------------------------------------
              |            Delta-method
              |   Contrast   std. err.     [95% conf. interval]
--------------+------------------------------------------------
_at@__event__ |
  (2 vs 1) 0  |  -37.35025   15.75917     -68.23764   -6.462846
  (2 vs 1) 1  |  -117.3918   33.88822     -183.8115   -50.97213
  (2 vs 1) 2  |  -204.6293   39.72574     -282.4903   -126.7683
  (2 vs 1) 3  |  -182.7539   39.14667       -259.48   -106.0278
---------------------------------------------------------------

@friosavila, are you able to confirm the exact margins command that is being run with estat event here? Many thanks.

friosavila commented 1 year ago

So, what I do in Stata is 1) just like you did, create the “event” variable 2) make predictions for the treated sample only, separated for each “event” group. 3) the predictions are done either using Treatment =1 it Treat =0

Margins does the rest (in terms of SE)

Here is the manual version:

ssc install frause
frause mpdta, clear
gen emp=exp(lemp)
jwdid  emp lpop, ivar(countyreal) tvar(year) gvar(first_treat) method(poisson)
replace __tr__=0
predict yhat0
replace __tr__=1
predict yhat1
gen att=yhat1-yhat0
gen event = year - first_treat if first!=0
tabstat att, by(event)   event |      Mean
---------+----------
      -4 |         0
      -3 |         0
      -2 |         0
      -1 |         0
       0 | -37.35025
       1 | -117.3918
       2 | -204.6293
       3 |  -182.754
---------+----------
   Total | -22.95819
--------------------

vincentarelbundock commented 1 year ago

This is extremely weird, but I don’t think the problem is related to marginaleffects. Here’s an attempt to replicate the “manual” Stata calculations using purely base R code and the fixest:::predict() method:

library(etwfe)

data("mpdta", package = "did")
mpdta$emp = exp(mpdta$lemp)

pmod = etwfe(
    emp ~ lpop,
    tvar = year, gvar = first.treat, gref = 0, data = mpdta,
    vcov = ~countyreal, family = "poisson", fe = "none")

nd <- fixest:::fetch_data(pmod) |>
    subset(.Dtreat == 1) |>
    transform(event = year - first.treat)

p <- data.frame(
    p1 = predict(pmod, transform(nd, .Dtreat = 1)),
    p0 = predict(pmod, transform(nd, .Dtreat = 0)),
    event = nd$year - nd$first.treat) |>
    transform(att = p1 - p0)

aggregate(att ~ event, FUN = mean, data = p)
#>   event         att
#> 1     0  -25.349748
#> 2     1    1.091751
#> 3     2  -75.124632
#> 4     3 -101.823979

Since the base R code doesn’t replicate Stata, I think we have to conclude that this is either a difference in fixest:::predict() or in the creation of the event variable.

For completeness, note that you can replicate the results above with marginaleffects:

library(marginaleffects)

comparisons(
    pmod,
    variables = ".Dtreat",
    newdata = nd,
    by = "event") |>
    summary()
#>      Term          Contrast event   Effect Std. Error  z value  Pr(>|z|)
#> 1 .Dtreat mean(1) - mean(0)     0  -25.350      15.89 -1.59556 0.1105874
#> 2 .Dtreat mean(1) - mean(0)     1    1.092      40.30  0.02709 0.9783867
#> 3 .Dtreat mean(1) - mean(0)     2  -75.125      23.16 -3.24441 0.0011769
#> 4 .Dtreat mean(1) - mean(0)     3 -101.824      27.09 -3.75899 0.0001706
#>     2.5 % 97.5 %
#> 1  -56.49   5.79
#> 2  -77.89  80.08
#> 3 -120.51 -29.74
#> 4 -154.92 -48.73
#> 
#> Model type:  etwfe 
#> Prediction type:  response

If you show me base R code that does what you want to do, I can probably show you an easier way to achieve the same result with marginaleffects (with standard errors). But I’d need to know exactly how we can replicate the Stata code in @friosavila’s comment.

grantmcdermott commented 1 year ago

Thanks both!

I think we're getting closer. I'll try to investigate more during the week. But I just quickly wanted to show that I get the same answer using base R glm as with fixest (as per Vincent's inquiry). The difference is probably some difference in the underlying defaults of R and Stata here, rather than a specific package.

data("mpdta", package = "did")
mpdta$emp = exp(mpdta$lemp)

mpdta = mpdta |>
    within({
        .Dtreat = as.integer(year >= first.treat & first.treat != 0)
        lpop_dm = ave(lpop, first.treat, year, FUN = function(x) x - mean(x, na.rm=TRUE))
    })

# head(mpdta)

# Base R equivalent of etwfe...
pmod_base = glm(
    emp ~ .Dtreat:as.factor(first.treat):as.factor(year) / lpop_dm +
        lpop + as.factor(first.treat):lpop + as.factor(year):lpop +
        as.factor(first.treat) + as.factor(year),
    family = "poisson", dat = mpdta
)
# summary(pmod_base)

nd <- mpdta |>
    subset(.Dtreat == 1) |>
    transform(event = year - first.treat)

p <- data.frame(
    p1 = predict(pmod_base, transform(nd, .Dtreat = 1), type = "response"),
    p0 = predict(pmod_base, transform(nd, .Dtreat = 0), type = "response"),
    event = nd$year - nd$first.treat) |>
    transform(att = p1 - p0)
#> Warning in predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
#> prediction from a rank-deficient fit may be misleading

#> Warning in predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
#> prediction from a rank-deficient fit may be misleading

aggregate(att ~ event, FUN = mean, data = p)
#>   event         att
#> 1     0  -25.349748
#> 2     1    1.091751
#> 3     2  -75.124632
#> 4     3 -101.823979

library(marginaleffects)
comparisons(
    pmod_base,
    variables = ".Dtreat",
    newdata = nd,
    by = "event") |>
    summary()
#> Warning: Model matrix is rank deficient. Some variance-covariance parameters are
#>   missing.
#>      Term          Contrast event   Effect Std. Error z value   Pr(>|z|)
#> 1 .Dtreat mean(1) - mean(0)     0  -25.350      3.448 -7.3529 1.9391e-13
#> 2 .Dtreat mean(1) - mean(0)     1    1.092      7.081  0.1542    0.87747
#> 3 .Dtreat mean(1) - mean(0)     2  -75.125     12.553 -5.9844 2.1717e-09
#> 4 .Dtreat mean(1) - mean(0)     3 -101.824     13.014 -7.8244 5.1024e-15
#>     2.5 % 97.5 %
#> 1  -32.11 -18.59
#> 2  -12.79  14.97
#> 3  -99.73 -50.52
#> 4 -127.33 -76.32
#> 
#> Model type:  glm 
#> Prediction type:  response

^{Created on 2022-10-16 with reprex v2.0.2}

friosavila commented 1 year ago

Hi there, I went into the rabbit hole. Not sure why, but wherever you predict values with .Dtreat=0, we obtain things that are quite different. And I'm not sure why. consider countyreal = 8001 year 2007. With Dtreat=0, Stata predicts an outcome of 5081.009. R, instead predicts 5431.64298. Im not sure about why this difference arises. My main guess is that in R's version, there are many unestimated coefficients. In my first implementations of JWDID, I was having problems with margins understanding that, so I ended up collecting all relevant coefficients, rather than letting Stata make all interactions. I guess the only way to figure this out is to reduce the problem, and estimate the predicted values manually.

Best wishes Fernando

On Mon, Oct 17, 2022 at 2:12 AM Grant McDermott @.***> wrote:

Thanks both!

I think we're getting closer. I'll try to investigate more during the week. But I just quickly wanted to show that I get the same answer using base R glm as with fixest (as per Vincent's inquiry). The difference is probably some difference in the underlying defaults of R and Stata here, rather than a specific package.

data("mpdta", package = "did")mpdta$emp = exp(mpdta$lemp) mpdta = mpdta |> within({ .Dtreat = as.integer(year >= first.treat & first.treat != 0) lpop_dm = ave(lpop, first.treat, year, FUN = function(x) x - mean(x, na.rm=TRUE)) })

head(mpdta)

Base R equivalent of etwfe...pmod_base = glm(
emp ~ .Dtreat:as.factor(first.treat):as.factor(year) / lpop_dm +
    lpop + as.factor(first.treat):lpop + as.factor(year):lpop +
    as.factor(first.treat) + as.factor(year),
family = "poisson", dat = mpdta
)# summary(pmod_base) nd <- mpdta |> subset(.Dtreat == 1) |> transform(event = year - first.treat) p <- data.frame( p1 = predict(pmod_base, transform(nd, .Dtreat = 1), type = "response"), p0 = predict(pmod_base, transform(nd, .Dtreat = 0), type = "response"), event = nd$year - nd$first.treat) |> transform(att = p1 - p0)#> Warning in predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :#> prediction from a rank-deficient fit may be misleading

> Warning in predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :#> prediction from a rank-deficient fit may be misleading

aggregate(att ~ event, FUN = mean, data = p)#> event att#> 1 0 -25.349748#> 2 1 1.091751#> 3 2 -75.124632#> 4 3 -101.823979

library(marginaleffects) comparisons( pmod_base, variables = ".Dtreat", newdata = nd, by = "event") |> summary()#> Warning: Model matrix is rank deficient. Some variance-covariance parameters are#> missing.#> Term Contrast event Effect Std. Error z value Pr(>|z|)#> 1 .Dtreat mean(1) - mean(0) 0 -25.350 3.448 -7.3529 1.9391e-13#> 2 .Dtreat mean(1) - mean(0) 1 1.092 7.081 0.1542 0.87747#> 3 .Dtreat mean(1) - mean(0) 2 -75.125 12.553 -5.9844 2.1717e-09#> 4 .Dtreat mean(1) - mean(0) 3 -101.824 13.014 -7.8244 5.1024e-15#> 2.5 % 97.5 %#> 1 -32.11 -18.59#> 2 -12.79 14.97#> 3 -99.73 -50.52#> 4 -127.33 -76.32#> #> Model type: glm #> Prediction type: response

Created on 2022-10-16 with reprex v2.0.2 https://reprex.tidyverse.org

— Reply to this email directly, view it on GitHub https://github.com/grantmcdermott/etwfe/issues/4#issuecomment-1280338664, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASZKKFRKX3H335XZQAENKO3WDTU3RANCNFSM6AAAAAARGNR464 . You are receiving this because you were mentioned.Message ID: @.***>

grantmcdermott commented 1 year ago

Ah, that's very helpful. Thanks for the sleuthing @friosavila!

Just confirming visually what you're saying: The only differences occur with .Dtreat==0.

The highlighted differences, in red, are at an arbitrarily small cut-off... and even the largest differences are pretty small on log scale (the original scale of the data).

It's not clear to me that either one of R or Stata is obviously (in)correct here. If I collect the predictions from R and Stata and compare against the true emp (e.g. for yhat0 where .Dtreat==0) R has a slightly lower MAE and median absolute prediction error. But again the differences are very small (less than half a percent).

I'll leave this issue open and continue digging. But it might ultimately be a convergence (or even rounding) issue that's hard to pin down and not worth stressing too much about.

vincentarelbundock commented 1 year ago

Given the warning above, my best guess is that the design matrix is rank deficient, and that in order to compute predictions, R and Stata both drop a somewhat arbitrary set of columns. If the columns that R and Stata happen to drop are not exactly the same, maybe this could give rise to some numerical issues and minor discrepancies.

For reference, in R this is done by the drop() call on line 686 inside predict.lm():

https://github.com/SurajGupta/r-source/blob/master/src/library/stats/R/lm.R#L679

grantmcdermott commented 1 year ago

Given the warning above, my best guess is that the design matrix is rank deficient, and that in order to compute predictions, R and Stata both drop a somewhat arbitrary set of columns.

Thanks @vincentarelbundock. I think this is the same point @friosavila was making earlier in the thread. For the record, though I don't think rank deficiency is the issue. fixest (what etwfe uses underneath the hood) drops these collinear terms at run time prior to estimation. We can also confirm the same results with base glm if we construct the dataset and remove all the collinear terms ahead of time.

Show "preconstructed" results

``` r data("mpdta", package = "did") dat = mpdta |> transform( emp = exp(lemp), event = ifelse(first.treat != 0, year - first.treat, NA), .Dtreat = as.integer(year >= first.treat & first.treat != 0), lpop_dm = ave(lpop, first.treat, year, FUN = function(x) x - mean(x, na.rm=TRUE)), first.treat = factor(first.treat), year = factor(year) ) gmod = glm( emp ~ .Dtreat:first.treat:year / lpop_dm + lpop * (first.treat + year), family = 'poisson', data = dat ) ## get rid of NA (collinear) coefs and re-estimate ---- coefs = coefficients(gmod) coefs = names(coefs[which(!is.na(coefs))]) mm = model.matrix(gmod) mm = mm[, coefs] dat2 = as.data.frame(cbind(emp = dat$emp, mm[, -1])) gmod2 = glm( emp ~ ., family = "poisson", data = dat2 ) # summary(gmod2) ## construct counterfactual datasets dat2_0 = dat2_1 = dat2 Dcols = grep("Dtreat", colnames(dat2)) dat2_0[, Dcols] = 0 # dat2_1[, Dcols] = 1 # already true since .Dtreat = 1 by definition ## get ATTs ---- dat = dat |> transform( yhat1 = predict(gmod2, newdata = dat2_1, type = "response"), yhat0 = predict(gmod2, newdata = dat2_0, type = "response") ) |> transform( att = yhat1 - yhat0 ) aggregate(att ~ event, FUN = mean, data = dat) #> event att #> 1 -4 0.000000 #> 2 -3 0.000000 #> 3 -2 0.000000 #> 4 -1 0.000000 #> 5 0 -25.349748 #> 6 1 1.091751 #> 7 2 -75.124632 #> 8 3 -101.823979 ``` ^{Created on 2022-11-28 with [reprex v2.0.2](https://reprex.tidyverse.org)}

friosavila commented 1 year ago

Ok it took a while of looking at the code! The source of the difference comes from how things are being interacted.

consider the Stata code

qui poisson emp c.tr#i.first_treat#i.year c.tr#i.first_treat#i.year#c._x_lpop /// lpop c.lpop#i.first_treat c.lpop#i.year i.first_treat i.year, /// cluster(countyreal)

asd

On Mon, Nov 28, 2022 at 6:31 PM Grant McDermott @.***> wrote:

Given the warning above, my best guess is that the design matrix is rank deficient, and that in order to compute predictions, R and Stata both drop a somewhat arbitrary set of columns.

Thanks @vincentarelbundock https://github.com/vincentarelbundock. I think this is the same point @friosavila https://github.com/friosavila was making earlier in the thread. For the record, though I don't think rank deficiency is the issue. fixest (what etwfe uses underneath the hood) drops these collinear terms at run time. We can also confirm the same results with base glm if we construct the dataset and (non-colinear) interactions ahead of time. Show "preconstructed" results

data("mpdta", package = "did") dat = mpdta |> transform( emp = exp(lemp), event = ifelse(first.treat != 0, year - first.treat, NA), .Dtreat = as.integer(year >= first.treat & first.treat != 0), lpop_dm = ave(lpop, first.treat, year, FUN = function(x) x - mean(x, na.rm=TRUE)), first.treat = factor(first.treat), year = factor(year) ) gmod = glm( emp ~ .Dtreat:first.treat:year / lpop_dm + lpop * (first.treat + year), family = 'poisson', data = dat )

get rid of NA (collinear) coefs and re-estimate ----

coefs = coefficients(gmod)coefs = names(coefs[which(!is.na(coefs))]) mm = model.matrix(gmod)mm = mm[, coefs] dat2 = as.data.frame(cbind(emp = dat$emp, mm[, -1])) gmod2 = glm( emp ~ ., family = "poisson", data = dat2 )# summary(gmod2)

construct counterfactual datasetsdat2_0 = dat2_1 = dat2Dcols = grep("Dtreat", colnames(dat2))dat2_0[, Dcols] = 0# dat2_1[, Dcols] = 1 # already true since .Dtreat = 1 by definition

get ATTs ----

dat = dat |> transform( yhat1 = predict(gmod2, newdata = dat2_1, type = "response"), yhat0 = predict(gmod2, newdata = dat2_0, type = "response") ) |> transform( att = yhat1 - yhat0 )

aggregate(att ~ event, FUN = mean, data = dat)#> event att#> 1 -4 0.000000#> 2 -3 0.000000#> 3 -2 0.000000#> 4 -1 0.000000#> 5 0 -25.349748#> 6 1 1.091751#> 7 2 -75.124632#> 8 3 -101.823979

Created on 2022-11-28 with reprex v2.0.2 https://reprex.tidyverse.org

All of this prompted me to try implementing the same, manual version of the ETWFE regression in Stata. Here (finally) I think we're getting closer to underlying issue. As you can see, I get exactly the same ATT results from this manual Stata version as I do in R.

. frause mpdta, clear (Written by R. )

. gen emp = exp(lemp)

. qui jwdid emp lpop, ivar(countyreal) tvar(year) gvar(first_treat) method(poisson)

. qui poisson emp c.tr#i.first_treat#i.year c.tr#i.first_treat#i.year#c._x_lpop /// lpop c.lpop#i.first_treat c.lpop#i.year i.first_treat i.year, /// cluster(countyreal)

. replace tr = 1 (2,209 real changes made)

. predict yhat1 (option n assumed; predicted number of events)

. replace tr = 0 (2,500 real changes made)

. predict yhat0 (option n assumed; predicted number of events)

. gen att=yhat1-yhat0

. gen event = year - first_treat if first!=0 (1,545 missing values generated)

. tabstat att, by(event)

Summary for variables: att Group variable: event

event | Mean---------+---------- -4 | 0 -3 | 0 -2 | 0 -1 | 0 0 | -25.34976 1 | 1.091738 2 | -75.12462 3 | -101.824---------+---------- Total | -8.707092--------------------

@friosavila https://github.com/friosavila Do you have any thoughts on this? Does the manual Stata version above correspond to what jwdid is doing underneath the hood?

— Reply to this email directly, view it on GitHub https://github.com/grantmcdermott/etwfe/issues/4#issuecomment-1329883375, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASZKKFUTXGWOELTKY6AXEI3WKU6DXANCNFSM6AAAAAARGNR464 . You are receiving this because you were mentioned.Message ID: @.***>

friosavila commented 1 year ago

Again, consider the code qui poisson emp c.tr#i.first_treat#i.year c.tr#i.first_treat#i.year#c._x_lpop /// lpop c.lpop#i.first_treat c.lpop#i.year i.first_treat i.year, /// cluster(countyreal)

The second line is being interacted with c.tr, however the way I programmed it in jwdid I do not interact that second line with the treatment. Right now I'm not sure which one is the correct approach, So I will go back to Wooldridge paper to see what he suggests.

At the very least, we know now the source of the difference! Fernando

On Mon, Nov 28, 2022 at 9:52 PM Fernando Rios-Avila @.***> wrote:

Ok it took a while of looking at the code! The source of the difference comes from how things are being interacted.

consider the Stata code

qui poisson emp c.tr#i.first_treat#i.year c.tr#i.first_treat#i.year#c._x_lpop /// lpop c.lpop#i.first_treat c.lpop#i.year i.first_treat i.year, /// cluster(countyreal)

asd

On Mon, Nov 28, 2022 at 6:31 PM Grant McDermott @.***> wrote:

Given the warning above, my best guess is that the design matrix is rank deficient, and that in order to compute predictions, R and Stata both drop a somewhat arbitrary set of columns.

Thanks @vincentarelbundock https://github.com/vincentarelbundock. I think this is the same point @friosavila https://github.com/friosavila was making earlier in the thread. For the record, though I don't think rank deficiency is the issue. fixest (what etwfe uses underneath the hood) drops these collinear terms at run time. We can also confirm the same results with base glm if we construct the dataset and (non-colinear) interactions ahead of time. Show "preconstructed" results

data("mpdta", package = "did") dat = mpdta |> transform( emp = exp(lemp), event = ifelse(first.treat != 0, year - first.treat, NA), .Dtreat = as.integer(year >= first.treat & first.treat != 0), lpop_dm = ave(lpop, first.treat, year, FUN = function(x) x - mean(x, na.rm=TRUE)), first.treat = factor(first.treat), year = factor(year) ) gmod = glm( emp ~ .Dtreat:first.treat:year / lpop_dm + lpop * (first.treat + year), family = 'poisson', data = dat )

get rid of NA (collinear) coefs and re-estimate ----

coefs = coefficients(gmod)coefs = names(coefs[which(!is.na(coefs))]) mm = model.matrix(gmod)mm = mm[, coefs] dat2 = as.data.frame(cbind(emp = dat$emp, mm[, -1])) gmod2 = glm( emp ~ ., family = "poisson", data = dat2 )# summary(gmod2)

construct counterfactual datasetsdat2_0 = dat2_1 = dat2Dcols = grep("Dtreat", colnames(dat2))dat2_0[, Dcols] = 0# dat2_1[, Dcols] = 1 # already true since .Dtreat = 1 by definition

get ATTs ----

dat = dat |> transform( yhat1 = predict(gmod2, newdata = dat2_1, type = "response"), yhat0 = predict(gmod2, newdata = dat2_0, type = "response") ) |> transform( att = yhat1 - yhat0 )

aggregate(att ~ event, FUN = mean, data = dat)#> event att#> 1 -4 0.000000#> 2 -3 0.000000#> 3 -2 0.000000#> 4 -1 0.000000#> 5 0 -25.349748#> 6 1 1.091751#> 7 2 -75.124632#> 8 3 -101.823979

Created on 2022-11-28 with reprex v2.0.2 https://reprex.tidyverse.org

All of this prompted me to try implementing the same, manual version of the ETWFE regression in Stata. Here (finally) I think we're getting closer to underlying issue. As you can see, I get exactly the same ATT results from this manual Stata version as I do in R.

. frause mpdta, clear (Written by R. )

. gen emp = exp(lemp)

. qui jwdid emp lpop, ivar(countyreal) tvar(year) gvar(first_treat) method(poisson)

. qui poisson emp c.tr#i.first_treat#i.year c.tr#i.first_treat#i.year#c._x_lpop /// lpop c.lpop#i.first_treat c.lpop#i.year i.first_treat i.year, /// cluster(countyreal)

. replace tr = 1 (2,209 real changes made)

. predict yhat1 (option n assumed; predicted number of events)

. replace tr = 0 (2,500 real changes made)

. predict yhat0 (option n assumed; predicted number of events)

. gen att=yhat1-yhat0

. gen event = year - first_treat if first!=0 (1,545 missing values generated)

. tabstat att, by(event)

Summary for variables: att Group variable: event

event | Mean---------+---------- -4 | 0 -3 | 0 -2 | 0 -1 | 0 0 | -25.34976 1 | 1.091738 2 | -75.12462 3 | -101.824---------+---------- Total | -8.707092--------------------

@friosavila https://github.com/friosavila Do you have any thoughts on this? Does the manual Stata version above correspond to what jwdid is doing underneath the hood?

— Reply to this email directly, view it on GitHub https://github.com/grantmcdermott/etwfe/issues/4#issuecomment-1329883375, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASZKKFUTXGWOELTKY6AXEI3WKU6DXANCNFSM6AAAAAARGNR464 . You are receiving this because you were mentioned.Message ID: @.***>

friosavila commented 1 year ago

Going a bit further. In the linear model, It matters not if c.tr is interacted with c._x_lpop or not, because the expected value of _x_lpop is zero. But that is not the case for nonlinear models (the average impact will not be zero)

On Mon, Nov 28, 2022 at 9:57 PM Fernando Rios-Avila @.***> wrote:

Again, consider the code qui poisson emp c.tr#i.first_treat#i.year c.tr#i.first_treat#i.year#c._x_lpop /// lpop c.lpop#i.first_treat c.lpop#i.year i.first_treat i.year, /// cluster(countyreal)

The second line is being interacted with c.tr, however the way I programmed it in jwdid I do not interact that second line with the treatment. Right now I'm not sure which one is the correct approach, So I will go back to Wooldridge paper to see what he suggests.

At the very least, we know now the source of the difference! Fernando

On Mon, Nov 28, 2022 at 9:52 PM Fernando Rios-Avila @.***> wrote:

Ok it took a while of looking at the code! The source of the difference comes from how things are being interacted.

consider the Stata code

qui poisson emp c.tr#i.first_treat#i.year c.tr#i.first_treat#i.year#c._x_lpop /// lpop c.lpop#i.first_treat c.lpop#i.year i.first_treat i.year, /// cluster(countyreal)

asd

On Mon, Nov 28, 2022 at 6:31 PM Grant McDermott @.***> wrote:

Given the warning above, my best guess is that the design matrix is rank deficient, and that in order to compute predictions, R and Stata both drop a somewhat arbitrary set of columns.

Thanks @vincentarelbundock https://github.com/vincentarelbundock. I think this is the same point @friosavila https://github.com/friosavila was making earlier in the thread. For the record, though I don't think rank deficiency is the issue. fixest (what etwfe uses underneath the hood) drops these collinear terms at run time. We can also confirm the same results with base glm if we construct the dataset and (non-colinear) interactions ahead of time. Show "preconstructed" results

data("mpdta", package = "did") dat = mpdta |> transform( emp = exp(lemp), event = ifelse(first.treat != 0, year - first.treat, NA), .Dtreat = as.integer(year >= first.treat & first.treat != 0), lpop_dm = ave(lpop, first.treat, year, FUN = function(x) x - mean(x, na.rm=TRUE)), first.treat = factor(first.treat), year = factor(year) ) gmod = glm( emp ~ .Dtreat:first.treat:year / lpop_dm + lpop * (first.treat + year), family = 'poisson', data = dat )

get rid of NA (collinear) coefs and re-estimate ----

coefs = coefficients(gmod)coefs = names(coefs[which(!is.na(coefs))]) mm = model.matrix(gmod)mm = mm[, coefs] dat2 = as.data.frame(cbind(emp = dat$emp, mm[, -1])) gmod2 = glm( emp ~ ., family = "poisson", data = dat2 )# summary(gmod2)

construct counterfactual datasetsdat2_0 = dat2_1 = dat2Dcols = grep("Dtreat", colnames(dat2))dat2_0[, Dcols] = 0# dat2_1[, Dcols] = 1 # already true since .Dtreat = 1 by definition

get ATTs ----

dat = dat |> transform( yhat1 = predict(gmod2, newdata = dat2_1, type = "response"), yhat0 = predict(gmod2, newdata = dat2_0, type = "response") ) |> transform( att = yhat1 - yhat0 )

aggregate(att ~ event, FUN = mean, data = dat)#> event att#> 1 -4 0.000000#> 2 -3 0.000000#> 3 -2 0.000000#> 4 -1 0.000000#> 5 0 -25.349748#> 6 1 1.091751#> 7 2 -75.124632#> 8 3 -101.823979

Created on 2022-11-28 with reprex v2.0.2 https://reprex.tidyverse.org

All of this prompted me to try implementing the same, manual version of the ETWFE regression in Stata. Here (finally) I think we're getting closer to underlying issue. As you can see, I get exactly the same ATT results from this manual Stata version as I do in R.

. frause mpdta, clear (Written by R. )

. gen emp = exp(lemp)

. qui jwdid emp lpop, ivar(countyreal) tvar(year) gvar(first_treat) method(poisson)

. qui poisson emp c.tr#i.first_treat#i.year c.tr#i.first_treat#i.year#c._x_lpop /// lpop c.lpop#i.first_treat c.lpop#i.year i.first_treat i.year, /// cluster(countyreal)

. replace tr = 1 (2,209 real changes made)

. predict yhat1 (option n assumed; predicted number of events)

. replace tr = 0 (2,500 real changes made)

. predict yhat0 (option n assumed; predicted number of events)

. gen att=yhat1-yhat0

. gen event = year - first_treat if first!=0 (1,545 missing values generated)

. tabstat att, by(event)

Summary for variables: att Group variable: event

event | Mean---------+---------- -4 | 0 -3 | 0 -2 | 0 -1 | 0 0 | -25.34976 1 | 1.091738 2 | -75.12462 3 | -101.824---------+---------- Total | -8.707092--------------------

@friosavila https://github.com/friosavila Do you have any thoughts on this? Does the manual Stata version above correspond to what jwdid is doing underneath the hood?

— Reply to this email directly, view it on GitHub https://github.com/grantmcdermott/etwfe/issues/4#issuecomment-1329883375, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASZKKFUTXGWOELTKY6AXEI3WKU6DXANCNFSM6AAAAAARGNR464 . You are receiving this because you were mentioned.Message ID: @.***>

grantmcdermott commented 1 year ago

Ah, good catch @friosavila!

I haven't had time to look back into the Wooldridge paper myself yet. I've got some imminent work deadlines, but will dive back into it afterwards.

In the meantime, please do let me know if you figure out which is the right way. It's quite possible I made an error translating his original do files.

friosavila commented 1 year ago

I just went through them, and I was the one who missed the interaction. Just submitted the new version to SSC Fernando

On Tue, Nov 29, 2022 at 1:48 AM Grant McDermott @.***> wrote:

Ah, good catch @friosavila https://github.com/friosavila!

I haven't had time to look back into the Wooldridge paper myself yet. I've got some work imminent deadlines, but will dive back into it afterwards.

In the meantime, please do let me know if you figure out which is the right way. It's quite possible I made an error translating his original do files.

G

— Reply to this email directly, view it on GitHub https://github.com/grantmcdermott/etwfe/issues/4#issuecomment-1330160291, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASZKKFRTRZEP3GJZPUIIK5DWKWRMJANCNFSM6AAAAAARGNR464 . You are receiving this because you were mentioned.Message ID: @.***>

grantmcdermott commented 1 year ago

I just went through them, and I was the one who missed the interaction. Just submitted the new version to SSC

Super. Thanks for confirming, Fernando. Really appreciate you helping me to troubleshoot and stoked that we cracked it in the end. Congrats on the new SSC release. I'll aim to submit this pkg to CRAN shortly too.