Closed mariofiorini closed 1 year ago
Scott? :-)
Yeah, looks like an upstream issue with marginaleffects. I'm not at my computer rn to check, but have you tried passing a vanilla fixest::fepois()
model with an offset to marginaleffects::slopes()
? If that causes a problem, then we should file an issue over there. Please try and lmk.
Hi Grant, I am really sorry, I don't know where 'Scott' came from! Following your suggestion I tried
library(fixest)
library(marginaleffects)
data(trade)
gravity_pois_offset = fepois(data = trade, Euros ~ 1 | Origin , offset = ~log(dist_km))
marginaleffects::slopes(gravity_pois_offset)
and got the error message
Error: The function supplied to the `transform_pre` argument must accept two numeric vectors of predicted probabilities of length 0, and
return a single numeric value or a numeric vector of length 0, with no missing value.
In addition: Warning messages:
1: Unable to extract a variance-covariance matrix from this model.
2: In formals(fun) : argument is not a function
If I run it without the offset it works
gravity_pois_nooffset = fepois(data = trade, Euros ~ 1 | Origin )
marginaleffects::slopes(gravity_pois_3)
Term Contrast Estimate
Origin BE - AT 54414051
Origin BE - AT 54414051
Origin BE - AT 54414051
Origin BE - AT 54414051
Origin BE - AT 54414051
--- 536540 rows omitted. See ?avg_slopes and ?print.marginaleffects ---
Origin SE - AT -4449567
Origin SE - AT -4449567
Origin SE - AT -4449567
Origin SE - AT -4449567
Origin SE - AT -4449567
Prediction type: response
Columns: rowid, type, term, contrast, estimate, predicted, predicted_hi, predicted_lo, Euros, Origin
Warning message:
Unable to extract a variance-covariance matrix from this model.
Note that when using the mpdta
data set I got an error message even without the offset, but maybe there I am making a mistake?
fit.no.offset <- fepois(fml = emp ~ 1 | year , data = mpdta)
marginaleffects::slopes(fit.no.offset)
Error in `[[<-.data.frame`(`*tmp*`, "type", value = "response") :
replacement has 1 row, data has 0
Great, thanks for double checking. Given that this is an upstream issue, I'm going to close it here. Please file your previous example on the marginaleffects issue page; I'm sure Vincent will get to the bottom of it soon.
Feel free to reopen here if I'm missing something, or if the problem persists after any updates to marginaleffects.
PS. Please hyperlink to this issue when you file your issue on the marginaleffects repo (just paste the URL at the top of this page somewhere in the body of your issue text). That way, these two issues will be linked and anyone who runs into the same problem will be able to follow the chain to any fix.
Hi there,
I am also wrestling with the implementation of an exposure variable within etwfe
.
I am investigating the impact of wildlife management interventions on reported wildlife strike metrics via a staggered DiD identification strategy. My data is a balanced panel of 24 US airports (14 that receive treatment and 11 that never receive treatment) over the 2005-2021 period. The count of wildlife strikes at airport i in year t is a direct function of the number of aircraft movements that occurred at that airport--i.e., exposure. I am hoping to implement nonlinear DiD with an offset term so that I can model "wildlife strike rates" as opposed to "wildlife strike counts".
Here is my data:
> head(df)
airport year strikes log_movements treated treatment_year treat
1 BAF 2005 0 11.01602 1 2009 0
2 BAF 2006 0 11.12964 1 2009 0
3 BAF 2007 4 11.05263 1 2009 0
4 BAF 2008 6 10.98798 1 2009 0
5 BAF 2009 6 10.99484 1 2009 1
6 BAF 2010 6 11.03463 1 2009 1
etwfe
with offset
:Like mariofiorni
, I am unable to utilize offset
and recover ATTs.
offset
included in the model:
mod <- etwfe(fml = strikes ~ 1,
tvar = year,
gvar = treatment_year,
vcov = ~airport,
family = "poisson",
data = df)
emfx(mod)
> mod <- etwfe(fml = strikes ~ 1,
+ tvar = year,
+ gvar = treatment_year,
+ vcov = ~airport,
+ family = "poisson",
+ data = df)
The variables '.Dtreat:treatment_year::2008:year::2006', '.Dtreat:treatment_year::2008:year::2007' and forty-three others have been removed because of collinearity (see $collin.var).
> emfx(mod)
Term Contrast .Dtreat Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
.Dtreat mean(TRUE) - mean(FALSE) TRUE 13.1 6.99 1.87 0.0619 -0.652 26.8
Prediction type: response
Columns: type, term, contrast, .Dtreat, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo
offset
included as an etwfe
option:
mod <- etwfe(fml = strikes ~ 1,
tvar = year,
gvar = treatment_year,
vcov = ~airport,
family = "poisson",
data = df,
offset = ~log_movements)
emfx(mod)
> mod <- etwfe(fml = strikes ~ 1,
+ tvar = year,
+ gvar = treatment_year,
+ vcov = ~airport,
+ family = "poisson",
+ data = df,
+ offset = ~log_movements)
The variables '.Dtreat:treatment_year::2008:year::2006', '.Dtreat:treatment_year::2008:year::2007' and forty-three others have been removed because of collinearity (see $collin.var).
> emfx(mod)
Error: The function supplied to the `transform_pre` argument must accept two numeric vectors of predicted probabilities of length 0, and return a
single numeric value or a numeric vector of length 0, with no missing value.
In addition: Warning message:
In formals(fun) : argument is not a function
offset()
included in model formula:
mod <- etwfe(fml = strikes ~ offset(log_movements),
tvar = year,
gvar = treatment_year,
vcov = ~airport,
family = "poisson",
data = df)
> mod <- etwfe(fml = strikes ~ offset(log_movements),
+ tvar = year,
+ gvar = treatment_year,
+ vcov = ~airport,
+ family = "poisson",
+ data = df)
Error in str2lang(x) : <text>:1:94: unexpected input
1: strikes ~ .Dtreat : i(treatment_year, i.year, ref = 0, ref2 = 2005) / offset(log_movements)_
fixes::fepois()
and marginaleffects::avg_slopes()
:Here I attempt the approach recommended by grantmcdermott
above.
offset
included in the model:
mod <- fixest::fepois(strikes ~ treat | airport + year, df, se = "cluster")
mod
marginaleffects::avg_slopes(mod)
> mod <- fixest::fepois(strikes ~ treat | airport + year, df, se = "cluster")
> mod
Poisson estimation, Dep. Var.: strikes
Observations: 408
Fixed-effects: airport: 24, year: 17
Standard-errors: Clustered (airport)
Estimate Std. Error t value Pr(>|t|)
treat 0.262385 0.168873 1.55374 0.12025
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log-Likelihood: -1,892.0 Adj. Pseudo R2: 0.766339
BIC: 4,030.4 Squared Cor.: 0.901086
> marginaleffects::avg_slopes(mod)
Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
treat 1 - 0 7.85 5.74 1.37 0.172 -3.41 19.1
Prediction type: response
Columns: type, term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high
offset
included as a fepois
option:
mod <- fixest::fepois(strikes ~ treat | airport + year, df, se = "cluster", offset = ~log_movements)
mod
marginaleffects::avg_slopes(mod)
> mod <- fixest::fepois(strikes ~ treat | airport + year, df, se = "cluster", offset = ~log_movements)
> mod
Poisson estimation, Dep. Var.: strikes
Observations: 408
Offset: log_movements
Fixed-effects: airport: 24, year: 17
Standard-errors: Clustered (airport)
Estimate Std. Error t value Pr(>|t|)
treat 0.252704 0.202431 1.24834 0.2119
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log-Likelihood: -2,038.2 Adj. Pseudo R2: 0.748657
BIC: 4,322.8 Squared Cor.: 0.885169
> marginaleffects::avg_slopes(mod)
Error: The function supplied to the `transform_pre` argument must accept two numeric vectors of predicted probabilities of length 0, and return a
single numeric value or a numeric vector of length 0, with no missing value.
In addition: Warning message:
In formals(fun) : argument is not a function
offset()
included in model formula:
mod <- fixest::fepois(strikes ~ treat + offset(log_movements) | airport + year, df, se = "cluster")
mod
marginaleffects::avg_slopes(mod)
> mod <- fixest::fepois(strikes ~ treat + offset(log_movements) | airport + year, df, se = "cluster")
> mod
Poisson estimation, Dep. Var.: strikes
Observations: 408
Offset: log_movements
Fixed-effects: airport: 24, year: 17
Standard-errors: Clustered (airport)
Estimate Std. Error t value Pr(>|t|)
treat 0.252704 0.202431 1.24834 0.2119
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log-Likelihood: -2,038.2 Adj. Pseudo R2: 0.748657
BIC: 4,322.8 Squared Cor.: 0.885169
> marginaleffects::avg_slopes(mod)
Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
log_movements dY/dX 0.00e+00 NA NA NA NA NA
treat 1 - 0 9.11e-05 8.26e-05 1.1 0.27 -7.07e-05 0.000253
Prediction type: response
Columns: type, term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high
fixes::fepois()
and marginaleffects::avg_slopes()
still generates an error if you include the offset as an fepois()
option.offset()
within the model formula, marginaleffects::avg_slopes()
executes.HOWEVER, the marginal effect ATT given by the models with and without the offset term are worryingly different, despite having similar coefficient estimates on treat
.
############### Coefficient Estimate and Marginal Effects via FE Poisson without Offset ###############
Estimate Std. Error t value Pr(>|t|)
treat 0.262385 0.168873 1.55374 0.12025
Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
treat 1 - 0 7.85 5.74 1.37 0.172 -3.41 19.1
############### Coefficient Estimate and Marginal Effects via FE Poisson with Offset ###############
Estimate Std. Error t value Pr(>|t|)
treat 0.252704 0.202431 1.24834 0.2119
Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
log_movements dY/dX 0.00e+00 NA NA NA NA NA treat 1 - 0 9.11e-05 8.26e-05 1.1 0.27 -7.07e-05 0.000253
Does anybody know why the following models
fixest::fepois(strikes ~ treat | airport + year, df, se = "cluster") fixest::fepois(strikes ~ treat + offset(log_movements) | airport + year, df, se = "cluster")
produce nearly identical coefficient estimates for `treat` but marginal effects that are so different?
I have to admit, I'm not sure exactly what is going on with the `offset()` in the calculation of the marginal effects. Maybe I've missed an important step? Of course, the TWFE approach allows one to employ the coefficient estimate on `treat` as ATT without calculating marginal effects (notwithstanding the TWFE research that's been produced over the last 5 years). However, this shows that calculating the ATT via Wooldridge (2022) and `etwfe` might not be so straightforward when employing an offset term for exposure.
All of that being said, a practical solution to my problem is to simply model wildlife strike counts and control for aircraft movements by including it as a traditional covariate. Alternatively, I could calculate wildlife strike rates--i.e., strikes divided by aircraft movements--and use that as my dependent variable in a linear model. Even so, it would be helpful to figure out what's going on here... 🤷‍♂️
Hi @grantmcdermott , I installed marginaleffects
dev version following @vincentarelbundock feedback on Error when using marginaleffects with fepois+offset. Now
data(trade)
gravity_pois_offset = fepois(data = trade, Euros ~ 1 | Origin , offset = ~log(dist_km))
marginaleffects::slopes(gravity_pois_offset)
works. However,
data("mpdta", package = "did")
mpdta$emp = exp(mpdta$lemp)
etwfe(
emp ~ 0, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal,
family = "poisson", offset = ~log(lpop)
) |>
emfx()
Still returns an error
The variables '.Dtreat:first.treat::2006:year::2004', '.Dtreat:first.treat::2006:year::2005' and three others have been removed because of collinearity (see $collin.var).
Error: The function supplied to the `transform_pre` argument must accept two numeric vectors
of predicted probabilities of length 0, and return a single numeric value or a numeric
vector of length 0, with no missing value.
In addition: Warning message:
In formals(fun) : argument is not a function
Almost as if it was using the older version, since this is the same error in Error when using marginaleffects with fepois+offset. I am probably missing something obvious. In terms of loaded packages I have
marginaleffects_0.10.0.9012 etwfe_0.2.0
Cheers,
Mario
P.S. Again, without offset it works
etwfe(
emp ~ 0, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal,
family = "poisson"
) |>
emfx()
The variables '.Dtreat:first.treat::2006:year::2004', '.Dtreat:first.treat::2006:year::2005' and three others have been removed because of collinearity (see $collin.var).
Term Contrast .Dtreat Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
.Dtreat mean(TRUE) - mean(FALSE) TRUE -28.9 20.6 -1.41 0.16 -69.2 11.4
Columns: term, contrast, .Dtreat, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo
I’m not exactly sure what the solution is, but I investigated a bit and found a few things.
Important: This uses the Github version of marginaleffects
.
library(did)
library(etwfe)
library(fixest)
library(marginaleffects)
data("mpdta", package = "did")
mpdta$emp <- exp(mpdta$lemp)
mod <- etwfe(
emp ~ 0, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal,
family = "poisson", offset = ~log(lpop))
Internally, marginaleffects
uses the get_modeldata()
function to extract the fit data. Unfortunately, we don’t get the offset column, which means that fixest
breaks when trying to make predictions:
marginaleffects:::get_modeldata(mod) |> head()
# emp .Dtreat year first.treat
# 866 4729 FALSE 2003 2007
# 841 4175 FALSE 2004 2007
# 842 4189 FALSE 2005 2007
# 819 4351 FALSE 2006 2007
# 827 4853 TRUE 2007 2007
# 937 148 FALSE 2003 2007
predict(mod, newdata = marginaleffects:::get_modeldata(mod))
# Error in predict.fixest(mod, newdata = marginaleffects:::get_modeldata(mod)):
# Predict can't be applied to this estimation because the offset (..2) cannot be evaluated for the new data.
# Use a formula for the offset in the first estimation to avoid this.
The reason why we don’t get the offset column, is that insight::find_offset()
does not return useful info:
insight::find_offset(mod)
# [1] "..2"
I suspect that the ..2
comes from etwfe
, but I’m not sure how to ensure that etwfe
passes along all the required columns to make predictions on this kind of model.
Thanks all, bug confirmed.
Interesting it picks up the correct offset directly from the model object.
> mod$offset |> head()
[1] 1.774403 1.774403 1.774403 1.774403 1.774403 0.803067
But not when passing through the call environment.
> mod$call$offset |> head()
Error in x[seq_len(n)] : object of type 'symbol' is not subsettable
> mod$call$offset
..2
(The ..<something>
notation is referencing a fixest macro, which is how it reconciles fomulas in that call environment.)
I don't have time to look right now, but I'll try to troubleshoot and fix later this evening.
P.S. @mariofiorini It won't fix your issue outright, but I see you're using 0.2.0 of etwfe. Please install the latest version 0.3.1 from CRAN, since it aligns some new features with the latest marginaleffects release.
thanks @grantmcdermott . Will update etwfe
. Hope fixing it is not too much work.
In this case, the offset name can be obtained with mod$model_info$offset
:
library(did)
library(etwfe)
library(fixest)
library(marginaleffects)
data("mpdta", package = "did")
mpdta$emp <- exp(mpdta$lemp)
mod <- etwfe(
emp ~ 0, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal,
family = "poisson", offset = ~log(lpop))
#> The variables '.Dtreat:first.treat::2006:year::2004', '.Dtreat:first.treat::2006:year::2005' and three others have been removed because of collinearity (see $collin.var).
mod$model_info$offset
#> [1] "log(lpop)"
Not sure if this is the case for all fixest
objects though, just putting it here if it helps. And this should probably be fixed in insight
.
Thanks @etiennebacher, this is helpful.
I just pushed a commit to insight
which now allows this:
library(did)
library(etwfe)
library(fixest)
packageVersion("insight")
# [1] '0.19.0.9'
data("mpdta", package = "did")
mpdta$emp <- exp(mpdta$lemp)
mod <- feols(lemp ~ 1 | year, offset = ~lpop, data = mpdta)
insight::find_offset(mod)
# [1] "lpop"
mod <- etwfe(
emp ~ 0, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal,
family = "poisson", offset = ~log(lpop))
insight::find_offset(mod)
# [1] "lpop"
However, note that we still can’t extract the offset data itself because etwfe
modifies the mpdta
dataset under the hood before it fits the model, and doesn’t appear to retain the offset column. Maybe etwfe
can use (the dev version of) insight
to figure out what the offset is called and retain it in the data frame it feeds to fixest
.
insight::get_data(mod, additional_variables = TRUE) |> head()
# emp .Dtreat year first.treat
# 866 4729 FALSE 2003 2007
# 841 4175 FALSE 2004 2007
# 842 4189 FALSE 2005 2007
# 819 4351 FALSE 2006 2007
# 827 4853 TRUE 2007 2007
# 937 148 FALSE 2003 2007
Hi all,
Sorry for the slow response on my side. Day job and sick kids keeping me very busy. Needless to say, I appreciate everyone weighing in during my absence.
I think @vincentarelbundock has the gist of it and I can add a catch to the emfx()
internals to pull out the offset if a user provides it. But I think this might worth a patch higher up (i.e., either marginaleffects or insight). Consider the following "manual" implementation of ETWFE. Note that I'm using regular OLS to highlight that the problem isn't related to the model family being Poisson. UPDATE: Oops, sorry mis-copied the offset in my haste last night. It should be lpop
rather than log(lpop)
. With this correction, the manual implementation passes through correctly. Updated code below.
library(fixest)
library(marginaleffects)
data("mpdta", package = "did")
mpdta2 = mpdta |>
transform(
.Dtreat = year >= first.treat & first.treat != 0
)
## Linear with offset --
mod_lin = feols(
lemp ~ .Dtreat:i(first.treat, i.year, ref = 0, ref2 = 2003) |
first.treat + year,
data = mpdta2,
vcov = ~countyreal,
offset = ~lpop
)
#> The variables '.Dtreat:first.treat::2006:year::2004', '.Dtreat:first.treat::2006:year::2005' and three others have been removed because of collinearity (see $collin.var).
avg_slopes(mod_lin, variable = ".Dtreat")
#>
#> Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
#> .Dtreat TRUE - FALSE -0.00555 0.00154 -3.59 <0.001 -0.00858 -0.00253
#>
#> Columns: term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high
## Poisson with offset --
mod_pois = fepois(
lemp ~ .Dtreat:i(first.treat, i.year, ref = 0, ref2 = 2003) |
first.treat + year,
data = mpdta2,
vcov = ~countyreal,
offset = ~lpop
)
#> The variables '.Dtreat:first.treat::2006:year::2004', '.Dtreat:first.treat::2006:year::2005' and three others have been removed because of collinearity (see $collin.var).
avg_slopes(mod_pois, variable = ".Dtreat")
#>
#> Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
#> .Dtreat TRUE - FALSE -0.00573 0.00163 -3.5 <0.001 -0.00893 -0.00252
#>
#> Columns: term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high
Created on 2023-03-12 with reprex v2.0.2
P.S. I'm using the dev version of both marginaleffects and insight.
Created on 2023-03-11 with reprex v2.0.2
Ha! I just pushed a fix to insight
that allows ~log(pop)
. Glad to see it working now. But when I try with etwfe
, insight
finds the column in the get_data()
output:
library(etwfe)
data("mpdta", package = "did")
mpdta$emp <- exp(mpdta$lemp)
mod <- etwfe(
emp ~ 0, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal,
family = "poisson", offset = ~log(lpop))
insight::find_offset(mod)
# [1] "lpop"
insight::find_variables(mod)
# $response
# [1] "emp"
#
# $conditional
# [1] ".Dtreat" "year"
#
# $cluster
# [1] "first.treat" "year"
insight::get_data(mod, additional_variables = TRUE) |> head()
# Warning: Could not recover model data from environment. Please make sure your
# data is available in your workspace.
# Trying to retrieve data from the model frame now.
# emp .Dtreat year first.treat
# 866 4729 FALSE 2003 2007
# 841 4175 FALSE 2004 2007
# 842 4189 FALSE 2005 2007
# 819 4351 FALSE 2006 2007
# 827 4853 TRUE 2007 2007
# 937 148 FALSE 2003 2007
Thanks @vincentarelbundock, really appreciate you taking a look on the backend for me :-)
So, this is proving trickier to debug than I first thought. I'm now thinking that the problem isn't with emfx()
per se, but rather that the etwfe
/fixest
call environment is masked from marginaleffects in general. Here's a simple example. Key takeaway is at the end; we can't run avg_slopes(mod)
even though it is (nominally) just a fixest object.
library(etwfe)
library(fixest)
library(marginaleffects)
data("mpdta", package = "did")
## etwfe version
mod = etwfe(
lemp ~ 1, tvar=year, gvar=first.treat, data=mpdta,
vcov = ~countyreal,
offset = ~lpop
)
## "manual" version
mod2 = feols(
lemp ~ .Dtreat:i(first.treat, i.year, ref = 0, ref2 = 2003) |
first.treat + year,
data = mpdta |>
transform(
.Dtreat = year >= first.treat & first.treat != 0
),
vcov = ~countyreal,
offset = ~lpop
)
#> The variables '.Dtreat:first.treat::2006:year::2004', '.Dtreat:first.treat::2006:year::2005' and three others have been removed because of collinearity (see $collin.var).
## same coef result for both models
all.equal(mod$coeftable, mod2$coeftable)
#> [1] TRUE
## not true for the call
all.equal(mod$call, mod2$call)
#> [1] "target, current do not match when deparsed"
## avg_slopes() only works directly on the manual version...
avg_slopes(mod, variable = ".Dtreat")
#> Warning in formals(fun): argument is not a function
#> Error: The function supplied to the `comparison` argument must accept two
#> numeric vectors of predicted probabilities of length 0, and return a
#> single numeric value or a numeric vector of length 0, with no missing
#> value.
avg_slopes(mod2, variable = ".Dtreat")
#>
#> Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
#> .Dtreat TRUE - FALSE -0.00555 0.00154 -3.59 <0.001 -0.00858 -0.00253
#>
#> Columns: term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high
Created on 2023-03-12 with reprex v2.0.2
A hack might be to store the data as an attribute, and then emfx
would pass the data explicitly to the newdata
argument in slopes
. We only need access to the environment if we have to guess where the data is, not when the user feeds it to us explicitly.
A hack might be to store the data as an attribute, and then emfx would pass the data explicitly to the newdata argument in slopes. We only need access to the environment if we have to guess where the data is, not when the user feeds it to us explicitly.
True. I've resisted doing this because I don't want to create large model objects. But maybe it isn't a practical constraint.
Another option might be passing envir = mod$call_env
through ...
in avg_slopes()
and co. (With an implicit assumption that this gets passed further up to insight::get_data
?) I have to start prepping dinner, but will mull on this a bit more and see if I have a reasonable implementation for a PR.
Hi all, I just pushed a commit to this repo (https://github.com/grantmcdermott/etwfe/commit/f095ae57670f271e8f7f272d8f74166821baf1b4) as well as a PR to the insight repo (https://github.com/easystats/insight/pull/759) that fixes the offset error.
In order to test, please make sure that you grab the development version of both packages.
install.packages(
c("etwfe", "insight"),
repos = c("https://grantmcdermott.r-universe.dev", "https://easystats.r-universe.dev")
)
Here's a MWE example with these latest updates.
library(etwfe)
data("mpdta", package = "did")
mpdta$emp = exp(mpdta$lemp)
omod = etwfe(
lemp ~ 1, tvar=year, gvar=first.treat, data=mpdta,
vcov = ~countyreal,, family = "poisson",
offset = ~lpop
)
#> The variables '.Dtreat:first.treat::2006:year::2004', '.Dtreat:first.treat::2006:year::2005' and three others have been removed because of collinearity (see $collin.var).
omod |> emfx()
#>
#> Term Contrast .Dtreat Estimate Std. Error z Pr(>|z|)
#> .Dtreat mean(TRUE) - mean(FALSE) TRUE -0.0492 0.014 -3.5 <0.001
#> 2.5 % 97.5 %
#> -0.0767 -0.0217
#>
#> Columns: term, contrast, .Dtreat, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo
Created on 2023-04-29 with reprex v2.0.2
Note for posterity: I don't think that adding lpop
as an offset to this particular model will affect the response variable because lpop is constant within the level of the fixed effects (in particular, first.treat
). In other words, you will get the same result with and without this offset. I'd appreciate it if someone could test with their own data and report back.
@mariofiorini @levialtringer Do you two mind testing the updated offset behaviour with your own datasets? (As per my previous comment, just make sure to grab the dev version of both insight and etwfe.)
I'd like to submit an update to CRAN as soon as possible, partly because of an unrelated bug fix for heterogeneous ATTs. So it would be good to confirm that this offset behaviour is working too.
Hi @grantmcdermott , I can confirm that the update seems to have fixed the problem. FYI, I have also tested it on simple simulated data against the imputation method of Borusyak, Jaravel, and Spiess "Revisiting Event Study Designs: Robust and Efficient Estimation" (R&R Review of Economic Studies) and they are numerically equivalent. The advantage of etwfe is that you get the standard error directly, especially in the non-linear case. Happy to make an R markdown version of the comparison if you think it's useful.
Huzzah, thanks for confirming @mariofiorini. I can't promise that I'll include the Rmd as a formal vignette (though I might). But I'd love to see it all the same. Please ping me when it's ready.
P.S. These updates are in v0.3.2 of etwfe, which has just been released on CRAN.
(Not sure when insight will be updated, but they adopt a pretty frequent release cycle. So my guess is soon.)
Hi Scott, thanks for the great package. I am using etwfe in a Poisson regression with an exposure variable. etwfe() works but emfx() returns an error message. Example using mpdta data and the lpop variable as the exposure.
It looks like a problem with the
marginaleffects
package. Let me know if there is a simple workaround or if I am mistaken.