Closed frederickluser closed 1 year ago
Super, thanks Frederic. I've been too busy to give this my full attention, but I'm hoping to get around to it later this week. Speeding up the emfx
calculations is something I'm definitely interested in adding. Two quick thoughts that come to mind (without have looked at the code!)
collapse = "auto"
.Excited to take a proper look.
Dear Grant,
the aggregation trick works also with non-linear models, I tried a poisson.
I also added an collapse = "auto"
option and tested it, but my fork is now at conflict with the marginaleffects
changes.
Best, Frederic
@frederickluser We've got some conflicts that need to be resolved due to the compatibility updates with marginaleffects 0.9.0 (#20). Feel free to have a crack at these, otherwise I'll try as soon as I get a chance. Goal is to submit to CRAN by the end of the day (Pacific Time).
Hi @frederickluser I've resolved most of the conflicts, but there are bunch of test failures on the test_emfx_xvar.R
file that I'm unable to resolve.
Consider the simple_known
object and what I get for the (nominally equivalent) function call of emfx(x1, xvar = "xvar")
. I'm focusing on the estimate values to abstract from some of the other things that need to be fixed (e.g., ordering by xvar and some minor printing changes that stem from the marginaleffects update).
> simple_known[, c("estimate", "xvar")]
estimate xvar
1 -0.004358929 1
2 -0.077394771 2
3 -0.121365807 3
> as.data.frame(emfx(x3, xvar = "xvar"))[, c("estimate", "xvar")] |> dplyr::arrange(xvar)
estimate xvar
1 -0.01221530 1
2 -0.06043789 2
3 -0.09087307 3
Do you know why we're getting different estimates here? I'm using the same seed as the original file (set.seed(123)
), so I don't think it can be that.
Can you please check against some known output on your side. I'm worried that the test results are changing through these different PRs...
Hey Grant. I messed up the check somehow, sorry.
The problem was that the collapse_data="auto"
now switched to collapse_data=FALSE
for this small sample data. But the simple_known
was ran with collapse_data=T
.
I checked now both cases and the results are the same as before. For the simplest solution for the test_emfx_xvar.R function, I just overwrote the default to collapse_data = T
.
The difference you've shown before is just the marginal difference between collapsing and using the full data. I hope that solves the problem.
Excellent. Thanks for fixing and for this nice pull request!
Heads-up that I've tweaked these functionality a bit (e.g. in https://github.com/grantmcdermott/etwfe/commit/ea3bb3e0e8ac534d6f3613b792eba2392b27e28a). Key user-level changes are:
collapse
instead of collapse_data
.
Hey Grant,
I added now the option
collapse_data
toemfx
, which will significantly improve running time for large data sets.If
collapse_data = T
, then the data will be collapsed usingdata.table
before computing marginal effects and then weighted by group-cohort sizes in themarginaleffects
command itself. The results are (almost) identical to the case without collapsing. Ifivar
is not NULL inetwfe,
collapsing is not possible and a warning message appears.I updated the tinytests and the checks produced no errors. Do you think this would be a sensible merge? You're welcome to request changes.
Best, Frederic