Closed etiennebacher closed 1 year ago
Thanks! I was putting off doing that b/c I think I know how to speed that up by even more; but actual progress is much better than theoretical progress!
Fair point on having better tests. At this point I know the point estimates on the df_hom
estimator, so I don't personally need them, but that's not sustainable. I'll add some tests here in a bit
Thanks for the inspiration, I sped it up some more (and much lower memory) 96da875. Tests added too 475b749
It is far too easy to accidentally make a sparse matrix dense... 🙃
library(data.table)
library(did2s)
foo <- list()
for (i in 1:2500) {
g <- paste0("Group ", sample(1:3, 1))
start_rel <- sample(c((-20):(-1), Inf), 1)
if (is.infinite(start_rel)) {
rel_year <- rep(Inf, 31)
} else {
rel_year <- seq(from = start_rel, by = 1, length.out = 31)
}
foo[[i]] <- data.frame(
unit = rep(i, 31),
state = rep(i + 10000, 31),
group = rep(g, 31),
rel_year = rel_year,
dep_var = rnorm(31),
year = 1990:2020
)
foo[[i]]$treat <- as.numeric(!is.infinite(foo[[i]]$rel_year) & foo[[i]]$rel_year > 0)
}
dat <- data.table::rbindlist(foo)
bench::mark(
did2s = did2s(dat,
yname = "dep_var", first_stage = ~ 0 | state + year,
second_stage = ~ i(treat, ref = FALSE), treatment = "treat",
cluster_var = "state"
)
)
# Your fixes:
# expression min median `itr/sec` mem_alloc
# did2s 3.88s 3.88s 0.257 4.04GB
# Update:
# expression min median `itr/sec` mem_alloc
# did2s 2.8s 2.8s 0.357 438MB
Wow, what an improvement in memory alloc, nice!
Hello, thanks a lot for this package! I saw your tweet on the large performance improvement you made and I thought maybe there were more to do.
I noticed that some lines in
make_g()
are called several times while once would be enough. Calling these lines only once can lead to a ~1.5x speedup in some cases. Below is a benchmark with a made-up dataset that should be similar todf_het
(but larger). The function call is the same as in the README.I don't know your preferences regarding the DESCRIPTION and NEWS so either let me know or feel free to commit directly in this branch.
Benchmark
Setup
Before
After
As a sidenote, I'm not super familiar with the package or with the code inside so it's nice to have tests to be sure I didn't mess up anything. I think it would be even better to have some checks on numerical results (even just the first coefficient) to avoid changes that don't error but modify the results.