s3alfisc / fwildclusterboot

Fast Wild Cluster Bootstrap Inference for Regression Models / OLS in R. Additionally, R port to WildBootTests.jl via the JuliaConnectoR.
https://s3alfisc.github.io/fwildclusterboot/
GNU General Public License v3.0

Suggestion: use qtab() for crosstab() #117

Open SebKrantz opened 1 year ago

SebKrantz commented 1 year ago

qtab(), introduced in collapse 1.8.0, should be more efficient than the workaround with fsum(). You need to pass the data to the weights argument w of qtab(). Even if it turns out not to be faster, fsum() now has a fill argument: setting fill = TRUE lets you drop the res[is.na(res)] <- 0 line.
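As a minimal sketch of the suggestion above (not the package's actual crosstab() code), the weighted cross-tabulation can be written directly with qtab() and checked against base R's xtabs(). The toy data and the names ct_qtab and ct_base are assumptions made for illustration only.

```r
library(collapse)  # qtab() and the fill argument of fsum() need collapse >= 1.8.0

set.seed(1)
N <- 1000
a <- sample(1:4, N, replace = TRUE)  # first grouping variable (toy data)
b <- sample(1:6, N, replace = TRUE)  # second grouping variable (toy data)
y <- rnorm(N)                        # values to be summed within each cell

# Weighted cross-tabulation: each cell holds the sum of y over the
# observations that share that (a, b) combination.
ct_qtab <- qtab(a, b, w = y)

# Base R reference used only to check the result.
ct_base <- xtabs(y ~ a + b)

# Compare the numeric contents, ignoring class and other attributes.
same <- all.equal(unclass(ct_qtab), unclass(ct_base), check.attributes = FALSE)
print(same)

# The fill argument mentioned above: initialise sums with 0 instead of NA.
print(fsum(c(NA_real_, NA_real_), fill = TRUE))
```

Passing y through w is what turns the plain frequency table into a table of sums, which is the role crosstab() plays in the package.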

s3alfisc commented 1 year ago

Oh, that's super cool! Thanks for pointing me to it. I will move to qtab() with the next release. Thanks! =)

s3alfisc commented 1 year ago

Yes, it does indeed give a good speedup:

library(microbenchmark)

N <- 100000
a <- sample(1:20, N, replace = TRUE)
b <- sample(1:300, N, replace = TRUE)
y <- matrix(rnorm(N), N, 1)
data <- y
var1 <- data.frame(a = a)
var2 <- data.frame(b = b)

microbenchmark(
  ct1 <- fwildclusterboot:::crosstab(data = data, var1 = var1, var2 = var2),
  ct4 <- fwildclusterboot:::crosstab4(data = data, var1 = var1, var2 = var2),
  ct5 <- fwildclusterboot:::crosstab_qtab(data = data, var1 = var1, var2 = var2),
  times = 1
)

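# Rows follow the call order (one run each, so all summary columns coincide):
# expr                  min         lq       mean     median         uq        max
# crosstab         5.353802   5.353802   5.353802   5.353802   5.353802   5.353802
# crosstab4      225.850701 225.850701 225.850701 225.850701 225.850701 225.850701
# crosstab_qtab    3.371901   3.371901   3.371901   3.371901   3.371901   3.371901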

Nice! :)
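For anyone who wants to try a similar comparison without the package's internal functions, here is a rough sketch that uses only exported functions. Base R's xtabs() stands in as the baseline, which is an assumption: it is not the fsum() workaround timed above, so the numbers will not be comparable to that table. It also uses times = 20 rather than times = 1, so that the summary columns differ from one another.

```r
library(collapse)        # provides qtab()
library(microbenchmark)

set.seed(123)
N <- 100000
a <- sample(1:20, N, replace = TRUE)
b <- sample(1:300, N, replace = TRUE)
y <- rnorm(N)

# Time both approaches on the same data; named arguments become row labels.
bm <- microbenchmark(
  xtabs = xtabs(y ~ a + b),   # base R baseline (stand-in, see note above)
  qtab  = qtab(a, b, w = y),  # collapse: sum of y per (a, b) cell
  times = 20
)
print(bm)
```

Timings depend on the machine and on the installed collapse version, so no expected output is shown.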