Closed: K-Maehashi closed this issue 2 years ago.
Dear @K-Maehashi, thanks for your interest in our package. The package already supports clustered standard errors in the BLP and GATES regressions. You can specify this option via the setup_vcov() function. Below is an example.
Concerning your next question ("if we do a block randomization by clusters when we make splits, does it become a clustered SE?"): it's hard to make a generally valid statement about whether or not to adjust standard errors for clustering. I recommend the working paper "When Should You Adjust Standard Errors for Clustering?" by Abadie et al. (2017) for a detailed discussion; perhaps it is of help to you.
# please make sure to have the latest version of GenericML installed
library("GenericML")
## generate data
set.seed(1)
n <- 200 # number of observations
p <- 5 # number of covariates
D <- rbinom(n, 1, 0.5) # random treatment assignment
Z <- matrix(runif(n*p), n, p) # design matrix
Y0 <- as.numeric(Z %*% rexp(p) + rnorm(n)) # potential outcome without treatment
Y1 <- 2 + Y0 # potential outcome under treatment
Y <- ifelse(D == 1, Y1, Y0) # observed outcome
# randomly sample cluster membership
cluster <- sample(1:5, n, replace = TRUE)
# specify cluster-robust standard errors via vcovCL() in 'sandwich'
# and pass the clusters as arguments
vcov_BLP <- setup_vcov(estimator = "vcovCL",
                       arguments = list(cluster = cluster))
vcov_GATES <- vcov_BLP # same for GATES
# run GenericML (few splits to keep computation time low)
x <- GenericML(Z, D, Y, learners_GenericML = "lasso", num_splits = 10,
               vcov_BLP = vcov_BLP, vcov_GATES = vcov_GATES, parallel = FALSE)
@mwelz Fantastic! I should have read the README file more carefully. Thank you so much for the detailed explanation. (This package works pretty well with ~20,000 obs. and I like the visualization plots! Somehow it eats a lot of memory but this package has everything I need!)
@K-Maehashi Thanks for the kind feedback! Memory consumption can indeed be an issue, in particular for large datasets. Here are a few tips to save memory in GenericML():

- If you use parallelization (parallel = TRUE), choosing a lower number of cores via num_cores may save memory (at the expense of longer computation time);
- set store_learners = FALSE;
- set prop_aux to a value smaller than the default of 0.5. This assigns a smaller number of observations to the auxiliary set. Since the memory-intensive estimation of the proxy learners takes place on this set, a smaller auxiliary set might save memory.

We might look into making GenericML() more memory-efficient in a future release; we have not yet optimized it for that.
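Putting these tips together, a call combining the memory-saving settings might look like the following sketch (reusing Z, D, and Y from the example above; the specific values for num_cores and prop_aux are illustrative, not recommendations):

```r
library("GenericML")

# illustrative memory-saving settings, assuming Z, D, Y as generated above
x <- GenericML(Z, D, Y,
               learners_GenericML = "lasso",
               num_splits = 10,
               parallel = TRUE,         # parallelization enabled
               num_cores = 2,           # fewer cores: less memory, slower
               store_learners = FALSE,  # do not keep the fitted learners
               prop_aux = 0.4)          # auxiliary set smaller than default 0.5
```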
This issue hasn't seen activity in the last eight days, so I'll close it now. Please feel free to re-open if you think this issue hasn't been solved properly.
Hello @mwelz, thanks for this great package! I have a feature request: a clustered-SE option for this package would be wonderful. (If we do a block randomization by clusters when we make the splits, does it become a clustered SE?)