s3alfisc / fwildclusterboot

Fast Wild Cluster Bootstrap Inference for Regression Models / OLS in R. Additionally, R port to WildBootTests.jl via the JuliaConnectoR.
https://s3alfisc.github.io/fwildclusterboot/
GNU General Public License v3.0

Error: cannot allocate vector of size 19.6 Gb #27

Closed ccmullally closed 2 years ago

ccmullally commented 2 years ago

I ran into something else. I'm using a dataframe with about 60,000 observations and six clusters. I am running into the error above when running boottest. Boottest on Stata handles everything with no issues using the same data set. I am also able to do the wild cluster bootstrap "by hand" using a foreach loop. I'm happy to share my data and complete code if that would help with the diagnosis. Here is the offending code:

lm.fit <- lm(price ~  CAXpost + CA + post, data = df, weights = units)

# bootstrap inference 
boot_feols <- boottest(lm.fit, clustid = "market", param = "CAXpost", B = 499, type = 'webb')

s3alfisc commented 2 years ago

Hi @ccmullally,

Thanks for the feedback! I'm sorry about your problems, and also a little surprised to hear that you run out of memory with six clusters. The main constraint on memory is creating the bootstrap weights matrix v, which has dimension G x (B + 1), and I would not have expected that to cause memory issues when G = 6. How many bootstrap iterations are you running? Can you confirm that the memory problem arises when creating the weights matrix? The weights are created in boot_algo2() with the following lines:

    v <- wild_draw_fun(n = N_G_bootcluster * (boot_iter + 1))
    dim(v) <- c(N_G_bootcluster, boot_iter + 1)
    v[, 1] <- 1

Anyway, this is a problem I am aware of, and it is good that you mention it, so it is now back on my to-do list :)

For a quick fix from 'within' R, you could also try wildboottestjlr, which is a wrapper around @droodman's WildBootTests.jl - it follows the same syntax as fwildclusterboot::boottest(), but you would have to install Julia (e.g. via the wildboottestjlr_setup() function).

EDIT: The error probably occurs in the matrix multiplication part, not in the creation of the weights matrix v.
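
To put rough numbers on the sizes discussed above, here is a back-of-the-envelope sketch in R. It uses the figures from this thread (about 60,000 observations, 6 clusters, B = 499) and assumes dense double-precision matrices at 8 bytes per element. The helper name dense_gib is made up for illustration, and the N x N line is shown only for scale; it is not a claim about where the allocation actually fails.

```r
# Sizes taken from the thread: about 60,000 observations, 6 clusters, B = 499.
N <- 60000
G <- 6
B <- 499

bytes_per_double <- 8
gib <- 1024^3

# Hypothetical helper: memory of a dense numeric matrix, in GiB.
dense_gib <- function(nrow, ncol) nrow * ncol * bytes_per_double / gib

weights_gib  <- dense_gib(G, B + 1)  # G x (B + 1) weights matrix v
obs_by_b_gib <- dense_gib(N, B + 1)  # N x (B + 1) intermediate, if one were formed
n_by_n_gib   <- dense_gib(N, N)      # N x N, for scale only

print(c(weights = weights_gib, obs_by_boot = obs_by_b_gib, n_by_n = n_by_n_gib))

# Cross-check the first figure against an actual object.
v <- matrix(1, nrow = G, ncol = B + 1)
print(object.size(v), units = "Kb")
```

With G = 6 the weights matrix is tiny, so an allocation request in the tens of gigabytes has to come from an object whose size grows with N rather than with G.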

ccmullally commented 2 years ago

I was running 499 replications. I will admit that I am a total R rookie (I'm moving my grad class problem sets from Stata to R), so I don't know how to isolate the part of the code where it is breaking. Is there an equivalent to Stata's "trace" in R?

s3alfisc commented 2 years ago

I think the R equivalent should be the traceback() function, see this tutorial here. In your case, the debug() function might be useful too. But maybe it would be easiest if you sent me your data set and code? My email is alexander-fischer1801@t-online.de. Btw, very cool that you are helping your students to learn R! :)
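
In case it helps, here is a minimal sketch of how the call stack can be inspected in R. The functions inner_step and outer_step are hypothetical stand-ins for a failing call chain, not part of fwildclusterboot. In an interactive session you would simply run traceback() right after the error; the version below records the stack with sys.calls() so that it also works inside a script.

```r
# Hypothetical functions standing in for a failing call chain.
inner_step <- function(x) stop("cannot allocate vector (simulated)")
outer_step <- function(x) inner_step(x)

# Record the call stack at the moment the error is signalled.
captured_calls <- NULL
result <- tryCatch(
  withCallingHandlers(
    outer_step(1),
    error = function(e) captured_calls <<- sys.calls()
  ),
  error = function(e) conditionMessage(e)
)

# Turn each recorded call into one line of text, outermost call first.
stack_text <- vapply(
  as.list(captured_calls),
  function(cl) paste(deparse(cl), collapse = " "),
  character(1)
)
print(stack_text)

# Interactive alternatives:
#   traceback()          # run right after an error to print the call stack
#   debugonce(boottest)  # step through the next call to boottest() line by line
```

The printed stack lists the calls from the outermost one down to the one that raised the error, which is usually enough to see which internal function is failing.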

s3alfisc commented 2 years ago

Hi, I have now checked your code & data and you have indeed found a bug!

In a nutshell, the error arose due to the use of column labels - I have updated the development version so that the error no longer arises. Can you confirm that the bootstrap now runs without any troubles? :)

In around 30-60 minutes, you should be able to install a compiled version of fwildclusterboot from r-universe by running

# from r-universe (windows & mac, compiled R > 4.0 required)
install.packages('fwildclusterboot', repos ='https://s3alfisc.r-universe.dev')

I will submit the package to CRAN after Jan 3rd when the CRAN team is back from their winter break.