s3alfisc / fwildclusterboot

Fast Wild Cluster Bootstrap Inference for Regression Models / OLS in R. Additionally, R port to WildBootTests.jl via the JuliaConnectoR.
https://s3alfisc.github.io/fwildclusterboot/
GNU General Public License v3.0

Vector Memory Exhausted in R-lean #137

Closed: michaeltopper1 closed this issue 1 year ago

michaeltopper1 commented 1 year ago

Hi @s3alfisc,

I'm having trouble getting the boottest function to work with my data set. My best guess is that this is a memory problem, given the large number of observations I have (~4 million) and the high-dimensional fixed effects (time = 2536 levels, group = 22 levels). I keep getting the same error: Error: vector memory exhausted (limit reached?)

I have tried using the argument engine = "R-lean", but I still get the vector memory exhausted error.

I also tried the summclust package, but it looks like I'm running into a similar problem there.

I could put together a reprex, but I'm not sure how helpful it would be here.

Is there a way around this problem? My only idea is to run this on a server somewhere, but I wanted to stop by here in case there is an easier way.

s3alfisc commented 1 year ago

Hi @michaeltopper1 , thanks for reporting!

Some quick questions:

Actually, I think this should be doable?

R-lean should help if your memory error is caused by a bootstrap weights matrix that gets too large; the weights matrix has dimension G x (B+1). The advantage of the R-lean algorithm is that it does not allocate the entire weights matrix before the bootstrap, but only afterwards.
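For a sense of scale, here is a base-R sketch of a G x (B+1) weights matrix using the numbers from this thread (22 clusters, B = 999). It assumes Rademacher weights stored as doubles and is only an illustration, not fwildclusterboot's internal code:

```r
# Sketch: size of a G x (B + 1) matrix of bootstrap weights.
# Assumptions: Rademacher weights (-1 or +1 with equal probability),
# stored as 8-byte doubles. Not fwildclusterboot's internal implementation.
set.seed(123)
G <- 22    # number of clusters (district, from this thread)
B <- 999   # number of bootstrap draws

# One row per cluster, one column per bootstrap draw (+1 extra column)
v <- matrix(sample(c(-1, 1), G * (B + 1), replace = TRUE),
            nrow = G, ncol = B + 1)

print(dim(v))                        # 22 rows, 1000 columns
print(object.size(v), units = "Kb")  # memory used by the matrix
```

With only 22,000 entries (about 176,000 bytes of data), a matrix of this size is far too small to exhaust memory on its own.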

Ad hoc, I have two ideas for what you could try:

I will be super busy tomorrow, so I can't promise I'll be able to help, but I will try to take a closer look on Wednesday and respond here =)

Best, Alex

s3alfisc commented 1 year ago

Also, in case you are not already doing so, you should definitely set fe = "time" (i.e., project the time fixed effect out in the bootstrap), which should help a lot.
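As a rough illustration of why projecting out the time fixed effect can matter so much, the base-R arithmetic below estimates the size of a dense design matrix with ~4 million rows. It assumes (an assumption, not a description of fwildclusterboot's internals) that any fixed effect not passed to fe enters the design matrix as dense 8-byte dummy columns; the helper dense_matrix_bytes is hypothetical, and the treatment regressor and intercept are ignored:

```r
# Back-of-the-envelope memory estimate for a dense design matrix.
# Assumption: fixed effects that are not projected out via `fe` are kept
# as dense double-precision (8-byte) dummy columns.
dense_matrix_bytes <- function(n_obs, n_cols) {
  n_obs * n_cols * 8  # rows x columns x bytes per double
}

n_obs   <- 4e6   # ~4 million observations (from this thread)
n_time  <- 2536  # levels of the time fixed effect
n_group <- 22    # levels of the group fixed effect

# Time and group dummies both kept in the design matrix
with_time <- dense_matrix_bytes(n_obs, n_time + n_group)
# Time fixed effect projected out: only the group dummies remain
without_time <- dense_matrix_bytes(n_obs, n_group)

cat(sprintf("time dummies kept:   %.1f TB\n", with_time / 1e12))
cat(sprintf("time projected out:  %.2f GB\n", without_time / 1e9))
```

Under these assumptions, keeping the time dummies means roughly 82 TB, versus about 0.7 GB once the time fixed effect is projected out.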

s3alfisc commented 1 year ago

Sorry, of course, if summclust also fails with the same error, then it cannot be the weights 😅

michaeltopper1 commented 1 year ago

Thanks for the quick response!

To answer the Q's:

Here's the output of my regression. Sorry for the corny screenshot:

[Screenshot of the regression output, 2023-08-28, 3:13:57 PM]

And the corresponding fwildclusterboot code:

boottest(entry_d, clustid = c("district"), B = 999, param = "treatment", engine = "R-lean")

I'm guessing boottest puts all of these matrices into RAM? If that's the case, then I think I'll be able to solve this by running it on a server.

michaeltopper1 commented 1 year ago

Ok, your last comment about setting fe = "date" helped A TON. It now runs fast. Sorry about that! I should have read the documentation more closely on this point. I'll close this now.

Thanks for the help!

s3alfisc commented 1 year ago

Awesome, great that you could get your bootstrap to run! No need to apologize, I am always super happy to hear from users of my packages, and to see that they are used in actual research projects! =)