kylebutts / did2s

Two-stage Difference-in-Differences package following Gardner (2021)
http://kylebutts.github.io/did2s

Error: "Requested size is too large" #7

Closed adebiasi21 closed 2 years ago

adebiasi21 commented 3 years ago

I am trying to run a standard static model:

static <- did2s(mydata0, 
                yname = "crime", first_stage = ~ 0 | parcelid + time, 
                second_stage = ~ post, treatment = "post", 
                cluster_var = "parcelid")

However, I get the following error:

Error in make_V(x1, x10, x2) : SpMat::init(): requested size is too large; suggest to enable ARMA_64BIT_WORD

3. stop(structure(list(message = "SpMat::init(): requested size is too large; suggest to enable ARMA_64BIT_WORD", call = make_V(x1, x10, x2), cppstack = NULL), class = c("std::logic_error", "C++Error", "error", "condition")))
2. make_V(x1, x10, x2)
1. did2s(mydata0, yname = "crime", first_stage = ~0 | parcelid + time, second_stage = ~post, treatment = "post", cluster_var = "parcelid")

On its face, it seems like my dataset is too large. Any advice on how to address this error would be appreciated.

kylebutts commented 3 years ago

Yeah, it's almost surely that the dataset is too large to compute analytic standard errors. I would try things in this order:

  1. Use the bootstrap standard-error option in the did2s function. See ?did2s for details. This might not work, though, as that code could admittedly be more efficient.
  2. Do a manual block bootstrap: sample with replacement at the level of cluster_var, then re-compute the estimate on each resample (you can use https://github.com/kylebutts/did2s/blob/23a96e673a31e1f0aaa62dad089783961f24e9b6/R/did2s.R#L248).

Do it as a for loop, be careful not to use too much memory, and it should work.
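The manual block bootstrap can be sketched in base R as follows. This is a minimal sketch, not the package's method: a plain lm() stands in for the did2s point estimate so the example is self-contained, and the data frame `df` with columns `cl`, `x`, `y` is entirely made up.

```r
# Minimal block-bootstrap sketch: resample whole clusters with replacement,
# re-estimate on each resample, and take the SD of the estimates as the SE.
# lm() is a stand-in for the did2s estimate; all names here are hypothetical.
set.seed(42)
df <- data.frame(cl = rep(1:50, each = 10), x = rnorm(500))
df$y <- 1 + 2 * df$x + rnorm(500)

block_boot_se <- function(df, n_boot = 200) {
  ids <- unique(df$cl)
  est <- numeric(n_boot)
  for (b in seq_len(n_boot)) {
    # Draw clusters with replacement, keeping every row of each drawn cluster
    draw <- sample(ids, length(ids), replace = TRUE)
    boot_df <- do.call(rbind, lapply(draw, function(i) df[df$cl == i, , drop = FALSE]))
    # Swap this line for your did2s point estimate computed on boot_df
    est[b] <- coef(lm(y ~ x, data = boot_df))[["x"]]
  }
  sd(est)  # block-bootstrap standard error of the coefficient on x
}

se <- block_boot_se(df)
```

Running this inside a for loop and discarding each `boot_df` after use keeps peak memory at roughly one resampled copy of the data.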

adebiasi21 commented 3 years ago

Thanks, Kyle. I'll try these workarounds. Would running this analysis "as is" on a cluster be another option? Under those conditions, memory wouldn't be an issue. Or, am I not understanding the crux of this issue?

kylebutts commented 3 years ago

That is certainly an option too! Yeah, the problem is that make_V can't store a matrix big enough for the fixed effects, even in sparse form, so more memory can fix that problem.
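The error text itself suggests another possible route: Armadillo's sparse matrices use 32-bit indexing unless ARMA_64BIT_WORD is defined. A hedged sketch, assuming you can compile the package from source (whether did2s's build actually picks up this flag is not verified here):

```
# In ~/.R/Makevars, add the compile flag Armadillo looks for:
PKG_CPPFLAGS = -DARMA_64BIT_WORD=1

# Then reinstall did2s from source in R, e.g.:
# remotes::install_github("kylebutts/did2s", force = TRUE)
```

With 64-bit words enabled, sparse matrices can index beyond the 2^31 - 1 element limit, at the cost of extra memory per index.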

I do think it's worth considering whether you want to include parcel fixed effects. A slightly higher level of aggregation (e.g., street fixed effects) is theoretically more appealing anyway, as parcel fixed effects are not consistent with fixed T.

adebiasi21 commented 3 years ago

Awesome! I'll try my hand at running the analysis on a cluster before trying the other workarounds. And thanks for the advice on the parcel fixed effects. The variable name is a bit misleading, as the units are actually street networks associated with each parcel in my dataset; "networkid" would likely be a better name.

apodges commented 3 years ago

To update: I tried running the analysis "as is" on a cluster. However, I get the following error: address (nil), cause 'memory not mapped'. I am wondering whether there is some built-in stopping rule within the code that prevents the analysis from proceeding past a certain matrix size. Alternatively, this could be a cluster-specific issue. That said, do you have any insights here?