kylebutts / did2s

Two-stage Difference-in-Differences package following Gardner (2021)
http://kylebutts.github.io/did2s

Enable large Armadillo matrices #17

Closed SebKrantz closed 2 years ago

SebKrantz commented 2 years ago

I hit an error when running on a large dataset:

Running Two-stage Difference-in-Differences
• first stage formula `~ 0 | id + t`
• second stage formula `~ paved`
• The indicator variable that denotes when treatment is on is `paved`
• Standard errors will be clustered by `id`
Error in make_V(x1, x10, x2) : 
  SpMat::init(): requested size is too large; suggest to enable ARMA_64BIT_WORD
Called from: make_V(x1, x10, x2)

Adding `PKG_CPPFLAGS = -DARMA_64BIT_WORD=1` to the package's compile flags apparently fixes it. See:

https://stackoverflow.com/questions/40592054/large-matrices-in-rcpparmadillo-via-the-arma-64bit-word-define
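
For reference, a minimal sketch of what that change might look like. It assumes the package sets its compile flags in `src/Makevars` (with a matching `src/Makevars.win` for Windows); the file location is an assumption and not something confirmed in this thread.

```make
# src/Makevars (assumed location; mirror the line in src/Makevars.win if present)
# Ask Armadillo to use 64-bit integers for matrix sizes and indices
PKG_CPPFLAGS = -DARMA_64BIT_WORD=1
```

If the file already defines `PKG_CPPFLAGS`, the flag would be appended to the existing line rather than added as a second definition. The package has to be recompiled and reinstalled for the flag to take effect.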

kylebutts commented 2 years ago

Awesome! Thanks Sebastian. Love collapse by the way! Just to confirm, this fixed your issue with big data? I think I've tried this before with someone else and it didn't fix it.

SebKrantz commented 2 years ago

Thanks @kylebutts. It certainly runs for a long time in my case, but I can still interrupt the run without killing R. Since this is an Armadillo option that must be set at compile time, i.e. it cannot be changed by the R user, it is better to enable it and let users deal with the consequences of their code running too long.

kylebutts commented 2 years ago

Thanks, great! For very large datasets, bootstrapped standard errors are also an option. This comment explains why the analytic standard errors slow things down so much: https://github.com/kylebutts/did2s/issues/12#issuecomment-1133131488

You can also, of course, get point estimates (with slightly too small standard errors) by running the two stages manually while you develop your code, and then bootstrap the standard errors once everything runs properly!
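
A rough sketch of what running the two stages manually could look like with `fixest`. The simulated data, the outcome name `y`, and the effect size are all hypothetical and only there to make the example run; `id`, `t`, and `paved` follow the formulas in the log above. The clustered standard errors from the second stage ignore that the first stage was estimated, which is why they come out slightly too small.

```r
library(fixest)

# Simulated panel (hypothetical): 200 units over 10 periods,
# units 1-100 become treated from t = 6, true effect = 2
set.seed(1)
df <- expand.grid(id = 1:200, t = 1:10)
df$paved <- as.integer(df$id <= 100 & df$t >= 6)
df$y <- df$id / 100 + 0.1 * df$t + 2 * df$paved + rnorm(nrow(df))

# Stage 1: estimate unit and time fixed effects on untreated observations only
first <- feols(y ~ 0 | id + t, data = df[df$paved == 0, ])

# Remove the estimated fixed effects from the outcome for ALL observations
df$y_resid <- df$y - predict(first, newdata = df)

# Stage 2: regress the residualized outcome on the treatment indicator
second <- feols(y_resid ~ paved, data = df, cluster = ~id)
coef(second)[["paved"]]
```

Once this runs as intended, the same two steps can be wrapped in a function and re-run on resampled clusters to bootstrap the standard errors.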