mcaceresb / stata-gtools

Faster implementation of Stata's collapse, reshape, xtile, egen, isid, and more using C plugins
https://gtools.readthedocs.io
MIT License

`gcollapse` fails with a large number of targets or by variables #7

Closed mcaceresb closed 7 years ago

mcaceresb commented 7 years ago

`gcollapse` gives an error when there are too many by variables or targets. The number of targets and the number of by variables are each limited by `matsize`:

clear
set matsize 100
set obs 10
forvalues i = 1/101 {
    gen x`i' = 10
}
gen zz = runiform()
gcollapse zz, by(x*)
gcollapse x*, by(zz)

Both commands above will fail with error code 908. However, there is a point where increasing matsize will not help with the number of targets:

clear
set matsize 400
set obs 10
forvalues i = 1/300 {
    gen x`i' = 10
}
gen zz = runiform()
gcollapse zz, by(x*)
gcollapse x*, by(zz)

The first command will succeed, but the second will fail with error code 3000 (too many tokens). The cause is that lines 253-255, 314, 385, 487, 514, 570, 579, and 621 use the regular `subinstr()` function rather than the extended macro function `:subinstr`. A previous commit switched to `:subinstr` for all locals, but these lines use the function to create a Mata object.
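For reference, a minimal sketch of the two forms on a long variable list. The local names `vars`, `csv`, and `csv2` are made up for illustration and are not taken from the gtools source. Whether the function form actually breaks depends on the Stata version and on where the expanded string ends up (a Stata expression or a Mata statement).

```stata
* Build a long, space-separated variable list in a local (hypothetical names)
local vars
forvalues i = 1/300 {
    local vars `vars' x`i'
}

* Extended macro function: operates on the macro's contents directly,
* so the long list is never expanded inside an expression
local csv: subinstr local vars " " ",", all

* Regular function: the macro is first expanded into a string literal
* and the whole expression is then parsed and evaluated
local csv2 = subinstr("`vars'", " ", ",", .)
```

The extended macro function is the safer choice for locals because the text is never pasted into an expression that has to be tokenized.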

NOTE: The matsize problem may be a very fundamental limitation. Make sure to create a warning if it cannot be bypassed.

mcaceresb commented 7 years ago

The above commands will no longer fail. This is now only a problem when Stata hits the matsize limit. So this is fine:

clear
set obs 10
forvalues i = 1/800 {
    gen x`i' = 10
}
gen zz = runiform()
preserve
    gcollapse zz, by(x*) `options'
restore, preserve
    gcollapse x*, by(zz) `options'
restore

But this fails:

gen x801 = 10
preserve
    gcollapse zz, by(x*) `options'
restore, preserve
    gcollapse x*, by(zz) `options'
restore

However, the error message specifies it is a matsize limitation and tells the user how they might be able to fix it.
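As a rough sketch of what a user could check on their side, assuming a Stata version where `matsize` still applies: the c-class values `c(matsize)`, `c(min_matsize)`, and `c(max_matsize)` report the current setting and the bounds for the installed flavor. The local `needed` and the message text below are hypothetical and are not the message `gcollapse` prints.

```stata
* Current limit and the bounds allowed by this Stata flavor
display "matsize:     " c(matsize)
display "min allowed: " c(min_matsize)
display "max allowed: " c(max_matsize)

* Raise matsize only if this flavor permits the (hypothetical) target
local needed = 801
if (`needed' <= c(max_matsize)) {
    set matsize `needed'
}
else {
    display as error "need matsize >= `needed' but the maximum is " c(max_matsize)
}
```

If the required value exceeds `c(max_matsize)`, raising `matsize` is not an option and the number of targets or by variables has to be reduced instead.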