matthieugomez / sumup

summarize by groups in Stata
MIT License

Overflow bug #5

Closed by sergiocorreia 9 years ago

sergiocorreia commented 9 years ago

When running the command on a large dataset, I got a hard-to-replicate bug:

. sumup contribuyente , by(cat_contribuyente )
Obs. nos. out of range
r(198);

After some tracing, it seems this line is the culprit:

bys `touse' `by' : gen `bylength' = _N 

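Because no storage type is given, `bylength' is created with the default type (float, unless `set type` was changed). A float holds integers exactly only up to 16,777,216 (2^24). With more observations than that, _N can be rounded, and you may end up with observation numbers outside the range of the dataset.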
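To see the rounding concretely, here is a minimal Stata sketch (not part of the original report) that stores the same integer, 16,777,219, as float, long, and double. The variable names are made up for illustration; only the float copy should lose precision.

```stata
* Illustrative sketch: n_float, n_long, n_dbl are hypothetical names.
* 16,777,219 is above 2^24, so a float cannot represent it exactly.
clear
set obs 1
generate float  n_float = 16777219   // rounded to float precision (16777220 under round-to-nearest-even)
generate long   n_long  = 16777219   // stored exactly
generate double n_dbl   = 16777219   // stored exactly
format n_* %12.0f
list
```

A group size stored this way can therefore come out larger than the real number of observations, which is what triggers "Obs. nos. out of range".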

Declaring the variable as long (or double) should fix the issue.

Also, note that although double would always work, in practice datasets rarely exceed c(maxlong) (2,147,483,620) observations, so a line like this keeps the variable as a 4-byte long in the common case:

local type = cond(c(N)>c(maxlong), "double", "long")
bys `touse' `by' : gen `type' `bylength' = _N 
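A self-contained way to try the suggested lines is to wrap them in a small program. The sketch below assumes a hypothetical program, grpsize_demo, that is not part of sumup; it uses the auto sample dataset and returns the largest group size and the storage type that was chosen.

```stata
* Hypothetical helper (not part of sumup) wrapping the suggested fix.
capture program drop grpsize_demo
program define grpsize_demo, rclass
    syntax [if] [in], by(varlist)
    marksample touse
    tempvar bylength
    * Use long unless the dataset has more than c(maxlong) observations
    local type = cond(c(N) > c(maxlong), "double", "long")
    bysort `touse' `by' : generate `type' `bylength' = _N
    summarize `bylength' if `touse', meanonly
    return scalar maxgroup = r(max)
    return local type "`type'"
end

sysuse auto, clear
grpsize_demo, by(foreign)
display r(maxgroup)
display "`r(type)'"
```

In the auto data, foreign splits the 74 cars into groups of 52 and 22, so the largest group is 52 and the chosen type is long.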

Best, S

matthieugomez commented 9 years ago

Very helpful, thanks.