mcaceresb / stata-gtools

Faster implementation of Stata's collapse, reshape, xtile, egen, isid, and more using C plugins
https://gtools.readthedocs.io
MIT License

gstats winsor segfaults when the upper cut is 100 and the data is not sorted #51

Closed by sergiocorreia 5 years ago

sergiocorreia commented 5 years ago

If the data is unsorted, winsorizing only the lower tail, i.e. with an upper cut of 100 (e.g. cut(10 100)), can cause a segfault.

Relatedly, if one of the cuts is missing (e.g. cut(10 .)), I'm not sure what the output is supposed to be.

Code sample:

* Create data
clear
input int x long(y z)
 1 357 554
 2 400 955
 3 689 895
 4 559 956
 5 574 503
 6 207 619
 7  28 321
 8 688 123
 9 469 445
10 207 554
11   3  18
12  13 912
13 420 776
14 616 616
15 894 132
16 410 647
17 260 730
18  26 446
19 107   7
20 366 664
end

gstats winsor x, suffix(_1) cut(0 80) // ok
gstats winsor x, suffix(_2) cut(20 100) // ok
gstats winsor x, suffix(_3) cut(20 .) // wrong?
gstats winsor x, suffix(_4) cut(0 100) // ok

sort y

gstats winsor x, suffix(_5) cut(0 100) // segfault
gstats winsor x, suffix(_6) cut(20 100) // segfault
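
For anyone checking the results by hand, here is a rough cross-check of the cut(20 100) case using only built-in Stata commands. It is a sketch under a few assumptions: the sample data above is in memory, x_2 was created by the earlier call, an upper cut of 100 means the upper tail is left unchanged, and gstats winsor uses the same percentile definition as _pctile (it may not, so small differences at the cut value would not necessarily indicate a bug). x_manual is a hypothetical variable name used only for this check.

```stata
* Sketch: winsorize the lower tail of x by hand and compare with x_2.
* The result does not depend on the sort order of the data.
_pctile x, percentiles(20)                 // 20th percentile is left in r(r1)
generate double x_manual = max(x, r(r1))   // raise values below the cut to the cut

* Upper cut of 100: assumed to leave the upper tail untouched,
* so no min() is applied here.
list x x_2 x_manual if x_2 != x_manual     // rows where the two disagree, if any
```

The same comparison can be run against x_6 after sort y once the command no longer crashes.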

Version info

mcaceresb commented 5 years ago

@sergiocorreia Let me know if the fix in develop is working for you.

sergiocorreia commented 5 years ago

Seems to be working now, thanks!