harrelfe / Hmisc

Harrell Miscellaneous

cut2 throws an error when the input vector and the number of quantile groups are large #169

Closed: isaacvock closed this issue 9 months ago

isaacvock commented 12 months ago

Hi,

cut2 throws the following error when it is given a large vector and g is set to a large number (see the reproducible example below for a rough sense of what "large" means):

Error in if (cj == upper) next : missing value where TRUE/FALSE needed
In addition: Warning message:
In (1:g) * nnm : NAs produced by integer overflow
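A plausible reading of these two messages, inferred only from their text: the integer multiplication (1:g) * nnm overflows and yields NA, and that NA later reaches the if (cj == upper) test, which cannot branch on a missing value. The sketch below reproduces only that second step in isolation; cj and upper are stand-ins borrowed from the error text, not the actual values computed inside cut2.

```r
# Assumption: cj and upper are stand-ins named after the variables in the error
# message; their real definitions inside cut2 are not shown in this issue.
cj <- NA_integer_   # the kind of NA an integer overflow produces
upper <- 1.5        # any non-missing value

# Comparing NA with anything gives NA, and if() cannot branch on NA
msg <- tryCatch(
  if (cj == upper) "skipped" else "kept",
  error = conditionMessage
)
print(msg)  # "missing value where TRUE/FALSE needed" in an English locale
```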

Reproducible example (I am using Hmisc version 5.0.1 and R version 4.3.0, and I ran this code in RStudio on a laptop with 32 GB of RAM running Windows 11):

library(Hmisc)

# Vector to pass to cut2
test_vector <- rnorm(n = 500000)

# Attempt to run cut2 (should throw error listed above)
test_bins <- cut2(test_vector, g = 5000)

# cut2 does not throw an error for me with a lower value of g
test_bins <- cut2(test_vector, g = 4000)

# cut2 also does not throw an error with the same large g if the input vector is smaller
test_bins <- cut2(sample(test_vector, 10000), g = 5000)
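The three calls line up with R's integer limit. Assuming nnm is an integer count of non-missing values (for instance sum(!is.na(x)), which R returns as an integer), the largest entry of (1:g) * nnm is g * nnm, and R integers stop at .Machine$integer.max = 2147483647. The helper overflows() below is hypothetical, written only for this back-of-the-envelope check.

```r
int_max <- .Machine$integer.max   # 2147483647, the largest R integer

# Hypothetical helper: does the largest product g * n exceed int_max?
# Computed in double precision so the check itself cannot overflow.
overflows <- function(n, g) as.double(g) * as.double(n) > int_max

overflows(500000, 5000)  # TRUE:  2.5e9 > int_max (the failing call)
overflows(500000, 4000)  # FALSE: 2.0e9 <= int_max
overflows(10000, 5000)   # FALSE: 5.0e7 <= int_max

# Reproduce the warning with integer operands, as in the warning text
set.seed(1)
nnm <- sum(!is.na(rnorm(500000)))  # sum() of a logical vector returns an integer
prods <- suppressWarnings((1:5000) * nnm)
sum(is.na(prods))  # 706: every i >= 4295 gives i * 500000 > int_max
```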

I did not see anything in the documentation or previous error reports about this problem, but I apologize if I missed any such posts or if my issue is a case of user error. I am also happy to provide any additional information.

Best, Isaac

couthcommander commented 12 months ago

I see the same behaviour and have submitted a potential fix.
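The submitted fix is not shown in this thread. A common way to remove this kind of overflow, which may or may not match that fix, is to do the multiplication in double precision, where integer-valued products stay exact up to 2^53. quantile_targets() is a hypothetical helper, and the division by g assumes the product is used to place g evenly spaced cumulative counts.

```r
# Hypothetical helper, not part of Hmisc: positions of g evenly spaced
# cumulative counts, computed without integer overflow.
quantile_targets <- function(x, g) {
  nnm <- as.double(sum(!is.na(x)))  # promote the integer count to double
  (1:g) * nnm / g                   # integer * double is double, so no NA from overflow
}

set.seed(1)
x <- rnorm(500000)
targets <- quantile_targets(x, g = 5000)
anyNA(targets)             # FALSE: no overflow
targets[length(targets)]   # equals length(x), i.e. the whole sample
```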