harrelfe / Hmisc

Harrell Miscellaneous

`describe` changes the values with nonsensical roundings #175

Closed bixiou closed 10 months ago

bixiou commented 10 months ago

Running describe(c(0, -0.1, 1)) returns

c(0, -0.1, 1) 
       n  missing distinct     Info     Mean      Gmd 
       3        0        3        1      0.3   0.7333 

Value      -0.100 -0.001  1.000
Frequency       1      1      1
Proportion  0.333  0.333  0.333

For the frequency table, variable is rounded to the nearest 0.011

It makes no sense that 0 becomes -0.001 in the output. This flawed behavior seems to occur as soon as the vector contains non-integer (decimal) values.

This is the first time I have noticed this bug. I am using Hmisc version 5.1.0, R version 4.3.1, and RStudio version 2023.08.0-daily+53.

couthcommander commented 10 months ago

I can confirm this was a change introduced in Hmisc 5.1.0, resulting from Hmisc:::describe.vector's call to spikecomp:

spikecomp(c(0, -0.1, 1), tresult = 'roundeddata')$x

I don't know the intended behaviour of spikecomp, so I have no suggestions for a fix. You could try 5.0.1 (archived) or 5.1.1 (not on CRAN).
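
The version strings in this thread mix dot and hyphen notation (5.1.0, 5.1-1), which R treats as the same thing. Below is a minimal base-R sketch for checking whether a version is the one reported as affected. `is_affected` is a hypothetical helper, and the assumption that only 5.1.0 is affected rests on this thread alone. The commented-out install lines assume the `remotes` package is available.

```r
# Version reported in this thread as showing the rounding problem.
affected <- package_version("5.1.0")

# Hypothetical helper: TRUE if `v` denotes the affected version.
# package_version() treats "5.1-0" and "5.1.0" as equal.
is_affected <- function(v) package_version(v) == affected

print(is_affected("5.1-0"))  # TRUE
print(is_affected("5.0-1"))  # FALSE

# Check the installed copy, if there is one, without loading it.
if (nzchar(system.file(package = "Hmisc"))) {
  print(is_affected(as.character(utils::packageVersion("Hmisc"))))
}

# To move off the affected version (uncomment one; needs network access):
# remotes::install_version("Hmisc", version = "5.0-1")  # archived release
# remotes::install_github("harrelfe/Hmisc")             # development version
```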

bixiou commented 10 months ago

OK. I can live with this bug for now, especially if it is fixed in the next CRAN release.

couthcommander commented 10 months ago

Okay, I understand what's happening. spikecomp is used to group (bin) "close enough" values together. This example is a little clearer:

describe(10^-seq(5))
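
To make the binning idea concrete, here is a rough base-R sketch that reproduces the numbers from the original report. It is not the code of spikecomp. `bin_from_min` is a hypothetical helper, and both the bin count of 100 and the anchoring of the grid at the minimum are assumptions inferred from the printed output (1.1 / 100 = 0.011, and 0 becoming -0.001).

```r
# Hypothetical helper, NOT Hmisc's spikecomp: snap each value to a grid
# that starts at min(x) and has `nbins` equal steps across the range.
bin_from_min <- function(x, nbins = 100) {
  lo    <- min(x)
  width <- diff(range(x)) / nbins      # grid step
  lo + width * round((x - lo) / width) # nearest grid point
}

x <- c(0, -0.1, 1)
print(diff(range(x)) / 100)            # 0.011, the step quoted in the output
print(round(bin_from_min(x), 3))       # -0.001 -0.100  1.000

# With values spanning several orders of magnitude, the smallest ones
# fall on the same grid point and are merged.
y <- 10^-seq(5)
print(length(unique(bin_from_min(y)))) # 4 distinct values out of 5
```

Under these assumptions, 0 lies 9.09 steps above the minimum of -0.1, so it is snapped to step 9, which is -0.1 + 9 * 0.011 = -0.001. The grid does not pass through 0 because it is anchored at the minimum rather than at the origin.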

It looks like 5.1-1 already has a fix, so I will close this issue.