marius-cp / calibrationband

Honest calibration assessment for binary outcome predictions
https://marius-cp.github.io/calibrationband/

Computation crashes for some extreme values #1

Open martinmodrak opened 5 months ago

martinmodrak commented 5 months ago

Hi, not sure if this is actively maintained, and I understand if not. I am trying to use the package for a little project and I have a small observation: for some extreme values the computation crashes. The smallest example I have is:

calibrationband::calibration_bands(c(1-1.22e-15,1-6.66e-16), c(1,1))

This gives:

Error in tibble::tibble(x = x, lwr = lwr, upr = upr) : 
  Tibble columns must have compatible sizes.
• Size 4: Existing data.
• Size 3: Column `lwr`.
ℹ Only values of size one are recycled.
marius-cp commented 5 months ago

Hi Martin,

Thanks for working with our package and reporting this error message. After looking into it briefly, I believe it comes from the precision limitations of floating-point representation in R. Concretely, the unique and split functions in R behave differently for nearly identical values. We use these two functions to find the distinct values of x and to aggregate the observations that share the same x.

To illustrate, consider the following example:

x = c(1 - 1.22e-15, 1 - 6.66e-16)
y = c(1, 1)
split(y, x)
unique(x)

In this case, the split function does not recognize the small differences between the values in x, whereas the unique function does. This discrepancy causes the tibble error.
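A minimal sketch of the mismatch, using only base R and the values from the report. The explanation in the comments is my reading of base R (split turns the grouping vector into a factor, and factor levels are built from as.character, which keeps about 15 significant digits); the variable names n_unique and n_groups are just for illustration.

```r
x <- c(1 - 1.22e-15, 1 - 6.66e-16)
y <- c(1, 1)

# unique() compares the doubles exactly, so both values are kept.
n_unique <- length(unique(x))

# split() converts x to a factor. The factor levels come from
# as.character(x), which keeps about 15 significant digits, so both
# values end up with the same label and fall into one group.
n_groups <- length(split(y, x))

as.character(x)                        # labels the factor is built from
c(unique = n_unique, groups = n_groups)
```

The two counts differ, so vectors derived from unique(x) and from split(y, x) have different lengths, which is consistent with the tibble size error above.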

One option to address this issue is to use sprintf with high precision to format the values before using split. For instance, one can modify the relevant part of the calibration_bands.R file to use split(y, sprintf("%.50f", x)). However, I am not entirely sure whether this is a sustainable solution... In the longer term, I should use one consistent method to aggregate the identical values of x.
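For comparison, a small sketch of two ways to make the grouping agree with unique. The first is the sprintf idea from above. The second is only an assumption on my side, not code from the package: it groups by the position of each value in sort(unique(x)), so split and unique rely on the same exact comparison. The names groups_sprintf, groups_match and ux are illustrative.

```r
x <- c(1 - 1.22e-15, 1 - 6.66e-16)
y <- c(1, 1)

# Option 1: format x with many decimals so that distinct doubles
# get distinct labels before split() builds its factor.
groups_sprintf <- split(y, sprintf("%.50f", x))

# Option 2 (assumption, not package code): use integer positions in
# the sorted distinct values as the grouping key. match() compares the
# doubles exactly, just like unique().
ux <- sort(unique(x))
groups_match <- split(y, match(x, ux))

length(groups_sprintf)
length(groups_match)
length(ux)
```

With option 1 the groups are ordered by string sorting of the labels, which should match numeric order for probabilities in [0, 1] but is worth checking. With option 2 the groups follow the order of ux directly.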

Thanks again!

Best, Marius

martinmodrak commented 4 months ago

My current workaround is to just round the x values to a fixed number of decimal places, i.e. calibration_bands(round(prob, 7), outcome). I'll see if I can quickly understand your code well enough to offer a better alternative.
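A quick base R check of why the rounding helps, without calling the package. The first two probabilities are the ones from my report; the other two and the outcome vector are made up for illustration.

```r
# Hypothetical inputs: the last two probabilities and all outcomes
# are invented for this example.
prob <- c(1 - 1.22e-15, 1 - 6.66e-16, 0.25, 0.25 + 1e-12)
outcome <- c(1, 1, 0, 1)

# After rounding to 7 decimals, nearly identical values collapse to
# exactly the same double.
prob_rounded <- round(prob, 7)

n_unique <- length(unique(prob_rounded))
n_groups <- length(split(outcome, prob_rounded))

c(unique = n_unique, groups = n_groups)
```

After rounding, unique and split see the same number of distinct values here. The cost is that predictions closer together than roughly 5e-8 are merged into one value.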