p-lambda / verified_calibration

Calibration library and code for the paper: Verified Uncertainty Calibration. Ananya Kumar, Percy Liang, Tengyu Ma. NeurIPS 2019 (Spotlight).
MIT License

When a bin has samples of the same probability #19


forrestbao commented 4 months ago

Hi,

I just came across an interesting corner case: some bins contain samples with the same probability.

The code below reproduces the error.

import calibration as cal

model_probs = [[0.5507, 0.4493], 
 [0.8764, 0.1236],
 [0.1822, 0.8178],
 [0.3814, 0.6186],
 [0.9725, 0.0275],
 [0.281,  0.719 ],
 [0.8817, 0.1183],
 [0.8193, 0.1807],
 [0.4806, 0.5194],
 [0.9415, 0.0585],
 [0.4648, 0.5352],
 [0.9561, 0.0439]]
labels = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0]

# Train a Platt-binner marginal calibrator with 4 bins per class.
calibrator = cal.PlattBinnerMarginalCalibrator(len(labels), num_bins=4)
calibrator.train_calibration(model_probs, labels)
print(calibrator._bins)

The shape of the first entry of calibrator._bins is (3,) instead of the expected (4,).

We looked into the reason and found that the last two bins contain samples with identical probabilities. (Note that the raw probabilities above are all distinct, so the ties appear in the Platt-scaled values.)

We are wondering whether, in such a case, an error should be raised, or whether small random noise should be added to the probabilities to break the ties.