ip200 / venn-abers

Python implementation of binary and multi-class Venn-ABERS calibration
MIT License

Possibility to calibrate a pre-fitted estimator #1

Closed by tuvelofstrom 12 months ago

tuvelofstrom commented 12 months ago

I really welcome this initiative!

I am considering using your implementation in a package, but I need to apply it to already fitted estimators, and I could not find a way to do that.

Ideally, VennAbersCV.fit (line 352) would check whether the estimator is already fitted and, if so, skip fitting, on the assumption that the calibration data submitted for fitting has not been used for training. I would be happy to discuss my use case with you to find a good solution!

tuvelofstrom commented 12 months ago

An obvious alternative would be to add a parameter to the VennAbersCalibrator initializer indicating whether the estimator is pre-fitted.
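To make the idea concrete, here is a minimal sketch of how such a flag could behave. Everything in it is hypothetical: the class `PrefitAwareCalibrator`, the `prefit` and `cal_fraction` parameters, and the toy `ConstantEstimator` are illustrative names only and are not part of the venn-abers API. The split is a plain tail slice rather than `train_test_split`, to keep the example dependency-free.

```python
class PrefitAwareCalibrator:
    """Hypothetical sketch of a `prefit` flag; not the venn-abers API."""

    def __init__(self, estimator, prefit=False, cal_fraction=0.2):
        self.estimator = estimator
        self.prefit = prefit              # True: estimator is already fitted
        self.cal_fraction = cal_fraction  # used only when prefit is False

    def fit(self, X, y):
        if self.prefit:
            # All submitted data is calibration data; the caller must make
            # sure it was not used to train the estimator.
            X_cal, y_cal = X, y
        else:
            # Hold out the tail of the data for calibration, fit on the rest.
            n_cal = max(1, int(len(X) * self.cal_fraction))
            self.estimator.fit(X[:-n_cal], y[:-n_cal])
            X_cal, y_cal = X[-n_cal:], y[-n_cal:]
        self.cal_scores_ = self.estimator.predict_proba(X_cal)
        self.cal_labels_ = list(y_cal)
        return self


class ConstantEstimator:
    """Toy stand-in for a classifier; counts how often fit() is called."""

    def __init__(self):
        self.fit_calls = 0

    def fit(self, X, y):
        self.fit_calls += 1
        return self

    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]


X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]

# Pre-fitted case: the calibrator must not refit the estimator.
fitted = ConstantEstimator().fit(X, y)
cal_prefit = PrefitAwareCalibrator(fitted, prefit=True).fit(X, y)

# Default case: the calibrator fits the estimator on the proper training part.
fresh = ConstantEstimator()
cal_default = PrefitAwareCalibrator(fresh).fit(X, y)
```

An explicit flag avoids guessing the estimator's state, at the cost of one more parameter the caller has to set correctly.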

tuvelofstrom commented 12 months ago

A quick and dirty solution could be the following (I have tested it and it works as intended):

# Additional imports
from sklearn.utils.validation import check_is_fitted
from sklearn.exceptions import NotFittedError
# Replace lines 341-356 with the following
        if self.inductive:
            self.n_splits = 1
            try:
                # Pre-fitted estimator: use all submitted data for calibration
                # (it must not have been used to train the estimator).
                check_is_fitted(self.estimator)
                x_cal, y_cal = _x_train, _y_train
            except NotFittedError:
                # Unfitted estimator: hold out a calibration set and fit on the rest.
                x_train_proper, x_cal, y_train_proper, y_cal = train_test_split(
                    _x_train,
                    _y_train,
                    test_size=self.cal_size,
                    train_size=self.train_proper_size,
                    random_state=self.random_state,
                    shuffle=self.shuffle,
                    stratify=self.stratify
                )
                self.estimator.fit(x_train_proper, y_train_proper.flatten())
            # Store the calibration scores and labels in both cases.
            clf_prob = self.estimator.predict_proba(x_cal)
            self.clf_p_cal.append(clf_prob)
            self.clf_y_cal.append(y_cal)
        else:  # the non-inductive branch continues as in the original
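
The fitted check used in the snippet can be tried in isolation. This is a minimal sketch, assuming scikit-learn is installed; the helper name `is_fitted`, the choice of `LogisticRegression`, and the toy data are placeholders for illustration and do not come from venn-abers.

```python
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.utils.validation import check_is_fitted


def is_fitted(estimator):
    """Return True if scikit-learn considers the estimator fitted."""
    try:
        check_is_fitted(estimator)
        return True
    except NotFittedError:
        return False


clf = LogisticRegression()
before = is_fitted(clf)   # no fitted attributes yet

X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
clf.fit(X, y)
after = is_fitted(clf)    # fit() has set attributes such as coef_
```

Note that `check_is_fitted` only inspects the estimator's state. It cannot tell whether the calibration data overlaps with the training data, so that remains the caller's responsibility.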

tuvelofstrom commented 12 months ago

As I commented in issue #2, I realized I do not need these changes, so I will close these issues. I still think the improvements would be valuable, though, so please consider re-opening them.

ip200 commented 11 months ago

Hi Tuwe, thank you once again for your suggestions. They have now been incorporated and are available here as well as on PyPI (and soon on conda-forge too). Thanks, Ivan