Possibility to calibrate a pre-fitted estimator

ip200 / venn-abers

Python implementation of binary and multi-class Venn-ABERS calibration

MIT License


Possibility to calibrate a pre-fitted estimator #1

Closed: tuvelofstrom closed this issue 1 year ago

tuvelofstrom commented 1 year ago

I really welcome this initiative!

I am considering using your implementation in a package, but I need to apply it to already fitted estimators, which is a must in my case. I could not find any way of doing so.

Ideally, a check would be made at line 352 (VennAbersCV.fit) for whether the estimator is already fitted. If it is, the data passed to fit would be used for calibration only, on the assumption that it has not been used for training. I would be happy to discuss my use case with you to find a good solution!
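To illustrate what such a check looks at: scikit-learn's check_is_fitted, by default, treats an estimator as fitted when it has instance attributes ending in a trailing underscore. The sketch below mimics that convention with the standard library only. `ToyEstimator` and `looks_fitted` are hypothetical names used for illustration; they are not part of scikit-learn or venn-abers.

```python
class ToyEstimator:
    """Hypothetical stand-in for a scikit-learn style classifier."""

    def fit(self, X, y):
        # Learned state is stored in attributes with a trailing underscore.
        self.classes_ = sorted(set(y))
        return self


def looks_fitted(estimator):
    """Return True if the estimator has any learned (trailing underscore) attribute."""
    return any(
        name.endswith("_") and not name.startswith("__")
        for name in vars(estimator)
    )


unfitted = ToyEstimator()
fitted = ToyEstimator().fit([[0.0], [1.0]], [0, 1])
print(looks_fitted(unfitted), looks_fitted(fitted))  # False True
```

A check of this kind only says that fit was called at some point. It cannot tell whether the calibration data overlaps with the training data, so that assumption stays with the caller.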

tuvelofstrom commented 1 year ago

An obvious alternative would be to add a parameter to the VennAbersCalibrator initializer indicating whether the estimator is pre-fitted.
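A minimal sketch of how such a flag could steer the fit step, using only the standard library. Everything here is an assumption made for illustration: `CalibratorSketch`, its `prefit` parameter and `MajorityClassifier` are hypothetical and do not reflect the actual venn-abers API.

```python
import random


class MajorityClassifier:
    """Hypothetical toy binary classifier: predicts the class-1 frequency seen in fit."""

    def fit(self, X, y):
        self.p1_ = sum(y) / len(y)
        return self

    def predict_proba(self, X):
        return [[1.0 - self.p1_, self.p1_] for _ in X]


class CalibratorSketch:
    """Collects calibration scores only; NOT the venn-abers implementation."""

    def __init__(self, estimator, prefit=False, cal_size=0.5, random_state=0):
        self.estimator = estimator
        self.prefit = prefit
        self.cal_size = cal_size
        self.random_state = random_state

    def fit(self, X, y):
        if self.prefit:
            # Estimator is already trained: use all supplied data for calibration.
            X_cal, y_cal = X, y
        else:
            # Otherwise split into a proper training set and a calibration set.
            idx = list(range(len(y)))
            random.Random(self.random_state).shuffle(idx)
            n_cal = max(1, int(round(self.cal_size * len(idx))))
            cal_idx, train_idx = idx[:n_cal], idx[n_cal:]
            self.estimator.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            X_cal = [X[i] for i in cal_idx]
            y_cal = [y[i] for i in cal_idx]
        self.p_cal_ = self.estimator.predict_proba(X_cal)
        self.y_cal_ = list(y_cal)
        return self


X = [[float(i)] for i in range(10)]
y = [0, 1] * 5
base = MajorityClassifier().fit(X[:6], y[:6])  # trained elsewhere
cal = CalibratorSketch(base, prefit=True).fit(X[6:], y[6:])
print(len(cal.y_cal_), cal.p_cal_[0])  # 4 [0.5, 0.5]
```

An explicit flag makes the caller's intent visible, whereas an automatic fitted check would silently change behaviour depending on the state of the estimator.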

tuvelofstrom commented 1 year ago

A quick and dirty solution could be the following (I have tested it and it works as intended):

# add some imports
from sklearn.utils.validation import check_is_fitted
from sklearn.exceptions import NotFittedError

# replace lines 341-356 with the following
        if self.inductive:
            self.n_splits = 1
            try:
                # Pre-fitted estimator: use all supplied data for calibration only
                check_is_fitted(self.estimator)
                x_cal, y_cal = _x_train, _y_train
            except NotFittedError:
                # Unfitted estimator: split into proper training and calibration sets
                x_train_proper, x_cal, y_train_proper, y_cal = train_test_split(
                    _x_train,
                    _y_train,
                    test_size=self.cal_size,
                    train_size=self.train_proper_size,
                    random_state=self.random_state,
                    shuffle=self.shuffle,
                    stratify=self.stratify
                )
                self.estimator.fit(x_train_proper, y_train_proper.flatten())
            # Store the calibration scores and labels
            clf_prob = self.estimator.predict_proba(x_cal)
            self.clf_p_cal.append(clf_prob)
            self.clf_y_cal.append(y_cal)
        else:

tuvelofstrom commented 1 year ago

As I commented in issue #2, I realized I do not need these changes, so I will close both issues. I still think these improvements would be valuable, though, so feel free to re-open them.

ip200 commented 1 year ago

Hi Tuwe, thank you once again for your suggestions. These have now been incorporated and are available here as well as on PyPI (soon on conda-forge too). Thanks, Ivan