scikit-learn-contrib / MAPIE

A scikit-learn-compatible module to estimate prediction intervals and control risks based on conformal predictions.
https://mapie.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
1.2k stars, 99 forks

MapieClassifier(method = 'aps', include_last_label = "randomized") generates incorrect results #456

Open qzheng60 opened 1 month ago

qzheng60 commented 1 month ago

Describe the bug

File: classification.py

In _get_last_included_proba(), y_pred_index_last is not updated when np.sum(zeros_scores_proba_last) > 0 (line 857), so it no longer points to the label whose probability is stored in y_pred_proba_last. Consequently, in _add_random_tie_breaking(), y_pred_proba_last is not properly updated (line 539), which yields incorrect prediction sets.
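
To make the mismatch concrete, here is a minimal NumPy sketch on toy data. It is not the MAPIE code itself: the array shapes (n_samples, n_classes, n_alpha), the value of EPSILON, and the toy probabilities are assumptions chosen for illustration. Only the masked-minimum replacement follows the existing logic.

```python
import numpy as np

EPSILON = 1e-8  # assumed tolerance, not necessarily MAPIE's value

# Toy probabilities, shape (n_samples=2, n_classes=3, n_alpha=1).
y_pred_proba = np.array([[0.7, 0.3, 0.0],
                         [0.5, 0.3, 0.2]])[:, :, np.newaxis]

# Assume the last included label is class 2 for sample 0 (probability 0)
# and class 1 for sample 1.
y_pred_index_last = np.array([[2], [1]])[:, :, np.newaxis]
y_pred_proba_last = np.take_along_axis(y_pred_proba, y_pred_index_last, axis=1)

# Replace zero probabilities by the smallest non-zero one,
# leaving the index untouched (the behaviour reported here).
zeros_scores_proba_last = y_pred_proba_last <= EPSILON
nonzero_proba = np.ma.masked_less(y_pred_proba, EPSILON).filled(fill_value=np.inf)
y_pred_proba_last[zeros_scores_proba_last] = np.expand_dims(
    np.min(nonzero_proba, axis=1), axis=1
)[zeros_scores_proba_last]

# The stored probability and the probability at the stored index now disagree.
proba_at_index = np.take_along_axis(y_pred_proba, y_pred_index_last, axis=1)
print(y_pred_proba_last[0, 0, 0])  # 0.3, the probability of class 1
print(proba_at_index[0, 0, 0])     # 0.0, the probability of class 2
```

For sample 0, the stored probability now belongs to class 1 while the stored index still says class 2, so any later step that uses the index acts on the wrong label.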

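Comparing two experiments, MapieClassifier(method = 'aps', include_last_label = "randomized") and MapieClassifier(method = 'aps', include_last_label = True), the "randomized" setting generates more True labels in the prediction sets than the True setting. This should not happen, since "randomized" can only drop the last label that the True setting always keeps.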

Proposed solution

Starting from line 852, also update y_pred_index_last in the same branch:

        zeros_scores_proba_last = (y_pred_proba_last <= EPSILON)

        # If the last included proba is zero, change it to the
        # smallest non-zero value to avoid including them in the
        # prediction sets.
        if np.sum(zeros_scores_proba_last) > 0:
            y_pred_proba_last[zeros_scores_proba_last] = np.expand_dims(
                np.min(
                    np.ma.masked_less(
                        y_pred_proba,
                        EPSILON
                    ).filled(fill_value=np.inf),
                    axis=1
                ), axis=1
            )[zeros_scores_proba_last]

            y_pred_index_last[zeros_scores_proba_last] = np.expand_dims(
                np.argmin(
                    np.ma.masked_less(
                        y_pred_proba,
                        EPSILON
                    ).filled(fill_value=np.inf),
                    axis=1
                ), axis=1
            )[zeros_scores_proba_last]

        return y_pred_proba_cumsum, y_pred_index_last, y_pred_proba_last
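
As a quick sanity check of the patch above, the sketch below applies the same two updates to toy arrays and verifies that the index and the probability stay consistent. The helper name replace_zero_last_proba, the EPSILON value, and the toy shapes (n_samples, n_classes, n_alpha) are hypothetical and only serve the illustration; this is not the MAPIE function.

```python
import numpy as np

EPSILON = 1e-8  # assumed tolerance, not necessarily MAPIE's value


def replace_zero_last_proba(y_pred_proba, y_pred_index_last, y_pred_proba_last):
    """Hypothetical helper: update the last probability and its index together."""
    zeros = y_pred_proba_last <= EPSILON
    if np.any(zeros):
        # Hide (near-)zero probabilities so min/argmin only see non-zero ones.
        filled = np.ma.masked_less(y_pred_proba, EPSILON).filled(fill_value=np.inf)
        y_pred_proba_last[zeros] = np.expand_dims(
            np.min(filled, axis=1), axis=1
        )[zeros]
        y_pred_index_last[zeros] = np.expand_dims(
            np.argmin(filled, axis=1), axis=1
        )[zeros]
    return y_pred_index_last, y_pred_proba_last


# Toy probabilities, shape (n_samples=2, n_classes=3, n_alpha=1).
y_pred_proba = np.array([[0.7, 0.3, 0.0],
                         [0.5, 0.3, 0.2]])[:, :, np.newaxis]
# Assumed last included labels: class 2 (probability 0) and class 1.
index_last = np.array([[2], [1]])[:, :, np.newaxis]
proba_last = np.take_along_axis(y_pred_proba, index_last, axis=1)

index_last, proba_last = replace_zero_last_proba(y_pred_proba, index_last, proba_last)

# After the update, the stored index must point at the stored probability.
consistent = np.array_equal(
    proba_last, np.take_along_axis(y_pred_proba, index_last, axis=1)
)
print(consistent)
```

Computing the filled array once and reusing it for both np.min and np.argmin also avoids building the masked array twice, which the patch above does for readability.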

thibaultcordier commented 1 month ago

Thank you @qzheng60 for reporting this problem! We'll take a closer look. We'll let you know soon.