In _get_last_included_proba(), y_pred_index_last is not updated when np.sum(zeros_scores_proba_last) > 0 (line 857). It therefore still points at the zero-probability label rather than at the label whose probability replaced y_pred_proba_last. Consequently, y_pred_proba_last is not properly updated in _add_random_tie_breaking() (line 539), which yields incorrect prediction sets.
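To reproduce, compare two experiments: MapieClassifier(method='aps', include_last_label="randomized") vs. MapieClassifier(method='aps', include_last_label=True). The "randomized" setting produces prediction sets with more True entries (included labels) than the True setting. This is unexpected, because randomization should only ever remove the last label from a set.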
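One way to quantify the symptom is to count the labels that appear in the "randomized" prediction sets but not in the True ones. The snippet below is a minimal sketch using NumPy only. The helper name count_extra_labels and the toy arrays are hypothetical; in practice the boolean arrays would be the prediction sets returned for the two include_last_label settings.

```python
import numpy as np


def count_extra_labels(sets_randomized, sets_true):
    """Count labels included in the randomized sets but not in the True sets."""
    sets_randomized = np.asarray(sets_randomized, dtype=bool)
    sets_true = np.asarray(sets_true, dtype=bool)
    # A label is "extra" if randomized includes it while True does not.
    return int(np.sum(sets_randomized & ~sets_true))


# Hypothetical prediction sets for 2 samples and 3 classes (one alpha level).
sets_true = np.array([[True, True, False], [True, False, False]])
sets_randomized = np.array([[True, True, True], [True, False, False]])

extra = count_extra_labels(sets_randomized, sets_true)
print(extra)
```

If randomization only removes labels, this count should be zero; any positive value points to the inconsistency described above.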
Proposed solution
Starting from line 852:
    zeros_scores_proba_last = (y_pred_proba_last <= EPSILON)
    # If the last included proba is zero, change it to the
    # smallest non-zero value to avoid including them in the
    # prediction sets.
    if np.sum(zeros_scores_proba_last) > 0:
        y_pred_proba_last[zeros_scores_proba_last] = np.expand_dims(
            np.min(
                np.ma.masked_less(
                    y_pred_proba,
                    EPSILON
                ).filled(fill_value=np.inf),
                axis=1
            ), axis=1
        )[zeros_scores_proba_last]
        # Also update the index so that it refers to the new value.
        y_pred_index_last[zeros_scores_proba_last] = np.expand_dims(
            np.argmin(
                np.ma.masked_less(
                    y_pred_proba,
                    EPSILON
                ).filled(fill_value=np.inf),
                axis=1
            ), axis=1
        )[zeros_scores_proba_last]
    return y_pred_proba_cumsum, y_pred_index_last, y_pred_proba_last
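The effect of the proposed change can be checked on a toy array. The following is a minimal standalone sketch, not the library's implementation. The function name fix_zero_last_proba, the value of EPSILON, and the input arrays are assumptions made for illustration, and the array shapes (n_samples, n_classes, n_alpha) are assumed from the axis arguments used in the patch.

```python
import numpy as np

EPSILON = 1e-8  # assumed tolerance; the library defines its own constant


def fix_zero_last_proba(y_pred_proba, y_pred_index_last, y_pred_proba_last):
    """Replace zero 'last' probabilities and keep their indices in sync."""
    y_pred_index_last = y_pred_index_last.copy()
    y_pred_proba_last = y_pred_proba_last.copy()
    zeros = y_pred_proba_last <= EPSILON
    if zeros.any():
        # Mask out (near-)zero probabilities so min/argmin ignore them.
        masked = np.ma.masked_less(y_pred_proba, EPSILON).filled(np.inf)
        smallest = np.expand_dims(masked.min(axis=1), axis=1)
        smallest_idx = np.expand_dims(masked.argmin(axis=1), axis=1)
        y_pred_proba_last[zeros] = smallest[zeros]
        y_pred_index_last[zeros] = smallest_idx[zeros]
    return y_pred_index_last, y_pred_proba_last


# Two samples, three classes, one alpha level: shape (2, 3, 1).
y_pred_proba = np.array([[0.7, 0.3, 0.0], [0.5, 0.2, 0.3]])[:, :, np.newaxis]
# Sample 0's last included label is class 2, whose probability is zero.
index_last = np.array([[[2]], [[1]]])
proba_last = np.take_along_axis(y_pred_proba, index_last, axis=1)

new_index, new_proba = fix_zero_last_proba(y_pred_proba, index_last, proba_last)

# After the fix, each index points at the probability stored alongside it.
consistent = np.array_equal(
    np.take_along_axis(y_pred_proba, new_index, axis=1), new_proba
)
print(new_index.ravel(), new_proba.ravel(), consistent)
```

Without the index update, sample 0 would keep index 2 while its stored probability becomes 0.3, which is the mismatch the issue describes.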
Affected file: classification.py