ethanharvey98 opened this issue 2 years ago
Every feature we include has a maintenance cost. Our maintainers are mostly volunteers. For a new feature to be included, we need evidence that it is often useful and, ideally, well-established in the literature or in practice.
Can you provide a reference on this method for averaging ROC curves?
This paper, published in 2006 in Pattern Recognition Letters (with over 20,000 citations), describes the method implemented in the code above. I would love to contribute by writing this feature. Please let me know if rewriting the function in vanilla Python would be helpful.
Section 8 of the paper describes two methods of averaging: vertical averaging and threshold averaging. In both cases, Figure 9 visualizes the uncertainty: (c) is vertical averaging and (d) is threshold averaging.
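For concreteness, here is a minimal sketch of vertical averaging as I read Section 8; `vertical_average` and the standard-deviation band are illustrative choices of mine, not an existing scikit-learn API:

```python
import numpy as np

def vertical_average(fprs, tprs, n_points=101):
    """Average per-fold ROC curves at fixed FPR values.

    fprs, tprs are lists of the per-fold arrays returned by roc_curve.
    """
    fpr_grid = np.linspace(0.0, 1.0, n_points)
    # Interpolate each fold's TPR onto a common FPR grid, then average
    # vertically across folds (Fawcett, Section 8).
    interp_tprs = np.array(
        [np.interp(fpr_grid, f, t) for f, t in zip(fprs, tprs)]
    )
    return fpr_grid, interp_tprs.mean(axis=0), interp_tprs.std(axis=0)
```

Since roc_curve returns fpr in increasing order, np.interp applies directly with no extra sorting.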
Before we consider adding this to ROCCurveDisplay, we need to design an API for a function that returns the average and the uncertainty. For me, showing the uncertainty is important given that these curves were computed through cross-validation. Currently, the only function that returns some uncertainty information is permutation_importance. Most of the work for this feature is coming up with the API. Here are the options I see (sketched as hypothetical call sites after the list):
1. roc_curve accepts a cv splitter and an average parameter to switch between the two modes of averaging (if we want both averaging modes). Like permutation_importance, it would return all the curves, the means, and the uncertainty. The downside of this approach is that it inflates the API of roc_curve.
2. A new function, roc_curve_cv, with the same API as above but used only for averaging. The downside is that this adds another function.
3. Could roc_curve be designed to accept a list of y_trues and y_scores instead of a cv splitter (if an average parameter is passed)? This would reduce the API inflation while still providing the same functionality.
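To make the trade-offs concrete, here is roughly what each option's call site might look like; all three signatures are hypothetical and none of them exist in scikit-learn:

```python
from sklearn.model_selection import StratifiedKFold

cv = StratifiedKFold(n_splits=5)

# Option 1 (hypothetical): roc_curve grows cv and average parameters.
# curves, mean_curve, std_curve = roc_curve(y, y_score, cv=cv, average="vertical")

# Option 2 (hypothetical): a dedicated function with the same signature.
# curves, mean_curve, std_curve = roc_curve_cv(y, y_score, cv=cv, average="vertical")

# Option 3 (hypothetical): roc_curve accepts per-fold lists directly.
# curves, mean_curve, std_curve = roc_curve(y_trues, y_scores, average="vertical")
```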
> Or could roc_curve be designed to accept a list of y_trues and y_scores instead of a cv splitter?
I have three issues with this API:
First, if roc_curve accepts a list for y_scores, it would overlap with how we output a list of ndarrays for multilabel problems:

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_multilabel_classification(random_state=0)
tree = DecisionTreeClassifier()
tree.fit(X, y)

# Multilabel predict_proba already returns a list of ndarrays.
print(type(tree.predict_proba(X)))
# <class 'list'>
```
Although the format is a little different from the one used for averaging ROC curves, I think it would end up being confusing.
Second, a user would need to know how to use the splitter API to compute the scores and pass them in as lists of y_true and y_scores; the workflow this implies is sketched after these three points. (We can likely work around this by extending cross_validate to output predictions, but that is a different topic: https://github.com/scikit-learn/scikit-learn/issues/17075)
Third, what I meant by inflating the API is that the return type would depend on the input's type. Currently, roc_curve always returns a tuple of three arrays. I think it is poor API design for roc_curve to accept a list of ndarrays as input and change its output to an average ROC curve plus uncertainty.
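For reference, the splitter workflow mentioned in the second point looks roughly like this with the current API; the estimator and dataset are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(random_state=0)

# The user builds the per-fold lists themselves via the splitter API.
y_trues, y_scores = [], []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    clf = LogisticRegression().fit(X[train_idx], y[train_idx])
    y_trues.append(y[test_idx])
    y_scores.append(clf.predict_proba(X[test_idx])[:, 1])

# Each fold's curve is then computed separately.
curves = [roc_curve(yt, ys) for yt, ys in zip(y_trues, y_scores)]
```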
For me, I prefer a new function altogether (option 2 in https://github.com/scikit-learn/scikit-learn/issues/23983#issuecomment-1193109238). The scope of the new function would be limited to running cross-validation to compute an average ROC curve.
That sounds good. I created a branch on my GitHub for this feature (see branch). Would that be the best way to move forward?
Having an average ROC curve computed using cross-validation would be fairly new, API-wise, for scikit-learn. To move forward, we likely need input from other maintainers: @glemaitre @ogrisel would you be interested in having this feature in scikit-learn?
Plotting uncertainty or confidence intervals is not straightforward, and I think there is a benefit to doing so for our users.
I think that our displays should be able to be fed with the results of cross_validate. I recall experimenting with this in https://github.com/scikit-learn/scikit-learn/pull/21211; I would imagine something similar for the ROC display.
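For context, what exists today is the refit-per-fold pattern from the example gallery, along these lines; a display fed by cross_validate results would absorb this boilerplate:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import RocCurveDisplay
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(random_state=0)
cv = StratifiedKFold(n_splits=5)

fig, ax = plt.subplots()
for fold, (train_idx, test_idx) in enumerate(cv.split(X, y)):
    clf = LogisticRegression().fit(X[train_idx], y[train_idx])
    # One ROC curve per fold, all drawn on the same axes; no averaging yet.
    RocCurveDisplay.from_estimator(
        clf, X[test_idx], y[test_idx], name=f"fold {fold}", ax=ax
    )
plt.show()
```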
That sounds great. Would the function take results from cross_validate instead of y_true and y_score (like in #21211)?
```python
def roc_curve_cv(
    cv_results, X, y, *, average="vertical", pos_label=None,
    sample_weight=None, drop_intermediate=True,
):
    ...
```
Should the function have a sample rate? In the paper (cited in #23983 (comment)), the curves were sampled at false positive rates from 0 through 1 in steps of 0.1. Should the function use a similar sample rate, or should a sample rate only be introduced when the average ROC curve is plotted?
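To illustrate where a sample rate would enter, here is a rough sketch of threshold averaging; threshold_average, its n_samples parameter, and the index arithmetic are all illustrative, not from the paper or scikit-learn:

```python
import numpy as np

def threshold_average(fprs, tprs, thresholds, n_samples=11):
    """Average per-fold ROC points at sampled score thresholds.

    fprs, tprs, thresholds are lists of per-fold arrays from roc_curve;
    n_samples plays the role of the sample rate discussed above.
    """
    # Pool every fold's thresholds and take an evenly spaced sample.
    pooled = np.sort(np.concatenate(thresholds))[::-1]
    sampled = pooled[np.linspace(0, pooled.size - 1, n_samples).astype(int)]
    mean_points = []
    for t in sampled:
        points = []
        for fpr, tpr, thr in zip(fprs, tprs, thresholds):
            # thr is decreasing; pick the last ROC point whose threshold
            # is still >= t (the classifier operating at threshold t).
            idx = max(np.searchsorted(-thr, -t, side="right") - 1, 0)
            points.append((fpr[idx], tpr[idx]))
        mean_points.append(np.mean(points, axis=0))
    return sampled, np.asarray(mean_points)
```

With n_samples=11, the sampling is comparable in coarseness to the paper's 0.1 spacing, though over thresholds rather than FP rates.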
Describe the workflow you want to enable
When using k-fold cross-validation, the resulting ROC curves can vary in length if there are different numbers of positive and/or negative samples in each fold. I would like to add an option to sklearn.metrics.RocCurveDisplay to display the average of different-length ROC curves.
Section 8 of An introduction to ROC analysis by Fawcett describes the method.
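A quick illustration of the varying lengths (the dataset and estimator are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=500, random_state=0)
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    clf = LogisticRegression().fit(X[train_idx], y[train_idx])
    fpr, tpr, _ = roc_curve(y[test_idx], clf.predict_proba(X[test_idx])[:, 1])
    # The arrays differ in length across folds, so the curves cannot be
    # stacked and averaged elementwise.
    print(fpr.shape)
```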
Describe your proposed solution
Describe alternatives you've considered, if relevant
No response
Additional context
No response