Open topepo opened 3 years ago
I agree this would be cool. Do you have a reference on how this is implemented in the context of ROC curves?
The curve would be based on the weighted versions of sensitivity and specificity.
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip
data(pathology)
str(pathology)
#> 'data.frame': 344 obs. of 2 variables:
#> $ pathology: Factor w/ 2 levels "abnorm","norm": 1 1 1 1 1 1 1 1 1 1 ...
#> $ scan : Factor w/ 2 levels "abnorm","norm": 1 1 1 1 1 1 1 1 1 1 ...
set.seed(1)
pathology$weights <- runif(nrow(pathology))
event <- "abnorm"
unweighted <-
  sum(pathology$pathology == event & pathology$scan == event) /
  sum(pathology$pathology == event)
unweighted
#> [1] 0.8953488
# via yardstick:
sensitivity(pathology, pathology, scan)
#> # A tibble: 1 × 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 sens binary 0.895
weighted <-
  sum(pathology$weights * (pathology$pathology == event & pathology$scan == event)) /
  sum(pathology$weights * (pathology$pathology == event))
weighted
#> [1] 0.9013333
Created on 2021-09-13 by the reprex package (v2.0.0)
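The same weighted ratio extends to specificity, which the curve would also need. Below is a minimal base R sketch under stated assumptions: `weighted_sens()` and `weighted_spec()` are hypothetical helper names (not yardstick functions), and the toy data is made up for illustration.

```r
# Hypothetical helpers (not part of yardstick): case-weighted sensitivity
# and specificity for a two-class outcome, using the same ratio as above.
weighted_sens <- function(truth, estimate, weights, event) {
  sum(weights * (truth == event & estimate == event)) /
    sum(weights * (truth == event))
}

weighted_spec <- function(truth, estimate, weights, event) {
  sum(weights * (truth != event & estimate != event)) /
    sum(weights * (truth != event))
}

# Toy data, made up for illustration
truth    <- factor(c("abnorm", "abnorm", "abnorm", "norm", "norm", "norm"))
estimate <- factor(c("abnorm", "abnorm", "norm", "norm", "norm", "abnorm"))
w        <- c(2, 1, 1, 3, 1, 1)

sens_w <- weighted_sens(truth, estimate, w, event = "abnorm")
spec_w <- weighted_spec(truth, estimate, w, event = "abnorm")
sens_w
#> [1] 0.75
spec_w
#> [1] 0.8
```

With all weights equal to 1, both helpers reduce to the usual unweighted definitions.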
@davisvaughan has the start of changes that we will be making to yardstick here.
I think I see. The easiest approach would be to update roc.utils.perfs.all.fast directly so that it calculates TP/FP taking the weights into account:
tp <- cumsum((response.sorted == 1) * weights.sorted)
fp <- cumsum((response.sorted == 0) * weights.sorted)
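To make the idea concrete, here is a standalone base R sketch of a weighted ROC curve built from those cumulative sums. It is not pROC's actual internals: `weighted_roc()` and `weighted_auc()` are hypothetical names, the response is assumed to be coded 0/1, and higher predictor values are assumed to indicate the event.

```r
# Hypothetical helper, not pROC's implementation: a case-weighted ROC
# curve from weighted cumulative sums. Assumes `response` is coded 0/1
# and that higher `predictor` values indicate the event (1).
weighted_roc <- function(response, predictor,
                         weights = rep(1, length(response))) {
  stopifnot(length(response) == length(predictor),
            length(weights) == length(response),
            all(weights >= 0))
  ord <- order(predictor, decreasing = TRUE)
  response.sorted  <- response[ord]
  predictor.sorted <- predictor[ord]
  weights.sorted   <- weights[ord]

  # Parentheses matter here: `*` binds tighter than `==` in R
  tp <- cumsum((response.sorted == 1) * weights.sorted)
  fp <- cumsum((response.sorted == 0) * weights.sorted)

  # Keep only the last row of each run of tied predictor values
  keep <- !duplicated(predictor.sorted, fromLast = TRUE)
  data.frame(
    threshold   = c(Inf, predictor.sorted[keep]),
    sensitivity = c(0, tp[keep] / sum(weights.sorted[response.sorted == 1])),
    specificity = c(1, 1 - fp[keep] / sum(weights.sorted[response.sorted == 0]))
  )
}

# Trapezoidal area under the (1 - specificity, sensitivity) curve
weighted_auc <- function(curve) {
  fpr <- 1 - curve$specificity
  tpr <- curve$sensitivity
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

# Toy data, made up for illustration
response  <- c(1, 1, 0, 1, 0, 0)
predictor <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)
w         <- c(1, 2, 1, 1, 3, 1)

curve <- weighted_roc(response, predictor, w)
weighted_auc(curve)
#> [1] 0.95
```

As a sanity check, 0.95 matches the weighted share of correctly ordered (event, non-event) pairs, 19 out of a total pair weight of 20. With unit weights the same data gives 8/9.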
A few thoughts on the implementation:
- [...] roc objects and store the weights there, so that bootstrap functions re-use the weights appropriately.
- [...] (auc, ci, etc), which will have to be updated.

I'd love this feature too. WeightedROC package does it, but that package doesn't produce CIs.
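One possible way to get intervals for a weighted AUC, in line with the point about bootstrap functions re-using the weights, is a percentile bootstrap in which each resampled observation carries its own weight along. The sketch below is an assumption-laden illustration, not how pROC or WeightedROC work: `pairwise_weighted_auc()` and `boot_weighted_auc_ci()` are hypothetical names, the response is assumed to be coded 0/1, and the simulated data is made up.

```r
# Hypothetical helper: case-weighted AUC as the weighted share of
# (event, non-event) pairs ranked correctly, with ties counting one half.
# Assumes `response` is coded 0/1 and higher `predictor` means event.
pairwise_weighted_auc <- function(response, predictor, weights) {
  pos <- response == 1
  neg <- response == 0
  cmp <- outer(predictor[pos], predictor[neg], ">") +
    0.5 * outer(predictor[pos], predictor[neg], "==")
  pair_w <- outer(weights[pos], weights[neg])
  sum(cmp * pair_w) / sum(pair_w)
}

# Percentile bootstrap: rows are resampled together with their weights.
boot_weighted_auc_ci <- function(response, predictor, weights,
                                 n_boot = 500, conf_level = 0.95) {
  n <- length(response)
  stats <- replicate(n_boot, {
    idx <- sample.int(n, n, replace = TRUE)
    if (length(unique(response[idx])) < 2) {
      NA_real_  # resample lost one of the two classes; skip it
    } else {
      pairwise_weighted_auc(response[idx], predictor[idx], weights[idx])
    }
  })
  alpha <- 1 - conf_level
  quantile(stats, c(alpha / 2, 1 - alpha / 2), na.rm = TRUE, names = FALSE)
}

# Simulated data, made up for illustration
set.seed(1)
n         <- 60
response  <- rbinom(n, 1, 0.5)
predictor <- rnorm(n) + response
w         <- runif(n)

estimate <- pairwise_weighted_auc(response, predictor, w)
ci       <- boot_weighted_auc_ci(response, predictor, w)
```

Resampling the row index, rather than the predictor and weights separately, is what keeps each weight attached to its observation. Whether a stratified resample or a different interval type is more appropriate is left open here.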
It would be great to have the calculations for the curve take into account case weights (i.e. a non-negative, numeric vector of values the same length as the other data objects).