case weights - Githubissues

topepo commented 3 years ago

It would be great if the calculations for the curve took case weights into account (i.e., a non-negative numeric vector of values the same length as the other data objects).

xrobin commented 3 years ago

I agree this would be cool. Do you have a reference on how this is implemented in the context of ROC curves?

topepo commented 3 years ago

The curve would be based on the weighted versions of sensitivity and specificity.

library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip

data(pathology)
str(pathology)
#> 'data.frame':    344 obs. of  2 variables:
#>  $ pathology: Factor w/ 2 levels "abnorm","norm": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ scan     : Factor w/ 2 levels "abnorm","norm": 1 1 1 1 1 1 1 1 1 1 ...

set.seed(1)
pathology$weights <- runif(nrow(pathology))

event <- "abnorm"

unweighted <- 
  sum(pathology$pathology == event & pathology$scan == event) /
  sum(pathology$pathology == event)
unweighted
#> [1] 0.8953488

# via yardstick:
sensitivity(pathology, pathology, scan)
#> # A tibble: 1 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 sens    binary         0.895

weighted <- 
  sum( pathology$weights * (pathology$pathology == event & pathology$scan == event) ) /
  sum( pathology$weights * (pathology$pathology == event) )

weighted
#> [1] 0.9013333

Created on 2021-09-13 by the reprex package (v2.0.0)
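
The same weighting carries over to specificity, which the curve also needs. Here is a minimal base-R sketch of both quantities. The helper names `weighted_sens` and `weighted_spec` and the toy vectors `truth`, `pred`, and `w` are assumptions made for illustration; they are not the `pathology` data and not part of yardstick or pROC.

```r
# Toy data for illustration only (not the pathology data set).
truth <- factor(c("abnorm", "abnorm", "abnorm", "norm", "norm", "norm"),
                levels = c("abnorm", "norm"))
pred  <- factor(c("abnorm", "abnorm", "norm", "norm", "norm", "abnorm"),
                levels = c("abnorm", "norm"))
w     <- c(1, 2, 1, 1, 3, 1)   # non-negative case weights

# Weighted sensitivity: weighted true positives / weighted positives.
weighted_sens <- function(truth, pred, w, event) {
  sum(w * (truth == event & pred == event)) / sum(w * (truth == event))
}

# Weighted specificity: weighted true negatives / weighted negatives.
weighted_spec <- function(truth, pred, w, event) {
  sum(w * (truth != event & pred != event)) / sum(w * (truth != event))
}

sens_w <- weighted_sens(truth, pred, w, event = "abnorm")
spec_w <- weighted_spec(truth, pred, w, event = "abnorm")

# With unit weights both helpers reduce to the usual unweighted estimates.
sens_u <- weighted_sens(truth, pred, rep(1, length(w)), event = "abnorm")
spec_u <- weighted_spec(truth, pred, rep(1, length(w)), event = "abnorm")
```

Setting every weight to 1 recovers the ordinary definitions, which is a convenient sanity check for any weighted implementation.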

@davisvaughan has the start of changes that we will be making to yardstick here

xrobin commented 3 years ago

I think I see. The easiest approach would be to update roc.utils.perfs.all.fast directly so that it calculates TP/FP taking the weights into account:

  tp <- cumsum((response.sorted == 1) * weights.sorted)
  fp <- cumsum((response.sorted == 0) * weights.sorted)
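
To show how weighted cumulative sums of this kind turn into a curve, here is a minimal base-R sketch. It is not pROC's `roc.utils.perfs.all.fast`. The function names `weighted_roc_points` and `weighted_auc_trapz` and the toy data are hypothetical. The sketch also assumes that the response is coded 0/1, that higher predictor values indicate cases, and that there are no tied predictor values.

```r
# Hypothetical helper: weighted ROC points from a numeric predictor.
# Assumes response is coded 0 (control) / 1 (case), higher predictor values
# indicate cases, and there are no tied predictor values.
weighted_roc_points <- function(predictor, response,
                                weights = rep(1, length(predictor))) {
  stopifnot(length(predictor) == length(response),
            length(predictor) == length(weights),
            all(weights >= 0))
  ord <- order(predictor, decreasing = TRUE)
  response.sorted <- response[ord]
  weights.sorted  <- weights[ord]

  # Parentheses are required: `*` binds more tightly than `==` in R.
  tp <- cumsum((response.sorted == 1) * weights.sorted)
  fp <- cumsum((response.sorted == 0) * weights.sorted)

  # Weighted totals of cases and controls; these may be fractional.
  n.cases    <- sum(weights[response == 1])
  n.controls <- sum(weights[response == 0])

  data.frame(
    threshold   = c(Inf, predictor[ord]),
    sensitivity = c(0, tp / n.cases),
    specificity = c(1, 1 - fp / n.controls)
  )
}

# Trapezoidal area under the (weighted) curve.
weighted_auc_trapz <- function(points) {
  fpr <- 1 - points$specificity
  tpr <- points$sensitivity
  n <- length(tpr)
  sum(diff(fpr) * (tpr[-n] + tpr[-1]) / 2)
}

# Toy data for illustration only.
predictor <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)
response  <- c(1,   1,   0,   1,   0,   0)
weights   <- c(1,   0.5, 2,   1,   1,   1.5)

pts            <- weighted_roc_points(predictor, response, weights)
auc_w          <- weighted_auc_trapz(pts)
auc_unweighted <- weighted_auc_trapz(weighted_roc_points(predictor, response))
```

In this toy example the weighted case and control totals are 2.5 and 4.5, so any downstream code that expects integer counts would need to be checked.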

A few thoughts on the implementation:

The number of cases and controls might become fractional because of this change. I'm not sure what side effects this could have.
There's a C++ algorithm that will need to be updated too. It's a loop, so it should be quite straightforward. Alternatively, this could be a good time to get rid of the alternative algorithms and simplify the code.
It will be necessary to modify the roc objects and store the weights there, so that the bootstrap functions re-use the weights appropriately.
At this point I'm not sure how many changes will be required in those bootstrapping functions. They've needed major refactoring for a long time, but I never found the time to do it.
Issue #70 will get in the way. There's quite a lot of redundancy, as pROC has several functions that build ROC curves under the hood (i.e., auc, ci, etc.), which will have to be updated.
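
One way to picture the bootstrap point is a resampling loop in which each observation's weight travels with it. The following is a rough base-R sketch under stated assumptions, not pROC's bootstrap code. The names `weighted_auc_pairs` and `boot_weighted_auc_ci` are hypothetical, and the data are simulated. The choice of a stratified percentile bootstrap that resamples rows uniformly and carries the weights along is only one possible scheme.

```r
# Hypothetical helper: weighted AUC as the weighted proportion of
# case/control pairs ranked correctly (ties count one half).
# Assumes response is coded 0/1 and higher predictor values indicate cases.
weighted_auc_pairs <- function(predictor, response, weights) {
  cases    <- response == 1
  controls <- response == 0
  # Pairwise comparison matrix: rows are cases, columns are controls.
  cmp <- outer(predictor[cases], predictor[controls],
               function(a, b) (a > b) + 0.5 * (a == b))
  pair.w <- outer(weights[cases], weights[controls])
  sum(cmp * pair.w) / sum(pair.w)
}

# Stratified percentile bootstrap; weights travel with the resampled rows.
boot_weighted_auc_ci <- function(predictor, response, weights,
                                 n.boot = 500, conf.level = 0.95) {
  case.idx    <- which(response == 1)
  control.idx <- which(response == 0)
  aucs <- replicate(n.boot, {
    idx <- c(case.idx[sample.int(length(case.idx), replace = TRUE)],
             control.idx[sample.int(length(control.idx), replace = TRUE)])
    weighted_auc_pairs(predictor[idx], response[idx], weights[idx])
  })
  alpha <- 1 - conf.level
  quantile(aucs, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Simulated data for illustration only.
set.seed(1)
n <- 60
response  <- rep(c(0, 1), each = n / 2)
predictor <- rnorm(n, mean = response)
weights   <- runif(n)

auc_hat <- weighted_auc_pairs(predictor, response, weights)
ci      <- boot_weighted_auc_ci(predictor, response, weights, n.boot = 200)
```

Whether uniform resampling with carried weights is appropriate, as opposed to sampling with probability proportional to the weights, depends on what the weights represent. That choice would need to be settled before any such change is made.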

aminadibi commented 4 months ago

I'd love this feature too. The WeightedROC package does it, but that package doesn't produce CIs.

xrobin / pROC

case weights #96