xrobin / pROC

Display and analyze ROC curves in R and S+
https://cran.r-project.org/web/packages/pROC/
GNU General Public License v3.0

coords is too slow with many thresholds #52

Closed: xrobin closed this issue 5 years ago

xrobin commented 5 years ago
> library(pROC)
> response <- rbinom(1E5, 1, .5)
> predictor <- rnorm(1E5)
> r <- roc(response, predictor)
> system.time(coords(r, "a"))
   user  system elapsed 
 47.791   0.088  47.867 

I would expect it to complete more or less instantly.

xrobin commented 5 years ago

The function is mostly vectorized now, except for the part that matches the user-supplied thresholds to the ones stored in the ROC curve. One difficulty is that we can't rely on the threshold values themselves and must look at the predictors instead. This step still relies on a call to sapply and is basically as slow as before.
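To illustrate why the threshold values alone are not enough, here is a minimal base-R sketch. It is not pROC's code: the toy data, the `se_sp` helper and the "positive when predictor >= threshold" rule are all assumptions made for the example. It shows that two different threshold values falling between the same pair of predictor values give the same point on the curve, so a match has to be made against the predictor values rather than by comparing threshold numbers.

```r
# Hypothetical toy data, not taken from the issue.
predictor <- c(0.1, 0.4, 0.35, 0.8)
response  <- c(0, 0, 1, 1)

# Sensitivity and specificity at one threshold.
# Assumption: cases score higher, and an observation is called
# positive when predictor >= thr.
se_sp <- function(thr) {
  pred_pos <- predictor >= thr
  c(se = mean(pred_pos[response == 1]),
    sp = mean(!pred_pos[response == 0]))
}

# 0.5 and 0.7 both fall between the predictor values 0.4 and 0.8,
# so they describe the same point on the curve.
same_point <- identical(se_sp(0.5), se_sp(0.7))

# 0.38 falls between 0.35 and 0.4 and gives a different point.
other_point <- identical(se_sp(0.5), se_sp(0.38))
```

Here `same_point` is TRUE and `other_point` is FALSE: what matters is which predictor values the threshold separates, not its numeric value.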

xrobin commented 5 years ago

The thresholds are now determined with a for loop and a while loop, in roughly O(n) time. Ugly but very efficient.
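A for/while combination of this kind can be sketched as a single merge-style pass over two sorted vectors. The function below is an illustrative assumption, not the code that went into pROC: `match_thresholds`, `user_thr` and `sorted_pred` are hypothetical names, and both inputs are assumed to be sorted in increasing order. For each threshold it counts how many predictor values lie strictly below it, and the inner pointer never moves backwards, so the total work is proportional to the combined length of the two inputs.

```r
# Hypothetical helper: for each threshold, count the predictor values
# strictly below it. Assumes both inputs are sorted increasingly.
match_thresholds <- function(user_thr, sorted_pred) {
  n <- length(sorted_pred)
  out <- integer(length(user_thr))
  j <- 0L  # number of predictor values already passed
  for (i in seq_along(user_thr)) {
    # Advance j only forwards; it is never reset between thresholds.
    while (j < n && sorted_pred[j + 1L] < user_thr[i]) {
      j <- j + 1L
    }
    out[i] <- j
  }
  out
}

# Per-threshold baseline in the style of an sapply call: it rescans
# the whole predictor vector for every threshold.
match_thresholds_slow <- function(user_thr, sorted_pred) {
  sapply(user_thr, function(t) sum(sorted_pred < t))
}

set.seed(1)
pred <- sort(rnorm(1000))
thr  <- sort(runif(50, -3, 3))
agree <- identical(match_thresholds(thr, pred),
                   match_thresholds_slow(thr, pred))
```

Both functions return the same counts on this example; the difference is that the baseline does work proportional to the product of the two lengths, which is what becomes slow when every threshold of a large curve is requested.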

> system.time(coords(r, "a"))
   user  system elapsed 
  0.236   0.020   0.257 

Keeping the issue open to ponder whether it is worth updating the function for input="se" and input="sp" as well. Those still call match repeatedly, but it is unclear whether that will be a problem, since we never go through all of them systematically as we do with x="all".

xrobin commented 5 years ago

Skipping the step through numeric thresholds, and skipping the calculation of extra coordinates when only the existing se/sp/thr are requested, saves even more time.

> system.time(coords(r, "a"))
   user  system elapsed 
  0.003   0.000   0.003 

coords is now fairly competitive: it takes less than 5% of the time needed to create a very large ROC curve (0.544 s versus 13.543 s elapsed in the run below, about 4%).

> response <- rbinom(1E7, 1, .5)
> predictor <- rnorm(1E7)
> system.time(r <- roc(response, predictor))
Setting levels: control = 0, case = 1
Setting direction: controls < cases
   user  system elapsed 
 11.403   2.146  13.543 
> system.time(coords(r, "a"))
   user  system elapsed 
  0.328   0.216   0.544