xrobin closed this issue 5 years ago
The function is mostly vectorized now, except the part that matches the user-supplied thresholds to ours. One difficulty is that we can't rely on the threshold values themselves and must look at the predictors instead. This step still relies on a call to sapply and is basically as slow as before.
The thresholds are now determined with a for and a while loop, in roughly O(n) time. Ugly but very efficient.
> system.time(coords(r, "a"))
user system elapsed
0.236 0.020 0.257
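For illustration only, here is a minimal sketch of how such a for/while matching pass could look. It is not the code that went into coords: the helper name match_thresholds and its return convention (the number of distinct predictor values strictly below each threshold) are assumptions made for this example. The loop itself is linear in the combined length of both inputs, although the initial sort and order calls are not.

```r
# Hypothetical helper, not pROC's actual implementation.
# For each user-supplied threshold, count how many distinct predictor
# values lie strictly below it. Two thresholds with the same count fall
# between the same pair of predictor values and are therefore equivalent.
match_thresholds <- function(user_thr, predictor) {
  pred <- sort(unique(predictor))   # distinct predictor values, ascending
  ord <- order(user_thr)            # visit thresholds in ascending order
  idx <- integer(length(user_thr))
  j <- 0L                           # predictor values already known to be below
  for (i in ord) {
    # j only ever moves forward, so the while loop runs at most
    # length(pred) times in total across the whole for loop
    while (j < length(pred) && pred[j + 1L] < user_thr[i]) {
      j <- j + 1L
    }
    idx[i] <- j
  }
  idx
}

match_thresholds(c(2.5, 0, 10, 1.5), c(1, 2, 2, 3, 5))
```

In base R the same counts should also be obtainable without an explicit loop via findInterval(user_thr, sort(unique(predictor)), left.open = TRUE), which may be worth comparing against.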
Keeping the issue open to ponder whether it is worth updating the function for input="se" and input="sp". Those still call match repeatedly, but it is unclear whether that will be a problem, as we never go through all of them systematically as with x="all".
Skipping the step through numeric thresholds, and skipping the calculation of extra coordinates when only the existing se/sp/thr are requested, saves even more time.
> system.time(coords(r, "a"))
user system elapsed
0.003 0.000 0.003
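The shortcut described above can be sketched roughly as follows. This is an assumption-laden illustration rather than the actual change: coords_all_sketch is a hypothetical name, and the input is a plain list carrying the thresholds, sensitivities and specificities fields that a roc object stores. When only those stored values are requested, they are returned directly, with no threshold matching and no derived coordinates computed.

```r
# Hypothetical sketch, not pROC's actual code.
# roc_like: a list with the fields thresholds, sensitivities and
# specificities, as stored in a roc object.
coords_all_sketch <- function(roc_like,
                              ret = c("threshold", "specificity", "sensitivity")) {
  stored <- list(
    threshold   = roc_like$thresholds,
    specificity = roc_like$specificities,
    sensitivity = roc_like$sensitivities
  )
  if (all(ret %in% names(stored))) {
    # Fast path: everything requested already exists, so just subset it
    return(as.data.frame(stored[ret]))
  }
  # Anything else (ppv, npv, accuracy, ...) needs the slower general path,
  # which this sketch deliberately leaves out
  stop("derived coordinates are not covered by this sketch")
}

toy <- list(thresholds = c(-Inf, 0.5, Inf),
            sensitivities = c(1, 0.5, 0),
            specificities = c(0, 0.5, 1))
coords_all_sketch(toy, c("threshold", "sensitivity"))
```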
coords is now fairly competitive, and takes less than 5% of the time needed to create a very large ROC curve (about 4% in the run below).
> response <- rbinom(1E7, 1, .5)
> predictor <- rnorm(1E7)
> system.time(r <- roc(response, predictor))
Setting levels: control = 0, case = 1
Setting direction: controls < cases
user system elapsed
11.403 2.146 13.543
> system.time(coords(r, "a"))
user system elapsed
0.328 0.216 0.544
I would expect it to complete more or less instantly.