xrobin / pROC

Display and analyze ROC curves in R and S+
https://cran.r-project.org/web/packages/pROC/
GNU General Public License v3.0
118 stars 31 forks source link

implemented a more efficient calculation of the delong placements #16

Closed sieste closed 7 years ago

sieste commented 7 years ago

Hi there,

The current implementation of Delongs method in delong.cpp required to loop over all pairs of cases and controls, so computational complexity is O(N^2). I implemented a different algorithm to calculate the Delong placements based on the order statistics of cases and controls, which runs in O(NlogN). For small sample sizes of around 10 the new algorithm is slightly slower than the old one that loops over all pairs. But for large sample sizes (>100) there is a massive improvement in run time. The algorithm I implemented is similar to the one proposed in "Sun and Xu: Fast Implementation of DeLongs Algorithm for Comparing the Areas Under Correlated Receiver Operating Characteristic Curves. IEEE Signal Processing Letters, 21(11):1389-1393, 2014."

I ran some tests to verify that the output of old and new code is the same. Needless to say that more testing is always good.

Maybe this is helpful for future releases.

Cheers, Stefan

xrobin commented 7 years ago

Thanks for your contribution! I have added testthat tests for this function. In particular, I want to see what happens with roc curves that have direction = ">" and percent=TRUE. The current code is 3 years old and I don't remember how I handled those cases. I pushed the tests to master, and I will try them with your code as soon as I can.

sieste commented 7 years ago

Excellent. Let me know if I can be of help.

xrobin commented 7 years ago

I added more tests, and a static target data (the current results of the function, which I consider correct). A very quick check seems to indicate that your version passes the tests as well. I just want to mess up a bit and see them fail, just to make sure they are valid at all.

xrobin commented 7 years ago

A few more tests showed your code works as expected. I merged the request, did a bit of clean-up and added some doc. Please let me know if you see anything wrong. Thanks again for your contribution!

xrobin commented 7 years ago

I did some speed tests too. We had an edge case: with large datasets, low number of thresholds and algorithm = 3, bootstrap used to be faster. This is not the case any longer and I updated the doc accordingly.

library(pROC)
n <- 200000
a <- as.numeric(cut(rnorm(n), c(-Inf, -1, 0, 1, Inf)))
b <- round(runif(n))
r <- roc(b, a, algorithm = 3)

# With Bootstrap
> system.time(var(r, method = "b", progress = "none"))
utilisateur     système      écoulé 
     25.896       0.136      26.027 

# With old DeLong algorithm
> system.time(var(r, method = "d"))
utilisateur     système      écoulé 
     47.352       0.008      47.353 

# With new DeLong algorithm
> system.time(var(r, method = "d"))
utilisateur     système      écoulé 
      0.016       0.008       0.023 
sieste commented 7 years ago

I'm glad everything works. The speedup above is quite impressive. At very small sample sizes the old algorithm might be faster though.

Thanks for the mention in the package description.