Closed sieste closed 7 years ago
Thanks for your contribution!
I have added testthat tests for this function. In particular, I want to see what happens with roc curves that have direction = ">"
and percent=TRUE
. The current code is 3 years old and I don't remember how I handled those cases.
I pushed the tests to master, and I will try them with your code as soon as I can.
Excellent. Let me know if I can be of help.
I added more tests, and a static target data (the current results of the function, which I consider correct). A very quick check seems to indicate that your version passes the tests as well. I just want to mess up a bit and see them fail, just to make sure they are valid at all.
A few more tests showed your code works as expected. I merged the request, did a bit of clean-up and added some doc. Please let me know if you see anything wrong. Thanks again for your contribution!
I did some speed tests too. We had an edge case: with large datasets, low number of thresholds and algorithm = 3
, bootstrap used to be faster. This is not the case any longer and I updated the doc accordingly.
library(pROC)
n <- 200000
a <- as.numeric(cut(rnorm(n), c(-Inf, -1, 0, 1, Inf)))
b <- round(runif(n))
r <- roc(b, a, algorithm = 3)
# With Bootstrap
> system.time(var(r, method = "b", progress = "none"))
utilisateur système écoulé
25.896 0.136 26.027
# With old DeLong algorithm
> system.time(var(r, method = "d"))
utilisateur système écoulé
47.352 0.008 47.353
# With new DeLong algorithm
> system.time(var(r, method = "d"))
utilisateur système écoulé
0.016 0.008 0.023
I'm glad everything works. The speedup above is quite impressive. At very small sample sizes the old algorithm might be faster though.
Thanks for the mention in the package description.
Hi there,
The current implementation of Delongs method in delong.cpp required to loop over all pairs of cases and controls, so computational complexity is O(N^2). I implemented a different algorithm to calculate the Delong placements based on the order statistics of cases and controls, which runs in O(NlogN). For small sample sizes of around 10 the new algorithm is slightly slower than the old one that loops over all pairs. But for large sample sizes (>100) there is a massive improvement in run time. The algorithm I implemented is similar to the one proposed in "Sun and Xu: Fast Implementation of DeLongs Algorithm for Comparing the Areas Under Correlated Receiver Operating Characteristic Curves. IEEE Signal Processing Letters, 21(11):1389-1393, 2014."
I ran some tests to verify that the output of old and new code is the same. Needless to say that more testing is always good.
Maybe this is helpful for future releases.
Cheers, Stefan