xrobin / pROC

Display and analyze ROC curves in R and S+
https://cran.r-project.org/web/packages/pROC/
GNU General Public License v3.0

coordinates when smoothing #14

Closed by topepo 8 years ago

topepo commented 8 years ago

I think that there is an issue when using coords with a smoothed curve. The format of the results differs between smoothed and unsmoothed curves, and I suspect that the threshold is being returned in place of the specificity when smoothing is used.

For example:

library(pROC)

data(aSAH)

roc_orig <- roc(aSAH$outcome, aSAH$s100b)
roc_smooth <- roc(aSAH$outcome, aSAH$s100b, smooth = TRUE)

## plots are not extremely different
plot(roc(aSAH$outcome, aSAH$s100b, smooth = TRUE))
plot(roc(aSAH$outcome, aSAH$s100b), add = TRUE, col = "red")

coord_orig <- t(coords(roc_orig, seq(0, 1, 0.01)))
coord_smooth <- t(coords(roc_smooth, seq(0, 1, 0.01)))
coord_smooth2 <- t(coords(smooth(roc_orig), seq(0, 1, 0.01)))

The results are very different:

> head(coord_orig)
     threshold specificity sensitivity
0         0.00  0.00000000   1.0000000
0.01      0.01  0.00000000   1.0000000
0.02      0.02  0.00000000   1.0000000
0.03      0.03  0.00000000   1.0000000
0.04      0.04  0.00000000   0.9756098
0.05      0.05  0.06944444   0.9756098
> head(coord_smooth)
     specificity sensitivity
0           0.00   1.0000000
0.01        0.01   0.9970265
0.02        0.02   0.9942254
0.03        0.03   0.9914151
0.04        0.04   0.9885741
0.05        0.05   0.9856905

Thanks,

Max

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.5 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] pROC_1.8

loaded via a namespace (and not attached):
[1] plyr_1.8.4  tools_3.3.1 Rcpp_0.12.5

xrobin commented 8 years ago

The difference arises because a smoothed ROC curve has no thresholds. I realize that the documentation for smooth.roc is wrong.

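As a result, for a smooth.roc object coords uses input="specificity" and ret=c("specificity", "sensitivity") (instead of input="threshold" and ret=c("threshold", "specificity", "sensitivity")), which is probably the difference you are seeing. The values in seq(0, 1, 0.01) are therefore read as specificities for the smoothed curve and as thresholds for the unsmoothed one, so the first column of coord_smooth is the specificity you requested, not a threshold.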

To convince yourself that the result is correct, you can overlay the curve returned by coords on the curves you've just plotted:

lines(coord_smooth[,"specificity"], coord_smooth2[,"sensitivity"], col="white", lty=2, lwd = 2)
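
To compare the two curves on the same footing, coords can be asked for the same quantities from each object. The following is a minimal sketch, assuming that input = "specificity" and ret = "sensitivity" are accepted for both roc and smooth.roc objects; the names spec_grid, sens_orig and sens_smooth are introduced here only for illustration. Because the shape of the value returned by coords may differ between pROC versions, the results are flattened to plain numeric vectors.

```r
library(pROC)

data(aSAH)

roc_orig <- roc(aSAH$outcome, aSAH$s100b)
roc_smooth <- smooth(roc_orig)

# Hypothetical helper name: the grid of specificities to query on both curves
spec_grid <- seq(0, 1, 0.01)

# Ask both curves the same question: sensitivity at a given specificity.
# as.numeric(unlist(...)) flattens whatever container coords returns
# (vector, matrix or data frame, depending on the pROC version).
sens_orig <- as.numeric(unlist(
  coords(roc_orig, spec_grid, input = "specificity", ret = "sensitivity")
))
sens_smooth <- as.numeric(unlist(
  coords(roc_smooth, spec_grid, input = "specificity", ret = "sensitivity")
))

# Side-by-side view on a shared specificity axis
comparison <- data.frame(
  specificity = spec_grid,
  sens_orig = sens_orig,
  sens_smooth = sens_smooth
)
head(comparison)

# Largest absolute gap between the empirical and smoothed sensitivities
max(abs(comparison$sens_orig - comparison$sens_smooth))
```

With both calls using specificity as input, the rows line up one to one, so any remaining gap reflects the smoothing itself rather than a difference in what coords was asked to return.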