Closed marvinquiet closed 4 years ago
Hi, thanks for the report!
I don't see how ties.idx could possibly be empty. Could you attach the data that triggers this problem? It would help me find out what's going on exactly and hence fix it.
This piece of code is pretty tricky. It's there to handle cases where the mean of two numbers can't be represented due to the limited precision of floating-point numbers. But this precision is really high, so it's extremely unlikely to happen in the first place. In any case it's quite important to select the right threshold to be numerically accurate, so I'll need a test case to be able to do it right.
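To illustrate the precision limit mentioned above, here is a minimal sketch in base R. The names a, b and mid are made up for this example and are not taken from pROC. When two doubles are adjacent, no representable number lies strictly between them, so the computed mean collapses onto one of the two:

```r
# Two adjacent doubles: b is the next representable number after a.
a <- 1
b <- 1 + .Machine$double.eps

# The exact mean lies strictly between a and b, but no double exists there,
# so the computed result is rounded onto one of the endpoints.
mid <- (a + b) / 2

mid_is_endpoint <- (mid == a) || (mid == b)
strictly_between <- (a < mid) && (mid < b)
```

A threshold placed at such a midpoint would coincide with one of the observed values, which appears to be the situation the tie handling is meant to cover.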
As a quick workaround, you could try adding a very small jitter to your data:
predictors <- jitter(predictors, factor=1e-12)
This will break the near-ties. Make sure to set the factor low enough that it does not affect your data in a meaningful way. Here I used 1e-12; I doubt you have numbers down to that precision.
Hi, Thank you for your prompt reply!
Attached please find the data. There are two variables in the data, named values and labels: I used labels as the response and values as the predictors. pROC_test.RData.gz
At first, I thought it was caused by tie.idx = 1: then tie.idx - 1 = 0, but R indices start at 1.
> if (thresholds[1] == unique.candidates[0]) {print("test")}
Error in if (thresholds[1] == unique.candidates[0]) { :
argument is of length zero
> length(thresholds)
[1] 18819
> length(unique.candidates)
[1] 18818
It seems that adding jitter does not work for this tie problem. Please let me know if there is anything else I can help with.
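The error in the console output above can be reproduced with base R alone. This is a minimal sketch using made-up vectors x and y rather than the real thresholds and unique.candidates. Indexing with 0 returns a zero-length vector, the comparison is then zero-length as well, and `if` refuses a zero-length condition. The guarded condition at the end only illustrates one possible way to avoid the error; it is not pROC's actual fix.

```r
x <- c(0.1, 0.2, 0.3)
y <- c(0.1, 0.2)

# Index 0 selects nothing, so the comparison has length zero.
zero_len <- x[1] == y[0]
length(zero_len)

# `if` needs a condition of length one; a zero-length one raises an error.
msg <- tryCatch(
  if (x[1] == y[0]) "tie" else "no tie",
  error = function(e) conditionMessage(e)
)

# Hypothetical guard: check the index first so that y[0] is never compared.
idx <- 1
guarded <- if (idx > 1 && x[idx] == y[idx - 1]) "tie" else "no tie"
```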
I can see that you have a -Inf value in values. Indeed, jitter is not going to help.
range(values)
[1] -Inf 26.96443
Infinite values are generally disallowed in ROC curves. The reason is that a ROC curve must test all thresholds from -Inf to +Inf. It is therefore difficult to compare your -Inf value with the -Inf threshold.
Although it is possible to compute -Inf <= -Inf in R, most packages, when supplied with an infinite value, may generate an "invalid" ROC curve that does not hit the points (0, 0) or (1, 1), or worse, an inaccurate ROC curve. This will wreak havoc on the AUC calculations in particular and is generally undesirable. To avoid that, pROC rejects inputs containing infinite values.
At this point I don't know why pROC didn't display an error message for your data. I will investigate that.
Regarding your analysis, you should probably remove the infinite value from your data like you would remove a missing value.
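Dropping the infinite value the same way one would drop a missing value could look like the sketch below. It uses small made-up vectors in place of the values and labels from the attached file, and the variable name keep is an assumption for this example. Only base R is needed.

```r
# Toy stand-ins for the real data (made up for illustration).
values <- c(-Inf, 0.5, 1.2, NA, 3.4)
labels <- c(0, 0, 1, 1, 1)

# The comparison itself is valid in R and evaluates to TRUE.
-Inf <= -Inf

# is.finite() is FALSE for -Inf, Inf, NA and NaN, so one filter covers all.
keep <- is.finite(values)
values <- values[keep]
labels <- labels[keep]

range(values)
```

The filtered vectors can then be passed to roc() as before.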
Yes, I guess that's why it generates two ties there. I can definitely remove the -Inf and try it again! Thanks so much for your quick reply and support!
Describe the bug
Line 121: if (thresholds[tie.idx] == unique.candidates[tie.idx - 1]) {
When tie.idx = 1, this statement will throw an error "argument is of length zero".
To Reproduce
Output of sessionInfo():
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
Random number generation: RNG: Mersenne-Twister Normal: Inversion Sample: Rounding
locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages: [1] stats4 parallel tools stats graphics grDevices utils datasets methods base
other attached packages: [1] data.table_1.12.8 fgsea_1.12.0 Rcpp_1.0.4.6 EnhancedVolcano_1.4.0 RColorBrewer_1.1-2
[6] monocle3_0.2.1 SingleCellExperiment_1.8.0 SummarizedExperiment_1.16.1 DelayedArray_0.12.3 BiocParallel_1.20.1
[11] matrixStats_0.56.0 GenomicRanges_1.38.0 GenomeInfoDb_1.22.1 IRanges_2.20.2 S4Vectors_0.24.4
[16] Biobase_2.46.0 BiocGenerics_0.32.0 pROC_1.16.2 forcats_0.5.0 stringr_1.4.0
[21] purrr_0.3.4 readr_1.3.1 tidyr_1.1.0 tibble_3.0.1 tidyverse_1.3.0
[26] dichromat_2.0-0 ggrepel_0.8.2 reshape2_1.4.4 gplots_3.0.3 ggplot2_3.3.0
[31] dplyr_0.8.5
loaded via a namespace (and not attached): [1] colorspace_1.4-1 deldir_0.1-25 ellipsis_0.3.1 class_7.3-17 XVector_0.26.0 fs_1.4.1
[7] rstudioapi_0.11 proxy_0.4-24 farver_2.0.3 RSpectra_0.16-0 fansi_0.4.1 lubridate_1.7.8
[13] xml2_1.3.2 splines_3.6.3 codetools_0.2-16 jsonlite_1.6.1 broom_0.5.6 dbplyr_1.4.3
[19] pheatmap_1.0.12 uwot_0.1.8 BiocManager_1.30.10 compiler_3.6.3 httr_1.4.1 backports_1.1.7
[25] assertthat_0.2.1 Matrix_1.2-18 cli_2.0.2 igraph_1.2.5 coda_0.19-3 gtable_0.3.0
[31] glue_1.4.1 GenomeInfoDbData_1.2.2 RANN_2.6.1 gmodels_2.18.1 fastmatch_1.1-0 slam_0.1-47
[37] cellranger_1.1.0 raster_3.1-5 vctrs_0.3.0 spdep_1.1-3 gdata_2.18.0 nlme_3.1-148
[43] DelayedMatrixStats_1.8.0 rvest_0.3.5 lifecycle_0.2.0 irlba_2.3.3 gtools_3.8.2 LearnBayes_2.15.1
[49] MASS_7.3-51.6 zlibbioc_1.32.0 scales_1.1.1 hms_0.5.3 expm_0.999-4 leidenbase_0.1.0
[55] gridExtra_2.3 stringi_1.4.6 e1071_1.7-3 caTools_1.18.0 boot_1.3-25 spData_0.3.5
[61] rlang_0.4.6 pkgconfig_2.0.3 bitops_1.0-6 lattice_0.20-41 sf_0.9-3 labeling_0.3
[67] tidyselect_1.1.0 RcppAnnoy_0.0.16 plyr_1.8.6 magrittr_1.5 R6_2.4.1 generics_0.0.2
[73] DBI_1.1.0 pillar_1.4.4 haven_2.3.0 withr_2.2.0 units_0.6-6 RCurl_1.98-1.2
[79] sp_1.4-2 modelr_0.1.8 crayon_1.3.4 KernSmooth_2.23-17 viridis_0.5.1 grid_3.6.3
[85] readxl_1.3.1 reprex_0.3.0 digest_0.6.25 classInt_0.4-3 pbmcapply_1.5.0 munsell_0.5.0
[91] viridisLite_0.3.0
pROC_obj <- roc(labels, predictors, direction=c("<"))
coords(pROC_obj, ret = c("tpr", "fpr"), transpose=FALSE) # this causes the error