xrobin / pROC

Display and analyze ROC curves in R and S+
https://cran.r-project.org/web/packages/pROC/
GNU General Public License v3.0

What does "direction" mean in roc function #125

Closed (Ivy-ops closed this issue 2 months ago)

Ivy-ops commented 7 months ago

Hi developer,

I am trying to use the roc() function with my dataset. After reading the description of the "direction" argument, I still cannot understand what it means. I would highly appreciate your help with this. I use a random forest and get the predicted probability for each sample (shown below); the second column is for the "Case" group.

My dataset, rf$predictions:

             Control      Case
     [1,] 0.24642643 0.7535736
     [2,] 0.33507026 0.6649297
     [3,] 0.45731121 0.5426888
     [4,] 0.46547831 0.5345217
     [5,] 0.53042247 0.4695775
     [6,] 0.31020475 0.6897952
     [7,] 0.15786178 0.8421382
     [8,] 0.15340136 0.8465986
     [9,] 0.15774135 0.8422587
    [10,] 0.18421489 0.8157851
    [11,] 0.64663338 0.3533666
    [12,] 0.40697185 0.5930282
    [13,] 0.37198661 0.6280134
    [14,] 0.57076432 0.4292357
    [15,] 0.18086131 0.8191387
    [16,] 0.58201416 0.4179858
    [17,] 0.19227444 0.8077256
    [18,] 0.46165459 0.5383454
    [19,] 0.19301864 0.8069814
    [20,] 0.66767106 0.3323289
    [21,] 0.80801017 0.1919898
    [22,] 0.66952125 0.3304788
    [23,] 0.62995097 0.3700490
    [24,] 0.50042121 0.4995788
    [25,] 0.77477208 0.2252279
    [26,] 0.60949394 0.3905061
    [27,] 0.82625698 0.1737430
    [28,] 0.65935287 0.3406471
    [29,] 0.07350427 0.9264957
    [30,] 0.72550278 0.2744972
    [31,] 0.72104726 0.2789527
    [32,] 0.65799964 0.3420004
    [33,] 0.70231445 0.2976856
    [34,] 0.32174162 0.6782584
    [35,] 0.86845567 0.1315443
    [36,] 0.50935250 0.4906475
    [37,] 0.44772867 0.5522713
    [38,] 0.78675787 0.2132421

actual

     [1] Case    Case    Case    Case    Case    Case    Case    Case    Case    Case
    [11] Case    Case    Case    Case    Case    Case    Case    Case    Case    Control
    [21] Control Control Control Control Control Control Control Control Control Control
    [31] Control Control Control Control Control Control Control Control
    Levels: Case Control

table(actual, predict)

             predict
    actual    Case Control
      Case      15       4
      Control    3      16

Then I use roc function:

    pROC::roc(actual, rf$predictions[,2], levels = c('Case','Control'), plot=T, direction = '>')

    Call:
    roc.default(response = actual, predictor = rf$predictions[, 2], levels = c("Case", "Control"), direction = ">", plot = T)

    Data: rf$predictions[, 2] in 19 controls (actual Case) > 19 cases (actual Control).
    Area under the curve: 0.8726

    pROC::roc(actual, rf$predictions[,2], levels = c('Case','Control'), plot=T, direction = '<')

    Call:
    roc.default(response = actual, predictor = rf$predictions[, 2], levels = c("Case", "Control"), direction = "<", plot = T)

    Data: rf$predictions[, 2] in 19 controls (actual Case) < 19 cases (actual Control).
    Area under the curve: 0.1274

As the code above shows, I get two different AUCs. I referred to the roc() tutorial and to https://stackoverflow.com/questions/31756682/what-does-coercing-the-direction-argument-input-in-roc-function-package-proc, which mention that direction refers to whether the probability is < or > the threshold.

Does direction mean the following: for the 1st sample, if I use threshold = 0.5 and direction ">", then 0.7535736 > 0.5, so sample 1 will be predicted as "Case"? And if I use threshold = 0.5 and direction "<", what does direction mean? I am very confused. When should I use ">" and when should I use "<"? Looking forward to your help! Much appreciated!

xrobin commented 7 months ago

Thanks for your report.

I'm not sure what's unclear exactly. What do you suggest should be clarified precisely, and can you maybe make some suggestions of better ways to explain that?

Ivy-ops commented 7 months ago

Hi @xrobin, thanks for the reply. Based on the tutorial:

">": if the predictor values for the control group are higher than the values of the case group (controls > t >= cases)
"<": if the predictor values for the control group are lower or equal than the values of the case group (controls < t <= cases).

In my case, does direction mean the following: for the 1st sample [prediction probability for Control = 0.24642643; for Case = 0.7535736], if I use threshold = 0.5 and direction ">", then 0.7535736 > 0.5, so sample 1 will be predicted as "Case"? And if I use threshold = 0.5 and direction "<", what does direction mean? Thank you for your patience!

xrobin commented 6 months ago

I attempted to clarify the documentation. Here is the new description of direction:

How are positive observations defined?
"<": observations are positive when they are greater than or equal (>=) to the threshold.
">": observations are positive when they are smaller than or equal (<=) to the threshold.
"auto" (default): automatically detect in which group the median is higher and take the direction accordingly. See details.
You should set this explicitly to ">" or "<" whenever you are resampling or randomizing the data, otherwise the curves will be biased towards higher AUC values.
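
To make the rule concrete, here is a minimal base-R sketch that applies it at a single threshold. The helper `classify_at_threshold` is hypothetical and not part of pROC; it only mirrors the wording of the description above. The three values are the Case probabilities of samples 1, 5 and 21 from the original post, and the threshold of 0.5 is the one used in the question.

```r
# Hypothetical helper (NOT a pROC function): applies the rule from the
# description of `direction` at one fixed threshold.
classify_at_threshold <- function(predictor, threshold, direction = c("<", ">")) {
  direction <- match.arg(direction)
  if (direction == "<") {
    predictor >= threshold  # positive when greater than or equal to the threshold
  } else {
    predictor <= threshold  # positive when smaller than or equal to the threshold
  }
}

# Case probabilities of samples 1, 5 and 21 from the original post.
p_case <- c(0.7535736, 0.4695775, 0.1919898)

pos_lt <- classify_at_threshold(p_case, 0.5, "<")
pos_gt <- classify_at_threshold(p_case, 0.5, ">")

print(pos_lt)  # TRUE FALSE FALSE
print(pos_gt)  # FALSE TRUE TRUE
```

"Positive" here means "called a case", that is, the second entry of `levels`. In the calls from the original post the second entry is 'Control', so with direction ">" the samples flagged TRUE by `pos_gt` would be the ones called "Control".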

Is it clearer like this?
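
For the numbers in the original post, the two AUCs (0.8726 and 0.1274) add up to 1: switching the direction swaps which group is expected to have the higher predictor values. The sketch below reproduces both values in base R with the pairwise (Mann-Whitney) form of the AUC. The function `pairwise_auc` is a hypothetical helper for illustration, not how pROC computes the curve internally; the data are the Case-column probabilities from the post, split by the true label.

```r
# Case-class probabilities from the original post, split by true label.
p_true_case <- c(0.7535736, 0.6649297, 0.5426888, 0.5345217, 0.4695775,
                 0.6897952, 0.8421382, 0.8465986, 0.8422587, 0.8157851,
                 0.3533666, 0.5930282, 0.6280134, 0.4292357, 0.8191387,
                 0.4179858, 0.8077256, 0.5383454, 0.8069814)  # samples 1-19
p_true_control <- c(0.3323289, 0.1919898, 0.3304788, 0.3700490, 0.4995788,
                    0.2252279, 0.3905061, 0.1737430, 0.3406471, 0.9264957,
                    0.2744972, 0.2789527, 0.3420004, 0.2976856, 0.6782584,
                    0.1315443, 0.4906475, 0.5522713, 0.2132421)  # samples 20-38

# Hypothetical helper: fraction of (higher, lower) pairs in which the value
# from `higher` exceeds the value from `lower`; ties count one half.
pairwise_auc <- function(higher, lower) {
  diffs <- outer(higher, lower, "-")
  mean((diffs > 0) + 0.5 * (diffs == 0))
}

# "True Case samples score higher" versus "true Case samples score lower".
auc_case_higher <- pairwise_auc(p_true_case, p_true_control)
auc_case_lower  <- pairwise_auc(p_true_control, p_true_case)

print(round(auc_case_higher, 4))  # 0.8726
print(round(auc_case_lower, 4))   # 0.1274
```

Because the predictor is the probability of "Case", true Case samples tend to have the higher values, so the direction that matches this ordering gives the AUC above 0.5. In the calls from the post, `levels = c('Case','Control')` makes "Case" the control group, which is why direction ">" (controls higher than cases) is the one that yields 0.8726.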