tidymodels / yardstick

Tidy methods for measuring model performance
https://yardstick.tidymodels.org/
Other
370 stars 54 forks source link

Additional issue with Survival ROC #502

Closed asb2111 closed 7 months ago

asb2111 commented 7 months ago

Hello, thank you for taking the time to make the update based on my last issue regarding the survival ROC. I think I found one additional bug in the code here.

Currently, it is arranging the predicted probabilities from smallest to largest and then using a cumulative sum to get the components of the ROC curve. I believe it needs to actually be in the opposite order, going from largest to smallest. The reason is that the 'strictest' threshold, that which will have the lowest sensitivity, will only include those with the highest predicted probability, and then as the threshold gets less strict those with lower predicted probabilities will be counted and increase the sensitivity. So we need to start the cumulative sum with those having the highest predicted probability and add in those with lower predicted probabilities to reflect the increase in sensitivity as the threshold becomes less strict. The fix should be changing the linked to line of code to just be:

  data_split <- data_split$val[order(data_split$key, decreasing = TRUE)]

I made an edit to my related feature request #496 for the survival version of the PR curve to fix the same bug.

asb2111 commented 7 months ago

I believe this would also make it consistent with the non-survival version of the ROC.

asb2111 commented 7 months ago

Sorry, I realize my mistake due to the models predicting survival and not events

github-actions[bot] commented 6 months ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.