jeffreyevans / rfUtilities

R package for random forests model selection, inference, evaluation and validation
GNU General Public License v3.0

AUC calculation #2

Closed mzeybek583 closed 5 years ago

mzeybek583 commented 6 years ago

Hi,

I have a doubt about the AUC calculation in the accuracy function.

Here is the relevant line from the function:

auc <- (tpr - ( t.xy[3] / ( t.xy[2] + t.xy[3] ) ) + 1) / 2

But I think it should be (tpr - fpr + 1) / 2, that is:

auc <- (tpr - ( t.xy[3] / ( t.xy[3] + t.xy[4] ) ) + 1 ) / 2

What do you think?

Thanks

@jeffreyevans
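For reference, the (tpr - fpr + 1) / 2 form follows from treating a single confusion matrix as one point on the ROC curve and joining (0, 0), (fpr, tpr) and (1, 1) with straight lines. This is a sketch of the usual trapezoid argument, not something taken from the package documentation: the area is a triangle under the first segment plus a trapezoid under the second.

```latex
\begin{align*}
\mathrm{AUC} &= \tfrac{1}{2}\,\mathrm{fpr}\cdot\mathrm{tpr}
              + \tfrac{1}{2}\,(1-\mathrm{fpr})(\mathrm{tpr}+1) \\
             &= \tfrac{1}{2}\left(\mathrm{fpr}\,\mathrm{tpr} + \mathrm{tpr} + 1
              - \mathrm{fpr}\,\mathrm{tpr} - \mathrm{fpr}\right) \\
             &= \frac{\mathrm{tpr} - \mathrm{fpr} + 1}{2}
\end{align*}
```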

jeffreyevans commented 6 years ago

Have you actually tested this or compared it to an expected value, or are you just going off of where you “expect” the position in the matrix to be? The function uses the base R table function, which returns a traditional type I-II error matrix and not a contingency table per se. Because of this, tp and fp sit in different positions in the matrix, hence the difference in notation compared to citations such as Powers (2011). Before documenting this as a bug, please check the results; if there is a discrepancy in the statistic, I will consider it a bug and readily address it. In doing so, please provide a reproducible example so that I can verify the bug and get at the nature of the problem.
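To make the indexing question concrete, here is a small sketch (not taken from the package source) of how base R's table() lays out a 2x2 cross-tabulation. The vectors `pred` and `obs` are made-up illustration data. The point is only that single-bracket indices run down columns, so which cells indices 2 and 3 refer to depends on which vector is passed first.

```r
# Hypothetical illustration data, not from rfUtilities
pred <- c(1, 1, 1, 2, 2, 2, 2, 1)
obs  <- c(1, 1, 2, 2, 2, 1, 2, 2)

t.xy <- table(pred, obs)   # rows = first argument, columns = second

# Single-bracket indexing walks the table in column-major order:
# [1] = (row 1, col 1), [2] = (row 2, col 1),
# [3] = (row 1, col 2), [4] = (row 2, col 2)
idx     <- c(t.xy[1], t.xy[2], t.xy[3], t.xy[4])
by.name <- c(t.xy["1", "1"], t.xy["2", "1"], t.xy["1", "2"], t.xy["2", "2"])
stopifnot(all(idx == by.name))

# Swapping the arguments transposes the table, so indices 2 and 3 trade places
t.yx <- table(obs, pred)
stopifnot(t.yx[2] == t.xy[3], t.yx[3] == t.xy[2])
```

Indexing by dimnames, as in `t.xy["1", "2"]`, avoids depending on the linear position at all.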

Best, Jeffrey S. Evans, PhD

mzeybek583 commented 5 years ago

I realized that the confusion matrix you used is different from the one referenced here (link), but it is still not clear to me.

Here is my code to check it. x is the ground truth; y is the classification result of the tested method.

library(rfUtilities)

x <- c(1,1,1,2,2,2,2,1,2,1,2,1,2) #ground truth (real)
y <- c(2,1,1,1,2,2,1,1,2,1,1,2,1) #observed from classification method result

ac2 <- accuracy(x,y)
ac2$confusion
#   y
#x   1 2
#  1 4 2
#  2 4 3

## AUC
# Single-index positions are column-major: [1] and [2] are the y = 1 column
tpr <- ac2$confusion[1]/(ac2$confusion[1]+ac2$confusion[2])
tpr
fpr <- ac2$confusion[3]/(ac2$confusion[3]+ac2$confusion[4])
fpr
AUC <- (tpr-fpr+1)/2
AUC

diff <- ac2$auc-AUC
diff

Cheers

mzeybek583 commented 5 years ago

Here is the link (https://dspace2.flinders.edu.au/xmlui/bitstream/handle/2328/27165/Powers%20Evaluation.pdf?sequence=1&isAllowed=y) you pointed to in your reference. In equation (8), fall-out is B/(B+D), and I saw that you did not use t.xy[4] in your function. I also see that you say x is predicted and y is observed. What do you mean by that? Which one is real, and which is predicted? As I understand from the paper, the xy table is built with the R table function, so the left side of your xy table holds the true (real) values and the top holds the observed values. Am I right? Could you please check it? I am really confused.

jeffreyevans commented 5 years ago

I did get the indexing wrong when deriving AUC/ROC. After systematically checking all of the other validation statistics, they look correct. I have fixed the auc statistic but rejected your specific pull request. I did this because I changed the code so that it now creates specific objects for the input values (TP, FN, FP, TN), and I changed the output matrix to a common confusion-matrix format to avoid future confusion. I am also now producing the Gini Index, based on [2 * auc - 1], which normalizes the AUC so that a random classifier scores 0 and a perfect classifier scores 1.
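A minimal sketch of the approach described above, assuming the first argument holds predictions, the second holds observations, and class "1" is the positive class. The helper name `auc_gini` and its arguments are hypothetical and are not the rfUtilities API; the actual package code may differ.

```r
# Hypothetical helper, not part of rfUtilities
auc_gini <- function(pred, obs, positive = "1") {
  pred <- as.character(pred)
  obs  <- as.character(obs)
  # Name each cell explicitly rather than relying on positions in table()
  TP <- sum(pred == positive & obs == positive)
  FN <- sum(pred != positive & obs == positive)
  FP <- sum(pred == positive & obs != positive)
  TN <- sum(pred != positive & obs != positive)

  tpr <- TP / (TP + FN)        # sensitivity
  fpr <- FP / (FP + TN)        # fall-out
  auc <- (tpr - fpr + 1) / 2   # single-threshold AUC
  list(TP = TP, FN = FN, FP = FP, TN = TN,
       auc = auc, gini = 2 * auc - 1)
}

# Data from the example earlier in this thread, passed here with y as the
# prediction and x as the ground truth
x <- c(1,1,1,2,2,2,2,1,2,1,2,1,2)
y <- c(2,1,1,1,2,2,1,1,2,1,1,2,1)
res <- auc_gini(pred = y, obs = x)
```

Because each cell is counted by name, swapping which vector is the prediction changes the result in an explicit way instead of silently shifting matrix positions.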

You can install the development version of rfUtilities from GitHub to implement these changes in the code using: devtools::install_github("jeffreyevans/rfUtilities")

Best, Jeff

Jeffrey S. Evans, Ph.D., | Senior Landscape Ecologist

The Nature Conservancy | Global Lands Science Team

Visiting Professor | University of Wyoming | Zoology & Physiology

Laramie, WY | jeffrey_evans@tnc.org | (970) 672-6766

mzeybek583 commented 5 years ago

Thank you.

cheers.