Henlam opened this issue 5 years ago
You are correct; ROC @ n samples is also a popular choice. I will put this on my todo list. I do not know of any particular reason why people usually report the full ROC, but reporting ROC @ n is not uncommon :)
One thought is that ROC @ n is a point evaluation, whereas the full ROC considers the whole ranking; see the sketch below.
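To make the distinction concrete, here is a minimal sketch with made-up scores (not from any PyOD model): ROC AUC looks at the whole ranking, while precision @ n only checks the top n positions.
import numpy as np
from sklearn.metrics import roc_auc_score
# toy data: 2 true outliers (label 1) among 8 points, with made-up outlier scores;
# one inlier happens to get the single highest score
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1])
scores = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.9, 0.7, 0.8])
# ROC AUC is computed from the full ranking of the scores
print("ROC AUC:", roc_auc_score(y_true, scores))  # ~0.83
# precision @ n only checks whether the n top-scored points are true outliers
n = int(y_true.sum())                 # n = number of true outliers = 2
top_n = np.argsort(scores)[::-1][:n]  # indices of the n highest scores
print("precision @ n:", y_true[top_n].mean())  # 0.5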
Thank you very much for your answer.
If you say the normal ROC considers the full picture, does a high ROC in this case just mean that, on average, the ground-truth outliers are ranked ahead of most inlier points, but not necessarily in the top n? I.e., would ROC = 0 if the true outliers were at the bottom of the outlier-score ranking?
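As a quick sanity check of my own (toy numbers only, not from my experiments):
import numpy as np
from sklearn.metrics import roc_auc_score
# the two true outliers (label 1) receive the *lowest* outlier scores,
# i.e. they sit at the very bottom of the ranking
y_true = np.array([1, 1, 0, 0, 0, 0])
scores = np.array([0.1, 0.2, 0.5, 0.6, 0.7, 0.8])
print(roc_auc_score(y_true, scores))  # 0.0: every inlier is ranked above every outlier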
@yzhao062 - Need some help updating the documentation? I'm happy to open a pull request for this
@evanmiller29 sorry for the delay. A PR is always welcome :)
@Henlam I think the most relevant paper for this topic is:
Hope this helps
Thanks for passing along the papers. I'm reading them at the moment. I'm not 100% an outlier detection person (more general ML), but I'm keen to be involved in the project. Are you OK with that?
This metric always gives the same ROC regardless of the KNN contamination level.
from pyod.utils.data import evaluate_print
# evaluate and print the results
print("\nOn Training Data:")
evaluate_print(clf_name, isfraud, y_train_scores)
This metric constantly changes depending on the KNN contamination level. Is this normal?
from sklearn import metrics
# evaluate and print the results
print("\nOn Training Data:")
print("Roc Auc score",round(metrics.roc_auc_score(isfraud, y_train_pred),2))
See this one: https://github.com/yzhao062/pyod/issues/144 ROC evaluates the ranking, not the labels.
If y_train_pred is the predicted scores, then it is normal. If y_train_pred is the predicted labels, then it is weird.
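A minimal sketch of the difference, using synthetic scores rather than the fraud data (the quantile threshold stands in for different contamination settings):
import numpy as np
from sklearn.metrics import roc_auc_score
rng = np.random.RandomState(42)
y_true = np.array([0] * 95 + [1] * 5)
# made-up outlier scores: the 5 true outliers tend to score higher
scores = np.concatenate([rng.normal(0, 1, 95), rng.normal(3, 1, 5)])
# AUC on the raw scores only depends on the ranking, so contamination does not affect it
print("AUC on scores:", roc_auc_score(y_true, scores))
# AUC on binarized labels shifts as the cutoff (i.e. the contamination) changes
for q in (0.80, 0.90, 0.95, 0.99):
    labels = (scores >= np.quantile(scores, q)).astype(int)
    print("AUC on labels, cutoff at the", q, "quantile:", roc_auc_score(y_true, labels))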
Indeed it is quite weird, it is the predicted labels.
y_train_pred = clf.labels_  # binary labels (0: inliers, 1: outliers)
My reproducible code:
import pandas as pd
df = pd.read_csv('https://github.com/firmai/random-assets/blob/master/fraud.csv?raw=true').iloc[:,1:]
df = df.drop(columns=["nameOrig","nameDest"])
# one hot encoding
df = pd.get_dummies(df,prefix=['type'])
isfraud = df.pop("isFraud")
isflaggedfraud = df.pop("isFlaggedFraud")
from pyod.models.knn import KNN # kNN detector
# train kNN detector
clf_name = 'KNN'
clf = KNN(contamination=0.0756)
clf.fit(df)
# get the prediction label and outlier scores of the training data
y_train_pred = clf.labels_ # binary labels (0: inliers, 1: outliers)
y_train_scores = clf.decision_scores_ # raw outlier scores
from pyod.utils.data import evaluate_print
from sklearn import metrics
# evaluate and print the results using the raw outlier scores
print("\nOn Training Data:")
evaluate_print(clf_name, isfraud, y_train_scores)
# evaluate and print the results using the binary labels
print("\nOn Training Data:")
evaluate_print(clf_name, isfraud, y_train_pred)
# sklearn ROC AUC computed on the binary labels
print("\nOn Training Data:")
print("Roc Auc score", round(metrics.roc_auc_score(isfraud, y_train_pred), 2))
I ran the code. The reason is that there are only 122 outliers among 100,000 samples, so you need to make the contamination small enough that fewer than 122 points are flagged to see a difference; otherwise most of what is flagged is misclassified anyway. See the rough arithmetic below.
However, you should not use ROC to evaluate labels; evaluate the scores.
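Rough arithmetic, assuming the figures above (about 100,000 rows and 122 true frauds):
n_samples = 100_000       # approximate number of rows in the fraud dataset above
n_true_outliers = 122     # true frauds mentioned above
# KNN flags roughly n_samples * contamination points as outliers
for contamination in (0.0756, 0.01, 0.001, n_true_outliers / n_samples):
    print(contamination, "->", int(n_samples * contamination), "points flagged")
# contamination=0.0756 flags about 7,560 points, far more than the 122 true outliers;
# only a contamination below 122/100000 ~ 0.00122 flags fewer points than there are true outliers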
Why is precision used to decide the model? An outlier should not be detected as an inlier, since that would be the costliest error, so the false negative rate should be low; shouldn't the type 2 error be what decides the model?
I think it is indeed precision @ rank n (or precision @ rank k), which is still slightly different from plain precision; see the rough sketch below.
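If it helps, a rough sketch of the difference (synthetic scores; as far as I know, precision_n_scores is the helper that evaluate_print uses internally):
import numpy as np
from sklearn.metrics import precision_score
from pyod.utils.utility import precision_n_scores
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
scores = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.95, 0.8, 0.9])
# precision @ rank n only looks at the n = #true-outliers top-ranked points
print("precision @ rank n:", precision_n_scores(y_true, scores))
# plain precision needs hard labels, so it depends on whatever threshold produced them
labels = (scores >= 0.75).astype(int)  # arbitrary illustrative threshold
print("precision on labels:", precision_score(y_true, labels))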
Hello,
first and foremost, thank you for building this wrapper; it is of great use to me and many others.
I have a question regarding the evaluation: most outlier detection evaluation settings work by setting the ranking number n equal to the number of outliers (i.e., the contamination), and so did I in my experiments.
My thought concerning the ROC and AUC score was:
In my case the precision @ n of my chosen algorithms is in the range of 0.2-0.4 because it is a difficult dataset. However, the AUC score is quite high at the same time.
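For what it is worth, a toy illustration of how this combination can arise (made-up scores, nothing to do with my actual data): the true outliers are ranked above most inliers, just not inside the top n.
import numpy as np
from sklearn.metrics import roc_auc_score
# 100 points, 5 true outliers (label 1) at the end
y_true = np.array([0] * 95 + [1] * 5)
# made-up scores: 90 inliers score low, 5 inliers take the very top of the ranking,
# and the 5 true outliers land at ranks 6-10 (high, but outside the top n = 5)
scores = np.concatenate([
    np.linspace(0.0, 0.5, 90),       # 90 "easy" inliers
    [0.95, 0.96, 0.97, 0.98, 0.99],  # 5 inliers that top the ranking
    [0.60, 0.70, 0.80, 0.85, 0.90],  # 5 true outliers
])
n = int(y_true.sum())
top_n = np.argsort(scores)[::-1][:n]
print("precision @ n:", y_true[top_n].mean())     # 0.0 -- no outlier in the top n
print("ROC AUC:", roc_auc_score(y_true, scores))  # ~0.95 -- outliers still beat most inliers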
I would appreciate any thoughts on this since I am fairly new to the topic and might not grasp the intuition of the ROC curve for this task.
Best regards
Hlam