yzhao062 / pyod

A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques
http://pyod.readthedocs.io
BSD 2-Clause "Simplified" License

Question regarding precision@n and roc(@n?) #120

Open Henlam opened 5 years ago

Henlam commented 5 years ago

Hello,

First and foremost, thank you for building this wrapper; it is of great use for me and many others.

I have a question regarding the evaluation: most outlier detection evaluation settings set the ranking number n equal to the number of outliers (i.e. the contamination), and so did I in my experiments.

My thoughts concerning the ROC and AUC score were:

  1. Don't we have to rank the outlier scores from highest to lowest and evaluate the ROC only on the top n points, thus needing a ROC@n curve?
  2. Why do people use ROC and AUC for outlier detection problems, which by nature are heavily skewed and unbalanced? Hitting a lot of true negatives is easy and practically guaranteed if the algorithm knows that there are only n outliers.

In my case, the precision@n of my chosen algorithms lies in the range of 0.2-0.4 because it is a difficult dataset. However, the AUC score is quite high at the same time.
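
As a rough illustration of how the two metrics can diverge (a minimal synthetic sketch, assuming numpy and scikit-learn are available; the distributions and values are made up, not from my experiments):

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# 10,000 inliers and 100 outliers (~1% contamination) with overlapping score distributions
y_true = np.r_[np.zeros(10_000), np.ones(100)]
scores = np.r_[rng.normal(0.0, 1.0, 10_000), rng.normal(2.0, 1.0, 100)]

# precision@n with n = number of true outliers:
# take the n highest-scoring points and count how many are real outliers
n = int(y_true.sum())
top_n = np.argsort(scores)[::-1][:n]
precision_at_n = y_true[top_n].mean()

print("ROC AUC:      ", round(roc_auc_score(y_true, scores), 3))  # roughly 0.92
print("precision @ n:", round(precision_at_n, 3))                 # roughly 0.3

So a detector can rank the outliers ahead of most inliers (high AUC) while still not placing many of them inside the top n.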

I would appreciate any thoughts on this since I am fairly new to the topic and might not grasp the intuition of the ROC curve for this task.

Best regards

Hlam

yzhao062 commented 5 years ago

You are correct; ROC @ n samples is also a popular choice. I will put this on my todo list. I do not know of any particular reason why people usually report the full ROC, but reporting ROC @ n is not uncommon :)

One thought is that ROC @ n is a point evaluation, while the full ROC considers the whole ranking.

Henlam commented 5 years ago

Thank you very much for your answer.

If the normal ROC considers the full picture, does a high ROC in this case just mean that, on average, the ground-truth outliers are ranked ahead of most inlier points, but not necessarily in the top n? I.e., ROC AUC = 0 if the true outliers are at the bottom of the outlier score ranking?
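
A tiny check of that intuition (a sketch with hand-picked scores, assuming scikit-learn): ROC AUC only depends on how the outliers are ranked relative to the inliers.

from sklearn.metrics import roc_auc_score

y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # two true outliers among ten points

# outliers receive the two highest scores -> perfect ranking
print(roc_auc_score(y_true, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 1.0

# outliers receive the two lowest scores -> worst possible ranking
print(roc_auc_score(y_true, [3, 4, 5, 6, 7, 8, 9, 10, 1, 2]))  # 0.0

# outliers ranked above most, but not all, inliers -> high yet imperfect AUC,
# while precision@2 would be 0 because the top two points are inliers
print(roc_auc_score(y_true, [1, 2, 3, 4, 5, 6, 9, 10, 7, 8]))  # 0.75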

evanmiller29 commented 5 years ago

@yzhao062 - Need some help updating the documentation? I'm happy to open a pull request for this

yzhao062 commented 5 years ago

@evanmiller29 sorry for the delay. A PR is always welcome :)

yzhao062 commented 5 years ago

@Henlam I think the most relevant paper for this topic is:

Hope this helps

evanmiller29 commented 5 years ago

Thanks for passing along the papers. I'm having a read at the moment. I'm not 100% an outlier detection person (more general ML), but I'm keen to be involved in the project. Are you OK with that?

firmai commented 4 years ago

This metric always gives the same ROC regardless of the KNN contamination level.


from pyod.utils.data import evaluate_print
# evaluate and print the results
print("\nOn Training Data:")
evaluate_print(clf_name, isfraud, y_train_scores)

This metric constantly changes depending on the KNN contamination level. Is this normal?


from sklearn import metrics

# evaluate and print the results
print("\nOn Training Data:")
print("Roc Auc score",round(metrics.roc_auc_score(isfraud, y_train_pred),2))
yzhao062 commented 4 years ago

> This metric always gives the same ROC regardless of the KNN contamination level.
>
> from pyod.utils.data import evaluate_print
> # evaluate and print the results
> print("\nOn Training Data:")
> evaluate_print(clf_name, isfraud, y_train_scores)
>
> This metric constantly changes depending on the KNN contamination level. Is this normal?
>
> from sklearn import metrics
>
> # evaluate and print the results
> print("\nOn Training Data:")
> print("Roc Auc score", round(metrics.roc_auc_score(isfraud, y_train_pred), 2))

See this one: https://github.com/yzhao062/pyod/issues/144. ROC evaluates the ranking... not the labels.

If y_train_pred contains the predicted scores, then it is normal. If y_train_pred contains the predicted labels, then it is weird.
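
To make the difference concrete, a small synthetic sketch (assuming numpy and scikit-learn; not the fraud data from this thread): ROC AUC computed on the raw scores ignores any threshold, while ROC AUC computed on thresholded labels moves with the chosen contamination.

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
y_true = np.r_[np.zeros(1_000), np.ones(10)]
scores = np.r_[rng.normal(0.0, 1.0, 1_000), rng.normal(3.0, 1.0, 10)]

# ROC AUC on the continuous scores uses the full ranking; no threshold is involved
print("on scores:", round(roc_auc_score(y_true, scores), 3))

# binarizing the scores throws the ranking away, so the resulting "AUC"
# changes with the contamination used to pick the threshold
for contamination in (0.01, 0.05, 0.10):
    threshold = np.quantile(scores, 1 - contamination)
    labels = (scores > threshold).astype(int)
    print("on labels, contamination =", contamination, "->",
          round(roc_auc_score(y_true, labels), 3))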

firmai commented 4 years ago

Indeed it is quite weird; it is the predicted labels.

y_train_pred = clf.labels_  # binary labels (0: inliers, 1: outliers)

My reproducible code:

import pandas as pd
df = pd.read_csv('https://github.com/firmai/random-assets/blob/master/fraud.csv?raw=true').iloc[:,1:]

df = df.drop(columns=["nameOrig","nameDest"])
# one hot encoding
df = pd.get_dummies(df,prefix=['type'])
isfraud = df.pop("isFraud")
isflaggedfraud = df.pop("isFlaggedFraud")

from pyod.models.knn import KNN   # kNN detector

# train kNN detector
clf_name = 'KNN'
clf = KNN(contamination=0.0756)
clf.fit(df)

# get the prediction label and outlier scores of the training data
y_train_pred = clf.labels_  # binary labels (0: inliers, 1: outliers)
y_train_scores = clf.decision_scores_  # raw outlier scores

from pyod.utils.data import evaluate_print
from sklearn import metrics

# evaluate and print the results
print("\nOn Training Data:")
evaluate_print(clf_name, isfraud, y_train_scores)

# evaluate and print the results
print("\nOn Training Data:")
evaluate_print(clf_name, isfraud, y_train_pred)

# evaluate and print the results
print("\nOn Training Data:")
print("Roc Auc score",round(metrics.roc_auc_score(isfraud, y_train_pred),2))
yzhao062 commented 4 years ago

I ran the code. The reason is that there are only 122 outliers among 100,000 samples, so you need to set the contamination small enough (fewer than 122 flagged points) to see a difference. Otherwise, they are misclassified anyway.

However, you should not use ROC to evaluate labels, but scores.
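
For reference, a sketch of the score-based evaluation on the same variables (df, isfraud) from the reproducible code above; it assumes pyod's precision_n_scores helper in pyod.utils.utility, which is the precision @ rank n that evaluate_print reports:

from pyod.models.knn import KNN
from pyod.utils.utility import precision_n_scores  # precision @ rank n helper (assumed available)
from sklearn.metrics import roc_auc_score

clf = KNN(contamination=0.0756)
clf.fit(df)

# use the raw outlier scores, not labels_, for ranking-based metrics
y_train_scores = clf.decision_scores_

print("ROC AUC:           ", round(roc_auc_score(isfraud, y_train_scores), 4))
print("precision @ rank n:", round(precision_n_scores(isfraud, y_train_scores), 4))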

raoarisa commented 4 years ago

Why is precision used to decide the model? An outlier should not be detected as an inlier, as that would be the costliest error, so the false negative rate should be low. Shouldn't the type 2 error be what decides the model?

yzhao062 commented 4 years ago

> Why is precision used to decide the model? An outlier should not be detected as an inlier, as that would be the costliest error, so the false negative rate should be low. Shouldn't the type 2 error be what decides the model?

I think what is used is indeed precision @ rank n (or precision @ rank k), which is still slightly different from plain precision.
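
A small numeric sketch of that difference (illustrative values only, assuming numpy): plain precision depends on how many points the detector flags as outliers, while precision @ rank n always inspects exactly the top n ranked points.

import numpy as np

y_true = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])  # three true outliers
scores = np.array([.1, .2, .3, .4, .5, .9, .95, .6, .7, .8])

# precision @ rank n: look at exactly the n = 3 highest-scoring points
n = int(y_true.sum())
top_n = np.argsort(scores)[::-1][:n]
print("precision @ rank n:", round(y_true[top_n].mean(), 3))  # 1/3 here

# plain precision for a detector that flags the top 5 points as outliers
labels = (scores >= np.sort(scores)[-5]).astype(int)
print("precision:         ", round(y_true[labels == 1].mean(), 3))  # 3/5 here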