yzhao062 / pyod

A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques
http://pyod.readthedocs.io
BSD 2-Clause "Simplified" License

Quasi-Monte Carlo Discrepancy always predicts an outlier #549

Open Hellsice opened 6 months ago

Hellsice commented 6 months ago

I've found that the QMCD model will always predict at least one outlier because its decision scores are normalized. As a result, the model cannot give a correct result when the dataset contains no outliers. Is this intentional? If so, why was it implemented like this?
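To make the concern concrete, here is a minimal pure-Python sketch, not PyOD's actual code. It assumes the scores are min-max normalized and that labels come from a fixed cutoff on the normalized scores; `min_max_normalize`, `flag_outliers`, and the `0.9` cutoff are hypothetical names and values chosen for illustration. QMCD may derive its labels differently.

```python
import random


def min_max_normalize(scores):
    """Rescale scores to [0, 1]. Assumes at least two distinct values."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def flag_outliers(normalized, threshold=0.9):
    """Hypothetical labeling rule: 1 if the normalized score exceeds the cutoff."""
    return [1 if s > threshold else 0 for s in normalized]


random.seed(0)
# Raw scores from an imaginary detector on clean data: all nearly identical.
raw = [1.0 + random.uniform(-1e-6, 1e-6) for _ in range(100)]

normalized = min_max_normalize(raw)
labels = flag_outliers(normalized)

# The largest raw score always maps to exactly 1.0, so it always passes
# any cutoff below 1, however small the spread of the raw scores was.
print(max(normalized), sum(labels))
```

Because normalization stretches even a negligible spread to the full [0, 1] range, the highest-scoring point is flagged whether or not it is a real outlier.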

KulikDM commented 6 months ago

Hi @Hellsice, great question, and I see your concern. The decision scores are normalized because QMCD sometimes gives the outlier class the lower scores, and normalizing allows a simple test to flip the results when this happens. However, I have noticed that this simple test is not very robust. A better sanity check might be to look at the skewness of the score distribution and flip the scores if it is strongly left-skewed. I will investigate this.
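A rough sketch of what such a skewness check could look like, in pure Python. This is only an illustration of the idea, not the current or planned PyOD implementation; `skewness`, `orient_scores`, and the `-1.0` cutoff are hypothetical, and the scores are assumed to be already normalized to [0, 1].

```python
def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def orient_scores(scores, skew_cutoff=-1.0):
    """Flip scores in [0, 1] when the distribution is strongly left-skewed.

    A long lower tail suggests the few outliers received the low scores,
    so the scores are inverted to make higher mean more anomalous.
    The cutoff of -1.0 is an assumed value, not a tuned one.
    """
    if skewness(scores) < skew_cutoff:
        return [1.0 - s for s in scores]
    return list(scores)


# 95 inliers scored high, 5 outliers scored low: the inverted case.
inverted = [0.9] * 95 + [0.05] * 5
oriented = orient_scores(inverted)

# 95 inliers scored low, 5 outliers scored high: already oriented correctly.
correct = [0.1] * 95 + [0.95] * 5
unchanged = orient_scores(correct)
```

Unlike a check on how many scores lie above 0.5, this looks at the shape of the whole distribution, though the cutoff would still need to be validated on real data.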