yzhao062 / pyod

A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques
http://pyod.readthedocs.io
BSD 2-Clause "Simplified" License

Which algorithms of PyOD are "robust"? #500

Open asmaier opened 1 year ago

asmaier commented 1 year ago

In many cases, training on (unlabeled) data that contains anomalies (outliers) can lead to a wrong detection model. So-called robust algorithms have been developed for these cases, but I couldn't find documentation about which PyOD algorithms are robust. For example, the PyOD implementation of PCA seems to use scikit-learn's PCA, which is not a robust PCA as described at https://en.wikipedia.org/wiki/Robust_principal_component_analysis or https://en.wikipedia.org/wiki/L1-norm_principal_component_analysis. So which algorithms in PyOD are really robust?
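To make the PCA concern concrete, here is a minimal sketch of how a few outliers can rotate the leading principal component of a classical (covariance-based) PCA. It does not use PyOD or scikit-learn: the helper `leading_pc_angle_deg` and the toy data are hypothetical, and the closed-form 2-D formula stands in for a full eigendecomposition.

```python
import math

def leading_pc_angle_deg(points):
    """Angle (degrees) of the first principal component of 2-D points.

    Hypothetical helper for illustration; classical, non-robust PCA.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Closed-form orientation of the leading eigenvector of a 2x2 covariance matrix.
    return math.degrees(0.5 * math.atan2(2.0 * cov_xy, var_x - var_y))

# Inliers lie almost exactly along the x-axis (assumed toy data).
clean = [(float(x), 0.1 if x % 2 == 0 else -0.1) for x in range(-5, 6)]
# Three gross outliers far above the inlier cloud.
contaminated = clean + [(0.0, 25.0), (1.0, 30.0), (-1.0, 28.0)]

angle_clean = leading_pc_angle_deg(clean)
angle_contaminated = leading_pc_angle_deg(contaminated)
print(f"clean: {angle_clean:.1f} deg, contaminated: {angle_contaminated:.1f} deg")
```

On the clean data the leading component follows the x-axis. With only three outliers added it turns almost vertical, so the fitted subspace describes the outliers rather than the inliers.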

yzhao062 commented 1 year ago

Robustness is a relative term. I would recommend isolation forest, an ensemble method: good performance and relatively good robustness.

asmaier commented 1 year ago

I think there is a misunderstanding. Robustness in statistics is not a relative term. There is a whole field called robust statistics.

Robust statistics seek to provide methods that emulate popular statistical methods, but are not unduly affected by outliers or other small departures from model assumptions. (https://en.wikipedia.org/wiki/Robust_statistics)

But I agree the term "robust" can have different meanings for people not familiar with that field, so let me rephrase my question:

Which algorithms of PyOD are not unduly affected by outliers?
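One way to probe this question empirically is to fit a detector on contaminated training data and check whether the known outliers are still flagged. The sketch below is only an illustration of that idea and does not use any PyOD model: it compares a classical z-score (mean and standard deviation) with a median/MAD-based score on toy data. The function names, the data, and the threshold of 3.0 are assumptions.

```python
import statistics

def classical_scores(train, values):
    """|x - mean| / stdev, with mean and stdev estimated on train."""
    mu = statistics.mean(train)
    sigma = statistics.stdev(train)
    return [abs(v - mu) / sigma for v in values]

def robust_scores(train, values):
    """|x - median| / (1.4826 * MAD), with median and MAD estimated on train."""
    med = statistics.median(train)
    mad = statistics.median([abs(v - med) for v in train])
    # 1.4826 makes the MAD comparable to the standard deviation under normality.
    return [abs(v - med) / (1.4826 * mad) for v in values]

# Assumed toy data: ten inliers near 10 and three gross outliers.
inliers = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9]
outliers = [50.0, 55.0, 60.0]
train = inliers + outliers

THRESHOLD = 3.0  # assumed cut-off, not a PyOD default

flagged_classical = [
    v for v, s in zip(train, classical_scores(train, train)) if s > THRESHOLD
]
flagged_robust = [
    v for v, s in zip(train, robust_scores(train, train)) if s > THRESHOLD
]
print("classical:", flagged_classical)
print("robust:   ", flagged_robust)
```

The outliers inflate the mean and standard deviation so much that the classical score no longer flags them (masking), while the median/MAD score is barely affected. The same train-on-contaminated-data check could be applied to any detector to see how strongly its fitted model depends on the outliers.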