asmaier opened this issue 1 year ago
Robustness is a relative term. I would recommend Isolation Forest as an ensemble method: good performance and relatively good robustness.
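For concreteness, here is a minimal sketch of the kind of Isolation Forest usage being recommended. It uses sklearn's `IsolationForest` directly (PyOD's `pyod.models.iforest.IForest` wraps this same estimator, though that wrapper detail should be treated as an assumption here); the data and parameters are illustrative only:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
# 60 inliers near the origin, plus one extreme outlier appended at index 60
X = np.vstack([rng.normal(0, 1, size=(60, 2)), [[10.0, 10.0]]])

clf = IsolationForest(n_estimators=100, random_state=42)
clf.fit(X)

# decision_function: lower score = more anomalous
scores = clf.decision_function(X)
print(int(np.argmin(scores)))  # the extreme point at index 60 scores lowest
```

Note that Isolation Forest is trained on data that already contains the outlier, which is exactly the unlabeled-training setting discussed in this issue.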
I think there is a misunderstanding. Robustness in statistics is not a relative term. There is a whole field called robust statistics.
Robust statistics seek to provide methods that emulate popular statistical methods, but are not unduly affected by outliers or other small departures from model assumptions. (https://en.wikipedia.org/wiki/Robust_statistics)
But I agree the term robust can have different meanings for people not familiar with that field, so let me reformulate my question:
Which algorithms of PyOD are not unduly affected by outliers?
In many cases, training with anomalies (outliers) in the (unlabeled) training data can lead to learning a wrong detection model. For these cases, so-called robust algorithms have been developed. But I couldn't find documentation about which algorithms in PyOD are robust. For example, the PyOD implementation of PCA seems to use sklearn's PCA, which is not a robust PCA as described at https://en.wikipedia.org/wiki/Robust_principal_component_analysis or https://en.wikipedia.org/wiki/L1-norm_principal_component_analysis. So which algorithms in PyOD are really robust?
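To illustrate why this distinction matters, here is a small sketch showing how a single gross outlier can flip the first principal component of classical (L2-norm) PCA. It uses a plain numpy eigendecomposition of the covariance matrix, which optimizes the same criterion as sklearn's PCA; the data is deliberately contrived:

```python
import numpy as np

def first_pc(X):
    """First principal component via eigendecomposition of the covariance
    matrix -- the L2-norm criterion that classical PCA optimizes."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    return vecs[:, np.argmax(vals)]

# Clean data lies exactly on the line y = x
clean = np.array([[i, i] for i in range(-5, 6)], dtype=float)
# The same data plus one gross outlier
contaminated = np.vstack([clean, [[100.0, -100.0]]])

true_dir = np.array([1.0, 1.0]) / np.sqrt(2)
print(abs(first_pc(clean) @ true_dir))         # ~1.0: PC aligned with the data
print(abs(first_pc(contaminated) @ true_dir))  # ~0.0: one outlier flips the PC
```

A robust PCA variant is designed to keep the first component aligned with the bulk of the data in cases like this, which is why it matters whether PyOD's PCA detector is robust or a wrapper around the classical algorithm.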