yzhao062 / pyod

A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques
http://pyod.readthedocs.io
BSD 2-Clause "Simplified" License

Why can't the contamination parameter be equal to 0? #127

Open eladtann opened 5 years ago

eladtann commented 5 years ago

Hi, let's assume I have training data that I want to define as normal, that is, data that contains no outliers at all. I would then expect to be able to fit a model to this training data such that the threshold for flagging a sample as an outlier is higher than the score of any training sample. Currently, the requirement that contamination be greater than 0 prevents this behavior.
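To make the request concrete, here is a minimal sketch in plain Python of the behavior I have in mind. It does not use PyOD: the scorer is a toy (distance from the training mean), and the names `fit_center`, `score`, and `is_outlier` are hypothetical. The threshold is set to the largest training score and the comparison is strict, so no training sample is ever flagged.

```python
import math

def fit_center(train):
    """Return the per-feature mean of the training rows."""
    n = len(train)
    return [sum(col) / n for col in zip(*train)]

def score(center, x):
    """Toy outlier score: Euclidean distance from the training mean."""
    return math.dist(center, x)

# Training data assumed to be entirely normal (no outliers).
train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
center = fit_center(train)
train_scores = [score(center, x) for x in train]

# Threshold at the largest training score: with a strict ">" test,
# every training sample is labeled normal.
threshold = max(train_scores)

def is_outlier(x):
    """Flag a sample only if it scores strictly above the threshold."""
    return score(center, x) > threshold

train_flags = [is_outlier(x) for x in train]
far_flag = is_outlier([5.0, 5.0])
```

Here every entry of `train_flags` is `False`, while the distant point `[5.0, 5.0]` is flagged.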

Could you please explain the motivation? Thanks a lot, Elad

yzhao062 commented 5 years ago

What you describe is called novelty detection, which is slightly different from outlier detection. General outlier detection algorithms assume that outliers exist in the training data; otherwise no prediction can be made. You can read more here: https://pyod.readthedocs.io/en/latest/relevant_knowledge.html
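A rough sketch of why the contamination value matters, assuming the threshold is taken as the `100 * (1 - contamination)` percentile of the training scores (that is my understanding of how the threshold is derived, so treat it as an assumption). The scores below are made up, and `percentile`, `threshold_for`, and `n_flagged` are hypothetical helpers written in plain Python rather than library calls.

```python
def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac

def threshold_for(scores, contamination):
    """Assumed rule: cut at the (1 - contamination) quantile of the scores."""
    return percentile(scores, 100.0 * (1.0 - contamination))

# Made-up outlier scores for ten training samples.
train_scores = [0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.5, 0.6, 0.9, 2.5]

def n_flagged(contamination):
    """Count training samples scoring strictly above the threshold."""
    t = threshold_for(train_scores, contamination)
    return sum(s > t for s in train_scores)

flagged_at_20 = n_flagged(0.2)  # top 20% of training scores exceed the cut
flagged_at_0 = n_flagged(0.0)   # threshold equals the maximum training score
```

With these scores, a contamination of 0.2 flags two training samples, while a contamination of 0.0 puts the threshold at the maximum training score and flags none, which is the novelty-detection setting rather than outlier detection.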

eladtann commented 5 years ago

Note the last paragraph, which states "The algorithms found in PyOD focus on the first two approaches...". The first two are:

  1. unsupervised outlier detection
  2. semi-supervised novelty detection

If I understood correctly, requiring contamination != 0 means that the second approach isn't supported.

Thanks
