yzhao062 / pyod

A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques
http://pyod.readthedocs.io
BSD 2-Clause "Simplified" License

IForest: FutureWarning: behaviour="old" is deprecated #79

Closed. bflammers closed this issue 5 years ago.

bflammers commented 5 years ago

Hi,

Thanks for a great library!

When creating a new IForest object, scikit-learn emits the following warning:

FutureWarning: behaviour="old" is deprecated and will be removed in version 0.22. Please use behaviour="new", which makes the decision_function change to match other anomaly detection algorithm API. FutureWarning)

The new behaviour in scikit-learn's IsolationForest changes where the threshold between anomalies and normal observations is set. See the documentation of the behaviour argument and the offset_ attribute:

behaviour : str, default='old'
    Behaviour of the decision_function which can be either 'old' or 'new'. Passing behaviour='new' makes the decision_function change to match other anomaly detection algorithm API which will be the default behaviour in the future. As explained in details in the offset_ attribute documentation, the decision_function becomes dependent on the contamination parameter, in such a way that 0 becomes its natural threshold to detect outliers.

offset_ : float
    Offset used to define the decision function from the raw scores. We have the relation: decision_function = score_samples - offset_. Assuming behaviour == 'new', offset_ is defined as follows. When the contamination parameter is set to "auto", the offset is equal to -0.5 as the scores of inliers are close to 0 and the scores of outliers are close to -1. When a contamination parameter different than "auto" is provided, the offset is defined in such a way we obtain the expected number of outliers (samples with decision function < 0) in training. Assuming the behaviour parameter is set to 'old', we always have offset_ = -0.5, making the decision function independent from the contamination parameter.
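To make the offset_ rule concrete, here is a minimal NumPy sketch of how the threshold could be derived under behaviour='new'. It assumes, as the quoted docs imply, that for a numeric contamination the offset sits at the contamination-th percentile of the training scores, so that roughly that fraction of samples ends up with decision_function < 0. The helper names offset_from_scores and decision_function, and the score values, are hypothetical toy inputs for illustration, not scikit-learn internals.

```python
import numpy as np


def offset_from_scores(raw_scores, contamination="auto"):
    """Hypothetical sketch of how offset_ is chosen under behaviour='new'."""
    if contamination == "auto":
        # Inliers score near 0 and outliers near -1, so -0.5 splits them.
        return -0.5
    # Put the offset at the contamination-th percentile of the training
    # scores, so about that fraction of samples gets a negative decision.
    return float(np.percentile(raw_scores, 100.0 * contamination))


def decision_function(raw_scores, offset):
    # decision_function = score_samples - offset_; negative means outlier.
    return np.asarray(raw_scores) - offset


# Toy raw scores (illustrative only): eight inliers, two clear outliers.
scores = np.array([-0.42, -0.45, -0.40, -0.48, -0.44,
                   -0.43, -0.41, -0.47, -0.70, -0.75])

offset = offset_from_scores(scores, contamination=0.2)
decision = decision_function(scores, offset)
n_outliers = int((decision < 0).sum())  # 2 of 10 samples, matching 20%
```

With contamination=0.2 the offset lands between the two outlier scores and the rest, so 0 becomes the natural cut-off. Under the 'old' behaviour the offset would stay at -0.5 regardless of contamination.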

I think a simple fix would be to pass behaviour="new" in the call to sklearn.ensemble.IsolationForest.
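One way to apply that fix without breaking other scikit-learn versions is to add behaviour="new" only when the installed version needs it. The sketch below is a hypothetical helper (iforest_kwargs is not part of pyod or scikit-learn). It assumes the behaviour argument exists from 0.20, that 0.22 made the new behaviour the only one, and that the argument itself was later removed (0.24). Check the scikit-learn changelog before relying on those version boundaries.

```python
import re


def iforest_kwargs(sklearn_version, **kwargs):
    """Hypothetical helper: build IsolationForest kwargs for a given version.

    Adds behaviour='new' only for scikit-learn 0.20 and 0.21, where the
    default 'old' behaviour triggers the FutureWarning from this issue.
    """
    match = re.match(r"^(\d+)\.(\d+)", sklearn_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {sklearn_version!r}")
    version = (int(match.group(1)), int(match.group(2)))
    # Assumption: 'new' is the only behaviour from 0.22 on, and passing the
    # deprecated/removed argument there would warn or fail, so skip it.
    if (0, 20) <= version < (0, 22):
        kwargs.setdefault("behaviour", "new")
    return kwargs
```

In the wrapper this would be used roughly as IsolationForest(**iforest_kwargs(sklearn.__version__, contamination=0.1)), after importing sklearn and sklearn.ensemble.IsolationForest. Keeping the version check in a pure function makes it easy to test without fitting a model.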

yzhao062 commented 5 years ago

Thanks for the PR. I did some code cleanup and added documentation. This should be out in the next release (0.7.0).