yzhao062 / pyod

A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques
http://pyod.readthedocs.io
BSD 2-Clause "Simplified" License
8.59k stars 1.37k forks

local feature importance for outlier prediction? #5

Closed rquintino closed 3 years ago

rquintino commented 6 years ago

Hello, is there anything available to identify/highlight which features are most likely triggering the outlier status? Thanks!

yzhao062 commented 6 years ago

Thanks for asking. To my knowledge, there is no universal way to generate feature importance for outlier detection. If you have ground-truth labels and can treat the problem as supervised classification with imbalanced data, you could get feature importance from tree ensembles, e.g., xgboost.
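For that supervised route, a minimal sketch (not a PyOD API; the synthetic data and labels below are purely illustrative) using XGBoost's built-in importances, with `scale_pos_weight` compensating for the imbalance:

```python
# Sketch only: with ground-truth labels, treat the task as imbalanced
# binary classification and read the tree-ensemble feature importances.
import numpy as np
from xgboost import XGBClassifier

rng = np.random.RandomState(42)
X = rng.randn(1000, 5)
y = (np.abs(X[:, 2]) > 2.5).astype(int)  # synthetic outliers driven by feature 2

clf = XGBClassifier(
    n_estimators=200,
    # compensate for the class imbalance between normal points and outliers
    scale_pos_weight=(y == 0).sum() / max((y == 1).sum(), 1),
)
clf.fit(X, y)
print(clf.feature_importances_)  # one importance score per feature
```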

However, for the unsupervised algorithms currently implemented in PyOD, no feature importance is available. The key reason is that the result can be questionable in an unsupervised setting with imbalanced data: some outliers hide well in dimensions that look unimportant.

That said, this functionality could be implemented for individual models. For instance, Isolation Forest could use the average position of each attribute in the isolation trees as a feature importance ranking. I will add this feature once the key models are implemented :)
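A rough sketch of that average-position idea, computed directly from scikit-learn's tree internals rather than through any PyOD API (attribute layout may vary across scikit-learn versions):

```python
# Hypothetical sketch: rank features by the average depth at which the
# isolation trees split on them (splits near the root suggest the feature
# isolates points early, i.e., is more informative).
import numpy as np
from sklearn.ensemble import IsolationForest

X = np.random.RandomState(0).randn(500, 4)
iso = IsolationForest(random_state=0).fit(X)

depth_sum = np.zeros(X.shape[1])
split_count = np.zeros(X.shape[1])

for est in iso.estimators_:
    tree = est.tree_
    stack = [(0, 0)]                        # (node id, depth), starting at the root
    while stack:
        node, d = stack.pop()
        if tree.children_left[node] != -1:  # internal node: record the split feature
            f = tree.feature[node]
            depth_sum[f] += d
            split_count[f] += 1
            stack.append((tree.children_left[node], d + 1))
            stack.append((tree.children_right[node], d + 1))

avg_depth = depth_sum / np.maximum(split_count, 1)
print(np.argsort(avg_depth))  # features ranked from most to least "important"
```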

Meanwhile, some recent research on this topic might be helpful, e.g., https://www.ijcai.org/Proceedings/16/Papers/272.pdf

rquintino commented 6 years ago

Hi @yzhao062, thanks for the help & the paper. Will read!

I did some tests with PyOD's Isolation Forest and Shapley values (the shap package). Results seem promising for some synthetic outliers, but this needs a lot of testing. Any ideas on how to properly test this? Other thoughts?

(Screenshots of the SHAP results were attached in the original issue.)
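A sketch of the experiment described above, assuming PyOD's IForest exposes its fitted scikit-learn IsolationForest as `detector_` (attribute names may differ across PyOD versions):

```python
# Explain isolation-forest anomaly scores with SHAP (sketch, not a PyOD API).
import numpy as np
import shap
from pyod.models.iforest import IForest

X = np.random.RandomState(0).randn(300, 6)
X[0] = 8  # turn the first row into an obvious outlier in every feature

clf = IForest(random_state=0)
clf.fit(X)

# TreeExplainer understands scikit-learn's IsolationForest
explainer = shap.TreeExplainer(clf.detector_)
shap_values = explainer.shap_values(X)

print(shap_values[0])  # per-feature contributions for the injected outlier
```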

nehadave78 commented 4 years ago

Do we have feature selection in PyOD? I am working with unsupervised learning and wanted to check feature importance.

ChiragSoni95 commented 2 years ago

Is there any explainability measure for a given prediction, i.e., the importance of the single feature or set of features that drives the decision, given a PyOD estimator and a test sample?

yzhao062 commented 2 years ago

@ChiragSoni95 Both Isolation Forest and COPOD provide feature importance.

See the Isolation Forest example at https://github.com/yzhao062/pyod/blob/development/examples/iforest_example.py, line 57.

A COPOD example is not ready yet; it should be added later.
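A minimal sketch of both options, assuming PyOD >= 0.9.8, where IForest exposes `feature_importances_` (as in the linked example) and COPOD offers `explain_outlier` for per-dimension outlier scores:

```python
import numpy as np
from pyod.models.copod import COPOD
from pyod.models.iforest import IForest

X = np.random.RandomState(0).randn(200, 5)

iforest = IForest(random_state=0).fit(X)
print(iforest.feature_importances_)  # global per-feature importance

copod = COPOD().fit(X)
copod.explain_outlier(0)  # plots dimensional outlier scores for sample 0
```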

ChiragSoni95 commented 2 years ago

Thanks @yzhao062. Any idea if OCSVM will have a similar utility? Also, is the new Isolation Forest property included in version 0.9.8? I can't see that property on the trained estimator.

yzhao062 commented 2 years ago

@ChiragSoni95 OCSVM does not support this. https://github.com/yzhao062/pyod/blob/development/pyod/models/iforest.py shows that it is already supported for Isolation Forest in v0.9.8. Maybe you need to double-check your local version.
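A quick way to run that check locally (sketch; the exact release that added the property is per the comment above):

```python
# Confirm the installed PyOD version and that the fitted IForest
# actually exposes the property.
import numpy as np
import pyod
from pyod.models.iforest import IForest

print(pyod.__version__)  # should be >= 0.9.8

clf = IForest().fit(np.random.randn(100, 3))
print(hasattr(clf, "feature_importances_"))
```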

ChiragSoni95 commented 2 years ago

Thanks