Closed rquintino closed 3 years ago
Thanks for asking. To my knowledge, there is no universal way to generate feature importance for outlier detection. If you have ground truth and can treat the task as a supervised problem with imbalanced data, you can obtain feature importance from tree ensembles, e.g., xgboost.
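A minimal sketch of the supervised route, under stated assumptions: the data is synthetic, labels are available, and scikit-learn's `RandomForestClassifier` stands in for xgboost to keep dependencies light (`XGBClassifier` exposes the same `feature_importances_` attribute). `class_weight="balanced"` is one common way to handle the imbalance, not the only one.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): 1000 inliers and
# 20 labeled outliers over 4 features. Outliers differ only in feature 2.
X_in = rng.normal(size=(1000, 4))
X_out = rng.normal(size=(20, 4))
X_out[:, 2] += 6.0
X = np.vstack([X_in, X_out])
y = np.r_[np.zeros(1000, dtype=int), np.ones(20, dtype=int)]

# class_weight="balanced" reweights the rare outlier class during training.
clf = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)
clf.fit(X, y)

# Impurity-based importances, one value per feature, summing to 1.
importances = clf.feature_importances_
ranking = np.argsort(importances)[::-1]  # most important feature first
print("ranking:", ranking)
print("importances:", importances.round(3))
```

Here the feature that actually separates the labeled outliers should come out on top; with real data, the ranking is only as trustworthy as the labels.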
However, for the unsupervised algorithms currently implemented in PyOD, no feature selection is available. The key reason is that the result could be questionable with imbalanced data in an unsupervised setting: some outliers are well hidden in dimensions that look unimportant.
That said, this functionality could be implemented for several models. For instance, Isolation Forest could use the average position of the attributes in the isolation trees as a feature importance ranking. I will add this feature once the key models are implemented :)
Meanwhile, some recent research on this topic might be helpful, e.g., https://www.ijcai.org/Proceedings/16/Papers/272.pdf
Hi @yzhao062 thanks for the help & paper. will read!
I did some tests with PyOD's Isolation Forest and Shapley values (the shap package). Results seem promising for some synthetic outliers, but this needs a lot more testing. Any ideas on how to properly test this? Other thoughts?
Do we have feature selection in PyOD? I am working with unsupervised learning and want to check feature importance.
Is there any measure of explainability for a given prediction, i.e., one that assigns importance to the single feature or set of features driving the decision, given a PyOD estimator and a test sample?
@ChiragSoni95 Both Isolation Forest and COPOD provide feature importance.
See the Isolation Forest example at https://github.com/yzhao062/pyod/blob/development/examples/iforest_example.py, line 57.
The COPOD example is not ready yet; it should be added.
Thanks @yzhao062. Any idea whether OCSVM will have a similar utility? Also, is the new Isolation Forest property included in version 0.9.8? I can't see it on the trained estimator.
@ChiragSoni95 OCSVM does not support this. https://github.com/yzhao062/pyod/blob/development/pyod/models/iforest.py shows that it is already supported in v0.9.8. You may need to double-check your local version.
Thanks
Hello, is there anything available to identify or highlight which features are most likely triggering the outlier status? thx!