Open stianvale opened 3 years ago
To clarify, do you mean: for a set of multidimensional points, which dimension contributes the most to the total codisp over all points in the dataset?
These three pages of the docs may be useful:
- Tree construction: https://klabum.github.io/rrcf/tree-construction.html
- Anomaly scoring: https://klabum.github.io/rrcf/anomaly-scoring.html
- Caveats: https://klabum.github.io/rrcf/caveats.html
Perhaps it would be helpful to specify a (mathematical) definition of feature importance for your problem of interest. Or perhaps you can describe the particular problem you are trying to solve.
Thanks for your reply @mdbartos !
Yes, what I'm asking is: for a given multidimensional point, which dimensions contribute the most to that point's codisp?
I have a draft approach that simply compares the point's value in each dimension with the mean value of that dimension across all points. This shows which dimensions differ the most from 'normal' behavior, but it is only a temporary proxy for feature importance.
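To make the proxy concrete, here is a minimal sketch using only the Python standard library. The function name `deviation_by_dimension` and the toy data are hypothetical. Dividing by each dimension's standard deviation is an added assumption (so that dimensions on different scales are comparable); the plain version would use only the distance from the mean.

```python
from statistics import mean, pstdev

def deviation_by_dimension(point, data):
    """Return the absolute z-score of `point` in each dimension of `data`.

    Scaling by the standard deviation is an assumption added here; drop the
    division to get the raw distance from the per-dimension mean.
    """
    columns = list(zip(*data))  # transpose rows into per-dimension columns
    scores = []
    for value, column in zip(point, columns):
        mu = mean(column)
        sigma = pstdev(column)
        # A constant dimension carries no deviation signal; report 0.
        scores.append(abs(value - mu) / sigma if sigma > 0 else 0.0)
    return scores

# Toy dataset: 4 points, 3 dimensions (illustrative values only).
data = [
    [1.0, 10.0, 5.0],
    [1.2, 11.0, 5.0],
    [0.8, 9.0, 5.0],
    [1.0, 10.0, 5.0],
]
point = [1.1, 30.0, 5.0]

scores = deviation_by_dimension(point, data)
# Dimensions ordered from most to least deviating.
ranking = sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)
print(ranking)
```

Note that this ranks dimensions by how unusual the point's value is, not by how much each dimension contributes to the codisp, so it remains a proxy.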
So what I'm asking is whether there is some way to derive the feature importances of a point from the codisp formula.
Does that make sense?
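One possible direction, sketched under assumptions rather than as the library's method: codisp is the maximum, over a leaf's ancestors, of the ratio between the sibling subtree's size and the current subtree's size. Each ancestor also records the dimension it was cut on, so the same walk can keep the largest ratio per cut dimension. The sketch below assumes nodes expose `u` (parent), `l` and `r` (children), `n` (leaf count) and `q` (cut dimension), which is how I understand rrcf's `Branch` and `Leaf` classes to be laid out. The function `codisp_by_dimension` and the toy-tree helpers are hypothetical and not part of rrcf.

```python
from collections import defaultdict
from types import SimpleNamespace

def codisp_by_dimension(leaf):
    """Walk from `leaf` to the root and, for each cut dimension, keep the
    largest displacement ratio (sibling.n / node.n) seen at an ancestor
    that cuts on that dimension. The max over all values is the codisp."""
    best = defaultdict(float)
    node = leaf
    while node.u is not None:
        parent = node.u
        sibling = parent.r if node is parent.l else parent.l
        ratio = sibling.n / node.n
        best[parent.q] = max(best[parent.q], ratio)
        node = parent
    return dict(best)

# Toy stand-ins for tree nodes (hypothetical helpers, not part of rrcf).
def make_leaf():
    return SimpleNamespace(u=None, n=1)

def make_branch(q, left, right):
    branch = SimpleNamespace(u=None, q=q, l=left, r=right, n=left.n + right.n)
    left.u = right.u = branch
    return branch

# Small tree: the root cuts on dimension 0 and isolates leaf `a` immediately.
a, c, e, f = (make_leaf() for _ in range(4))
d = make_branch(0, e, f)
b = make_branch(1, c, d)
root = make_branch(0, a, b)

print(codisp_by_dimension(a))
print(codisp_by_dimension(c))
```

With a real forest, one would presumably call this on `tree.leaves[index]` for each tree containing the point and average the per-dimension values across trees. Whether that average is a meaningful feature importance is an open question, since the cut dimension at an ancestor is chosen randomly in proportion to the bounding box extent.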
Hi again! @mdbartos, have you ever experimented with computing the feature importance of a particular point? I think this would be a great addition to the current library in terms of improving the explainability of the anomalies.
Hi, and thanks for building this great repo!
I have a general question: what's the proper way to compute feature importance for RRCF? Basically, I want to know which features contribute the most to the collusive displacement value.