sahandha / eif

Extended Isolation Forest for Anomaly Detection

Can the extension concept be applied to Gradient Boosted Machines? #5

Closed alfian777 closed 5 years ago

alfian777 commented 5 years ago

Hi there,

This might be a dumb question.

I was curious whether the "extension" concept that you introduce can be applied to supervised methods such as gradient boosted trees. There are several widely known implementations like XGBoost and LightGBM, and all of these GBTs also suffer from "box"-like decision boundaries. I believe it would be great to see GBTs produce decision boundaries the way your Extended Isolation Forest does.

What do you guys think?

Feel free to close this issue since it's not a real issue, just a discussion.

mgckind commented 5 years ago

Hi @alfian777, thanks for the question. The problem with supervised trees is that you try to find the best splitting point across every dimension using information gain (or other metrics); if you add hyperplanes, it becomes very hard to find the best one given the countless alternatives. There are some models that extend these ideas, like the Extremely Randomized Trees in sklearn, which take a similar approach.
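For context, the reason random hyperplanes are cheap in the (unsupervised) Extended Isolation Forest is that it never searches for the best cut: it just draws a random slope and a random intercept, as in the paper's branching test (x − p) · n ≤ 0. A minimal sketch of that split (function name is illustrative, not the library's API):

```python
import numpy as np

def random_hyperplane_split(X, rng=None):
    """Partition X with a random hyperplane, EIF-style: draw a random
    normal vector n (slope) and intercept point p, then branch on
    (x - p) . n <= 0. No search over candidate splits is needed."""
    rng = rng or np.random.default_rng()
    n = rng.normal(size=X.shape[1])                # random slope
    p = rng.uniform(X.min(axis=0), X.max(axis=0))  # random intercept in the data range
    mask = (X - p) @ n <= 0
    return X[mask], X[~mask]
```

A supervised tree, by contrast, would have to score candidates like this by information gain, and the space of hyperplanes is continuous rather than a finite set of axis-aligned thresholds.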

alfian777 commented 5 years ago

Hi there,

Hmm, as I understand it, even Extremely Randomized Trees still suffer from box-like decision boundaries, since they still split on only one feature dimension at a time.

I agree with you that higher dimensionality makes finding the optimal splitting "hyperplane" difficult. Maybe a trial with a two-dimensional extension only would be a good test of the concept, to see whether it's worth pursuing. A sketch of that restricted split is below.
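One cheap way to run that two-dimensional trial: restrict the random slope to two randomly chosen features and leave every other coefficient at zero (roughly a low extension level in EIF terms). A hedged sketch, with an illustrative name not taken from any library:

```python
import numpy as np

def random_2d_hyperplane_mask(X, rng=None):
    """Random oblique split confined to a random pair of features;
    all other slope coefficients stay zero."""
    rng = rng or np.random.default_rng()
    i, j = rng.choice(X.shape[1], size=2, replace=False)  # pick two feature axes
    n = np.zeros(X.shape[1])
    n[[i, j]] = rng.normal(size=2)        # slope lives in a 2-D subspace
    p = rng.uniform(X.min(axis=0), X.max(axis=0))
    return (X - p) @ n <= 0               # boolean left/right partition
```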

Anyway thanks for the response! Great paper!

mgckind commented 5 years ago

Yes it does. I meant it in the sense of adding an extra layer of randomness, but you are right, it still has that boxiness issue. Thanks for the comments!

zachmayer commented 3 years ago

I think you could definitely modify Extremely Randomized Trees to choose a random hyperplane.
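For what it's worth, a sketch of what that modification might look like: at each node, draw a handful of random hyperplanes (instead of random axis-aligned thresholds) and keep the one with the best impurity reduction, in the spirit of Extra-Trees. Purely illustrative, not an actual scikit-learn extension:

```python
import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - (p ** 2).sum()

def best_random_oblique_split(X, y, n_candidates=10, rng=None):
    """Extra-Trees-style node: score a few *random* hyperplanes, keep the best.
    Returns (score, normal, threshold, mask) for the candidate with the
    lowest weighted Gini impurity, or None if no valid split was found."""
    rng = rng or np.random.default_rng()
    best = None
    for _ in range(n_candidates):
        n = rng.normal(size=X.shape[1])          # random split direction
        proj = X @ n
        t = rng.uniform(proj.min(), proj.max())  # random threshold, as in Extra-Trees
        mask = proj <= t
        if mask.all() or not mask.any():         # degenerate split, skip
            continue
        score = (mask.sum() * gini(y[mask]) + (~mask).sum() * gini(y[~mask])) / len(y)
        if best is None or score < best[0]:
            best = (score, n, t, mask)
    return best
```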

Modifying XGBoost would be more interesting. You'd maybe end up with something like LDA or linear regression to find the hyperplane at each step, which could be pretty computationally expensive.
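A rough sketch of that idea: at each boosting round, regress the current residuals on the features to get a projection direction, then use an oblique stump on that projection as the weak learner. This is a toy squared-error boosting loop, not anything XGBoost actually supports; all names here are made up for illustration:

```python
import numpy as np

def fit_oblique_stump(X, r):
    """Find a split direction by regressing residuals r on X (a cheap
    LDA-like step), then cut at the median projection.
    Returns a predict(X) closure."""
    w, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], r, rcond=None)
    proj = X @ w[:-1]
    t = np.median(proj)
    if not (proj > t).any():               # degenerate split: fall back to a constant
        m = r.mean()
        return lambda Xq: np.full(len(Xq), m)
    left, right = r[proj <= t].mean(), r[proj > t].mean()
    return lambda Xq: np.where(Xq @ w[:-1] <= t, left, right)

def boost_oblique(X, y, n_rounds=50, lr=0.1):
    """Toy gradient boosting (squared error) over oblique stumps."""
    pred = np.full(len(y), y.mean())
    stumps = []
    for _ in range(n_rounds):
        r = y - pred                       # negative gradient for squared error
        stump = fit_oblique_stump(X, r)    # the expensive per-round linear fit
        pred = pred + lr * stump(X)
        stumps.append(stump)
    base = y.mean()
    return lambda Xq: base + lr * sum(s(Xq) for s in stumps)
```

The per-round linear fit is exactly where the extra cost shows up relative to axis-aligned threshold search.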

zachmayer commented 3 years ago

Here's one way to remove the "axis bias" in a supervised context: https://arxiv.org/abs/1506.03410