Closed alfian777 closed 5 years ago
Hi @alfian777 , thanks for the question. The problem with supervised trees is that you try to find the best splitting point across every dimension using information gain (or other metrics); if you add hyperplanes, it becomes very hard to find the best one given the countless alternatives. There are some models that extend these ideas, like the extremely randomized trees in sklearn, which provide a similar approach.
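To make the search-cost point concrete, here's a minimal sketch (not any library's actual implementation) of the standard exhaustive axis-aligned split search, using variance reduction as a stand-in for information gain. The candidate set is finite, O(n·d) thresholds; allowing arbitrary hyperplanes would replace that finite set with a continuous space of directions:

```python
import numpy as np

def best_axis_split(X, y):
    """Exhaustively search the best axis-aligned split by variance
    reduction (a stand-in for information gain). The candidate set is
    finite: one threshold per unique value per feature. An arbitrary
    hyperplane would add a continuous space of directions to search."""
    n, d = X.shape
    best = (np.inf, None, None)  # (impurity, feature, threshold)
    for j in range(d):
        for t in np.unique(X[:, j])[:-1]:
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            imp = left.var() * len(left) + right.var() * len(right)
            if imp < best[0]:
                best = (imp, j, t)
    return best

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 1] > 0).astype(float)  # target depends only on feature 1
imp, feat, thr = best_axis_split(X, y)
print(feat)  # picks feature 1, since splitting there gives pure halves
```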
Hi there,
Hm, as I understand it, even extremely randomized trees still suffer from box-like decision boundaries, since they still split on one feature dimension at a time.
I agree with you that higher dimensionality makes finding the optimal splitting "hyperplane" difficult. Maybe a trial with a 2-dimensional extension only would be a good test of the concept, to see whether it's worth pursuing or not.
Anyway thanks for the response! Great paper!
Yes it does; I meant it in the sense of adding an extra layer of randomness, but you are right, it still has that boxiness issue. Thanks for the comments!
I think you could definitely modify extremely randomized trees to choose a random hyperplane.
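A sketch of what such a node might look like (my own toy illustration, not sklearn's extra-trees code): instead of drawing a random (feature, threshold) pair, draw a random direction and a random cut on the projection, similar to how the extended isolation forest picks its slopes:

```python
import numpy as np

def random_hyperplane_split(X, rng):
    """Extra-trees-style node, made oblique: draw a random direction and a
    random offset instead of a random (feature, threshold) pair."""
    n, d = X.shape
    w = rng.normal(size=d)          # random normal vector of the hyperplane
    w /= np.linalg.norm(w)
    proj = X @ w                    # project all points onto that direction
    t = rng.uniform(proj.min(), proj.max())  # random cut, extra-trees style
    mask = proj <= t
    return w, t, X[mask], X[~mask]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
w, t, left, right = random_hyperplane_split(X, rng)
print(len(left) + len(right))  # 100: every point lands on one side
```

Since the direction is random rather than optimized, this sidesteps the expensive hyperplane search entirely, at the cost of even more randomness per node.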
Modifying XGBoost would be more interesting. You'd maybe end up with something like LDA or linear regression to find the hyperplane at each step, which could be pretty computationally expensive.
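As a rough sketch of that linear-regression idea (a hypothetical step, not anything XGBoost or LightGBM actually does): fit least squares to the current residuals to get a direction, then do an ordinary 1-D split on the projection. The closed-form fit plus a full threshold scan at every node is where the extra cost would come from:

```python
import numpy as np

def oblique_boost_split(X, residuals):
    """Hypothetical oblique boosting step: least-squares fit to the
    residuals gives the split direction, then a standard 1-D threshold
    search runs on the projection. Illustrative only, not a real API."""
    # least-squares direction (closed form), with an intercept column
    Xb = np.c_[X, np.ones(len(X))]
    beta, *_ = np.linalg.lstsq(Xb, residuals, rcond=None)
    w = beta[:-1]
    proj = X @ w
    # best threshold on the projection by squared-error reduction
    order = np.argsort(proj)
    r, p = residuals[order], proj[order]
    best_t, best_err = None, np.inf
    for i in range(1, len(r)):
        left, right = r[:i], r[i:]
        err = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if err < best_err:
            best_err, best_t = err, (p[i - 1] + p[i]) / 2
    return w, best_t

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 2))
residuals = X @ np.array([1.0, -1.0])  # residual varies along a diagonal
w, t = oblique_boost_split(X, residuals)
print(w)  # recovers roughly [1, -1], the diagonal direction
```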
Here's one way to remove the "axis bias" in a supervised context: https://arxiv.org/abs/1506.03410
Hi there,
This might be a dumb question.
I was curious whether the "extension" concept that you introduce can be applied to supervised algorithms such as Gradient Boosted Trees. There are several widely known implementations like XGBoost or LightGBM, and all of these GBTs also suffer from box-like decision boundaries. I believe it would be great to see GBTs produce decision boundaries the way your extended isolation forest does.
What do you guys think?
Feel free to close this issue since it's not a real issue, just a discussion.