kLabUM / rrcf

🌲 Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams
https://klabum.github.io/rrcf/
MIT License
495 stars 112 forks source link

Question related to the paper #72

Closed adavoli91 closed 4 years ago

adavoli91 commented 4 years ago

Hi,

my question is more "theoretical" rather than directly related to the implementation of the algorithm, but I hope someone will be able to help me in understanding this point.

In the paper, it is stated that the insertion/removal of the point is possible because of how the splitting are performed; in particular, the authors say:

"For example, if we choose the dimensions uniformly at random as in (Liu et al., 2012), suppose we build a tree for (1,0),(ε,ε),(0,1) where 1 ≫ ε > 0 and then delete (1,0). The probability of getting a tree over the two remaining points that uses a vertical separator is 3/4 − ε/2 and not 1/2 as desired".

Could anyone help me to understand this statement? It's not clear to me how these probabilities are obtained.

Thanks