kLabUM / rrcf

🌲 Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams
https://klabum.github.io/rrcf/
MIT License

Dealing with a data stream of constant values over a period of time #80

Status: Open. shfa5275 opened this issue 3 years ago.

shfa5275 commented 3 years ago

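In certain cases, a stream may keep receiving constant values for a while. When every point in the current set is identical, xmin == xmax in every dimension, so l = xmax - xmin is all zeros and l /= l.sum() divides zero by zero. This makes l NaN and leads to an exception in the following code: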

def _cut(self, X, S, parent=None, side='l'):
    # Find max and min over all d dimensions
    xmax = X[S].max(axis=0)
    xmin = X[S].min(axis=0)

    # Compute l
    l = xmax - xmin
    l /= l.sum()

Any suggestions for handling this special case gracefully?
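The failure can be reproduced outside rrcf with a minimal NumPy sketch that mirrors the _cut excerpt above, alongside one possible guard. The helper names naive_cut_weights and guarded_cut_weights are hypothetical and not part of rrcf; S is assumed to be a boolean mask over the rows of X, as the indexing in the excerpt suggests. The guard returns None to signal that no cut exists; what the caller should do with that signal is the open question of this issue.

```python
import numpy as np


def naive_cut_weights(X, S):
    """Mirror of the excerpt: per-dimension spans normalized by their sum."""
    xmax = X[S].max(axis=0)
    xmin = X[S].min(axis=0)
    l = (xmax - xmin).astype(float)
    # When every selected point is identical this is 0 / 0, which yields nan;
    # errstate only silences NumPy's RuntimeWarning, it does not change the result
    with np.errstate(invalid="ignore"):
        l /= l.sum()
    return l


def guarded_cut_weights(X, S):
    """Hypothetical guarded version: returns None when no dimension has a positive span."""
    xmax = X[S].max(axis=0)
    xmin = X[S].min(axis=0)
    l = (xmax - xmin).astype(float)
    total = l.sum()
    if total == 0:
        # Spans are non-negative, so a zero sum means all selected points coincide
        return None
    return l / total


X = np.full((4, 3), 7.0)             # a constant window: 4 identical 3-D points
S = np.ones(len(X), dtype=bool)      # select every row
print(naive_cut_weights(X, S))       # [nan nan nan]
print(guarded_cut_weights(X, S))     # None
```

Checking the sum of spans before dividing is cheap and catches exactly the case where the normalization is undefined, rather than letting NaN propagate into later steps.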

mdbartos commented 3 years ago

I do not think the algorithm is well-defined for the case where all points are exactly identical, because you cannot partition the point set.

https://klabum.github.io/rrcf/tree-construction.html

In this case, you would essentially skip the tree-construction algorithm and create a single root node that is also a leaf containing all the points in the set.
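The approach described above can be sketched with a simplified standalone tree rather than rrcf's own RCTree class. Leaf, Branch, and build are hypothetical names; S is again assumed to be a boolean mask over the rows of X. The cut dimension is drawn with probability proportional to its span (the weights the l computation in the excerpt sets up), and the cut value is drawn uniformly between that dimension's min and max. Whenever the selected points have zero total span, recursion stops and one leaf holds all of them, so an all-constant window produces a root that is itself a leaf.

```python
import numpy as np
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    indices: np.ndarray  # row indices of X stored here; more than one when points coincide
    depth: int


@dataclass
class Branch:
    dim: int        # dimension the cut was made on
    value: float    # rows with X[:, dim] <= value go left, the rest go right
    left: "Node"
    right: "Node"


Node = Union[Leaf, Branch]


def build(X: np.ndarray, S: np.ndarray, rng: np.random.Generator, depth: int = 0) -> Node:
    """Hypothetical recursive builder over the rows of X selected by the boolean mask S."""
    xmin = X[S].min(axis=0)
    xmax = X[S].max(axis=0)
    span = xmax - xmin
    total = span.sum()
    if total == 0:
        # All selected points are identical (or there is only one point):
        # no cut can separate them, so store them together in a single leaf.
        return Leaf(indices=np.flatnonzero(S), depth=depth)
    # Zero-span dimensions get probability 0 and are never chosen.
    dim = int(rng.choice(X.shape[1], p=span / total))
    while True:
        value = rng.uniform(xmin[dim], xmax[dim])
        left = S & (X[:, dim] <= value)
        right = S & (X[:, dim] > value)
        # Redraw in the unlikely case that rounding leaves one side empty
        if left.any() and right.any():
            break
    return Branch(dim, float(value),
                  build(X, left, rng, depth + 1),
                  build(X, right, rng, depth + 1))


rng = np.random.default_rng(0)
X = np.full((5, 2), 3.0)                   # a constant window: 5 identical 2-D points
root = build(X, np.ones(len(X), dtype=bool), rng)
print(type(root).__name__, root.indices)   # Leaf [0 1 2 3 4]
```

The same zero-span check also covers the ordinary single-point leaf, so identical points need no separate code path: they simply end up sharing one leaf.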