Issue when predicting identical points using a batch trained model

kLabUM / rrcf

🌲 Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams

https://klabum.github.io/rrcf/

MIT License

Issue when predicting identical points using a batch trained model #94

Closed: alexstrid closed this issue 1 year ago

alexstrid commented 2 years ago

Hi, I have a training dataset with many identical data points, and I train the model in batch mode. Afterwards, when I insert a point that is identical to a subset of the points in the training dataset, the new point displaces all of its existing copies. This gives the point a high (co)displacement score, even though the point is very common.

Update: setting the tolerance to 0 when inserting a point did the trick.
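
For anyone hitting the same thing, here is a minimal sketch of why a tolerance can matter when matching a new point against stored copies. It is not rrcf's code: `is_duplicate` and its `abs_tol` parameter are hypothetical, and it assumes (not confirmed in this thread) that the duplicate check is an exact comparison when no tolerance is given and a closeness comparison otherwise, and that the stored copy was rounded during batch construction.

```python
import math


def is_duplicate(stored, new, tolerance=None, abs_tol=1e-8):
    """Hypothetical duplicate check (illustration only, not rrcf's implementation).

    tolerance=None -> exact element-wise equality.
    tolerance=t    -> closeness test with relative tolerance t plus a small
                      absolute tolerance (abs_tol is an assumed default).
    """
    if tolerance is None:
        return all(a == b for a, b in zip(stored, new))
    return all(
        math.isclose(a, b, rel_tol=tolerance, abs_tol=abs_tol)
        for a, b in zip(stored, new)
    )


# Assumption for illustration: the stored copy was rounded to 9 decimals,
# while the newly inserted point carries ordinary floating-point noise.
stored = (round(0.1 + 0.2, 9), 1.0)
new = (0.1 + 0.2, 1.0)

# Exact comparison treats the two as different points.
exact_match = is_duplicate(stored, new)

# With tolerance=0 the absolute tolerance still absorbs the tiny difference.
tolerant_match = is_duplicate(stored, new, tolerance=0)

print(exact_match, tolerant_match)
```

If this is what is going on, a point that looks identical would be inserted as a new leaf next to its copies unless a tolerance is supplied. In rrcf that would correspond to passing `tolerance=0` to the insert call, for example `tree.insert_point(point, index=new_index, tolerance=0)`, where `new_index` is a placeholder for an unused index.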