kLabUM / rrcf

🌲 Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams
https://klabum.github.io/rrcf/
MIT License

How do I select the number of trees? #102

Open pratikgandhic1 opened 10 months ago

pratikgandhic1 commented 10 months ago

Hi there,

I have a general question: what are the usual criteria for selecting the number of trees and tree_size in an RCF model?

Thanks!

mwhitworth commented 4 months ago

This is an open research question: the original paper doesn't cover it, and the answer depends on the properties of the dataset you are trying to filter.

The best way to go about this is through experimentation: decide what "good" means (either through labelling, or by using another outlier algorithm that acts with "perfect knowledge" of the previous batch data), then compare various values of the number of trees and tree_size against that benchmark.
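
As a rough sketch of that kind of experiment, something like the following could work, assuming you have a list of points and a 0/1 label per point marking known anomalies. The helper names (`rrcf_scores`, `precision_at_k`, `grid_search`) are made up for illustration and are not part of rrcf. The scoring loop follows the streaming pattern from the rrcf README (`RCTree`, `insert_point`, `forget_point`, `codisp`). Precision among the top-k scores is only one possible definition of "good".

```python
import itertools


def rrcf_scores(points, num_trees, tree_size):
    """Average CoDisp per point from a streaming forest (hypothetical helper)."""
    import rrcf  # third-party package, imported lazily: pip install rrcf

    forest = [rrcf.RCTree() for _ in range(num_trees)]
    scores = []
    for index, point in enumerate(points):
        total = 0.0
        for tree in forest:
            # Keep at most tree_size points per tree by dropping the oldest one.
            if len(tree.leaves) >= tree_size:
                tree.forget_point(index - tree_size)
            tree.insert_point(point, index=index)
            total += tree.codisp(index)
        scores.append(total / num_trees)
    return scores


def precision_at_k(scores, labels, k):
    """Fraction of the k highest-scoring points that are labelled anomalies."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(labels[i] for i in top) / k


def grid_search(score_fn, points, labels, tree_counts, tree_sizes):
    """Score every (num_trees, tree_size) pair against the labelled benchmark."""
    k = sum(labels)
    if k == 0:
        raise ValueError("labels must mark at least one anomaly")
    results = {}
    for num_trees, tree_size in itertools.product(tree_counts, tree_sizes):
        scores = score_fn(points, num_trees, tree_size)
        results[(num_trees, tree_size)] = precision_at_k(scores, labels, k)
    return results
```

A call such as `grid_search(rrcf_scores, points, labels, [20, 40, 100], [64, 256])` returns a dict keyed by `(num_trees, tree_size)`, and `max(results, key=results.get)` picks the best pair under this metric. The candidate values here are arbitrary examples. Because the forest is randomized, it is probably worth repeating each setting a few times and comparing averages rather than trusting a single run.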