kLabUM / rrcf

🌲 Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams
https://klabum.github.io/rrcf/
MIT License

Intuition behind shingle_size #77

Open lalitsomnathe opened 4 years ago

lalitsomnathe commented 4 years ago

I have been spending some time trying to understand rrcf. When we consider streaming data, what is the intuition behind shingle_size? At first I understood it as a kind of rolling-window concept (like time steps in an LSTM). I also thought it would be tied to the frequency of the wave, say, i.e. shingle_size = number of data points in one period. So if I have a weekly trend, then shingle_size = 7 (a week). But this doesn't seem to be correct. Could you please shed some light on how we should choose shingle_size? @mdbartos :)
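To make the rolling-window reading concrete, here is a minimal sketch of what shingling does to a stream. It assumes the usual meaning of a shingle: each run of `shingle_size` consecutive values becomes one point with `shingle_size` dimensions. The `shingle` function below is a plain-Python stand-in written for illustration (rrcf ships its own `rrcf.shingle` helper), and the `daily` readings are made-up values, not real data.

```python
from collections import deque

def shingle(stream, size):
    """Yield each run of `size` consecutive values as a tuple (a sliding window)."""
    window = deque(maxlen=size)
    for value in stream:
        window.append(value)
        # Emit only once the window is full; after that, one shingle per new value.
        if len(window) == size:
            yield tuple(window)

# Hypothetical daily readings: two weeks with the same weekly pattern.
daily = [10, 12, 11, 13, 12, 30, 28] * 2

points = list(shingle(daily, size=7))
print(len(points))  # 14 - 7 + 1 = 8 shingles
print(points[0])    # the first week, seen as one 7-dimensional point
```

With `size=7`, each point the forest sees is a full week of values, so a day is scored in the context of the six days before it rather than on its own. Whether 7 is a good choice for a given stream is exactly the open question here; the sketch only shows the mechanics.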