linkedin / isolation-forest

A distributed Spark/Scala implementation of the isolation forest algorithm for unsupervised outlier detection, featuring support for scalable training and ONNX export for seamless cross-platform inference.

Feature request: warm_start #46

Closed ruizcrp closed 7 months ago

ruizcrp commented 7 months ago

Hi, I was looking for a way to continue training an existing model with additional or new data. This is quite relevant in the big data field, as otherwise a large amount of data has to be kept around to retrain the model from scratch every time.

The following article shows how this can be done with the isolation forest in sklearn: https://medium.com/grabngoinfo/isolation-forest-for-anomaly-detection-cd7871ae99b4

Are you planning to implement something similar for your Spark implementation of isolation forest?

Kind regards and thank you for this great library!

ruizcrp commented 7 months ago

After reading the original isolation forest paper by Liu, Ting, and Zhou (2008), I take my question back. In particular, because a low sampling rate is recommended, each tree only needs a small subsample of the data, which very likely makes retraining from scratch much easier, even in a big data context.
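
A rough back-of-the-envelope sketch of that reasoning, in plain Python. The values of 100 trees and 256 samples per tree are the defaults suggested in the paper; the dataset size and the helper names (`subsample_budget`, `draw_subsamples`) are hypothetical and not part of this library.

```python
import random


def subsample_budget(num_trees: int, samples_per_tree: int) -> int:
    """Upper bound on how many rows the whole forest looks at during training."""
    return num_trees * samples_per_tree


def draw_subsamples(num_rows: int, num_trees: int, samples_per_tree: int, seed: int = 0):
    """Draw one index subsample per tree, without replacement within a tree."""
    rng = random.Random(seed)
    size = min(samples_per_tree, num_rows)
    return [rng.sample(range(num_rows), size) for _ in range(num_trees)]


# Defaults suggested in the paper: 100 trees, 256 samples per tree.
budget = subsample_budget(num_trees=100, samples_per_tree=256)

# Hypothetical dataset size, chosen only for illustration.
num_rows = 10_000_000
fraction = budget / num_rows

print(f"rows touched during training: at most {budget}")
print(f"fraction of the dataset: {fraction:.5f}")
```

Under these assumptions the forest reads at most 25,600 rows no matter how large the dataset grows, so retraining on a fresh sample is cheap compared with keeping and reprocessing the full history.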