ruizcrp closed this issue 7 months ago
After reading the original isolation forest paper (Liu, Ting, and Zhou, 2008), I take my question back. Since the paper recommends a low sampling rate, retraining from scratch is very likely much easier, even in a big data context.
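To illustrate the point about the low sampling rate, here is a minimal sketch using scikit-learn's `IsolationForest` (not the Spark library discussed in this issue). The synthetic data, the feature count, and the choice of `max_samples=256` are assumptions for illustration; 256 is, to my understanding, the sub-sampling size the paper reports as generally sufficient, and it is also what scikit-learn's `max_samples="auto"` resolves to on large datasets.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for a large dataset (assumption: 100,000 rows, 4 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 4))

# Each tree is built from only 256 sampled rows, regardless of dataset size,
# so fitting from scratch stays cheap even when X is large.
model = IsolationForest(n_estimators=100, max_samples=256, random_state=0)
model.fit(X)

# max_samples_ is the number of rows actually drawn per tree.
samples_per_tree = model.max_samples_
print(samples_per_tree)
```

Because each tree only ever sees a small subsample, a full retrain mostly needs a representative sample of the data rather than the complete history.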
Hi, I was looking for a way to continue training an existing model with additional or new data. This is quite relevant in the big data field, since otherwise a large amount of data has to be kept around to retrain the model from scratch every time.
The following article shows such a possibility for isolation forest using sklearn: https://medium.com/grabngoinfo/isolation-forest-for-anomaly-detection-cd7871ae99b4
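For context, a minimal sketch of how this kind of continued training can look in scikit-learn, using the `warm_start` parameter of `IsolationForest`. The data batches and tree counts are made up for illustration, and this is only my reading of the general approach, not necessarily the exact code from the article.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Two synthetic batches (assumption: the second one arrives later).
rng = np.random.default_rng(0)
X_old = rng.normal(size=(1000, 2))
X_new = rng.normal(loc=0.5, size=(1000, 2))

# warm_start=True keeps the already fitted trees on the next call to fit().
model = IsolationForest(n_estimators=100, warm_start=True, random_state=0)
model.fit(X_old)

# Raising n_estimators and fitting again adds 50 trees built from X_new;
# the first 100 trees are reused unchanged.
model.set_params(n_estimators=150)
model.fit(X_new)

n_trees = len(model.estimators_)
print(n_trees)
```

Note that this only appends trees trained on the new batch. The existing trees are not updated, so the ensemble grows with every batch.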
Are you planning to implement something similar for your Spark implementation of isolation forest?
Kind regards and thank you for this great library!