kLabUM / rrcf

🌲 Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams
https://klabum.github.io/rrcf/
MIT License

Unable to copy/save model using pickle #69

Closed colinkyle closed 4 years ago

colinkyle commented 4 years ago

I'm using the model in a streaming anomaly detection scenario where I want to generate the trees up to a certain point in time, then repeatedly advance the models from that starting point on various predicted time-series.

The method I came up with was to "train" the model, save or copy it, and then run the copied version on each new time-series.

However, it looks like the trees can't be pickled, which is causing both the copy and the save to fail:

import copy
import numpy as np
import rrcf
X = np.random.randn(100, 2)
tree = rrcf.RCTree(X)
copy.deepcopy(tree)

TypeError: can't pickle module objects

This seems to indicate that somewhere in the RCTree class, instances hold a reference to a module itself rather than to an instance of something from that module. Is there any way to address this? Either by using instances rather than modules, or perhaps with an alternative way to copy/save the tree objects?

colinkyle commented 4 years ago

Ah, I see that you've provided to_dict and from_dict methods to address this issue. I didn't catch this at first because the PyPI release is out of date. Fixed with: pip3 install git+https://github.com/kLabUM/rrcf --upgrade
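
For anyone landing here later, a minimal sketch of why this kind of error shows up and how a to_dict / from_dict round trip sidesteps it. TinyModel below is a hypothetical stand-in written for illustration only; it is not part of rrcf, and the real RCTree internals may differ. The only assumption carried over from this thread is that an instance attribute pointing at a module object makes deepcopy and pickle fail.

```python
import copy
import random


class TinyModel:
    """Hypothetical stand-in for a tree class; NOT part of rrcf."""

    def __init__(self, values=()):
        self.values = list(values)
        # Storing the module object itself as an attribute is what
        # breaks deepcopy/pickle: module objects cannot be pickled.
        self.rng = random

    def to_dict(self):
        # Export only plain, serializable state (no module reference).
        return {"values": list(self.values)}

    @classmethod
    def from_dict(cls, obj):
        # Rebuild a fresh instance from the exported state.
        return cls(obj["values"])


model = TinyModel([1.0, 2.0, 3.0])

# deepcopy falls back to the pickle protocol and trips on self.rng.
try:
    copy.deepcopy(model)
    deepcopy_failed = False
except TypeError:
    deepcopy_failed = True

# Round trip through a plain dict to get an independent copy instead.
clone = TinyModel.from_dict(model.to_dict())
clone.values.append(4.0)  # advancing the clone leaves the original untouched
```

The same pattern fits the "train once, replay on several series" workflow: export the trained state to a dict once, then build a fresh copy from that dict for each predicted series.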

mdbartos commented 4 years ago

Thanks for reminding me. I'll push to PyPI as soon as I get a chance.

FYI, this thread shows how you can pickle a tree as well: https://github.com/kLabUM/rrcf/issues/65

colinkyle commented 4 years ago

Thanks!