sahandha / eif

Extended Isolation Forest for Anomaly Detection

Model saving #18

Open adavoli91 opened 4 years ago

adavoli91 commented 4 years ago

Hi, is it possible to save a model, e.g. with pickle? Thanks

mgckind commented 4 years ago

Hi @adavoli91, the pure-Python version of eif can be pickled; for the faster Cython version we still need to implement that. Thanks for letting us know, we'll try to add it in the future.
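For illustration, here is a minimal sketch of the pickle round trip discussed above, using only the standard library. The nested dict is a hypothetical stand-in for a fitted pure-Python model and is not the eif API; `save_model` and `load_model` are helper names invented for this example.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize any picklable model object to `path`."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Load an object previously written by save_model."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def roundtrip(model):
    """Write the model to a temporary file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "model.pkl"
        save_model(model, path)
        return load_model(path)


# Hypothetical stand-in for a fitted model (assumption, not eif's structure).
toy_model = {"ntrees": 2, "trees": [{"split": 0.3}, {"split": 0.7}]}
restored = roundtrip(toy_model)
```

As with any pickle file, only load files from a source you trust, since unpickling can execute arbitrary code.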

adavoli91 commented 4 years ago

Thank you for the reply.

I managed to pickle eif_old, but I noticed that the output file is very large (~100MB for a dataset with ~500 rows and ~60 columns); the same dataset, fitted with IsolationForest from sklearn, gives a pickle file of less than 1MB. Is that expected, or can it be handled?

Thanks
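One way to check the size of a pickled object, and to try shrinking it, is sketched below with the standard `pickle` and `gzip` modules. The toy tree built by `build_toy_tree` is a hypothetical nested-dict structure made up for this example, not how eif_old stores its trees, so whether compression helps with a real eif_old model is untested here.

```python
import gzip
import pickle


def build_toy_tree(depth, dims=60):
    """Build a hypothetical binary tree of nested dicts (not eif's layout)."""
    if depth == 0:
        return {"size": 1}
    return {
        "normal": [float(depth)] * dims,   # per-node vector, one entry per column
        "intercept": [0.0] * dims,
        "left": build_toy_tree(depth - 1, dims),
        "right": build_toy_tree(depth - 1, dims),
    }


def pickled_size(obj):
    """Number of bytes in the uncompressed pickle of obj."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def dumps_compressed(obj):
    """Pickle obj and gzip the resulting bytes."""
    return gzip.compress(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def loads_compressed(blob):
    """Inverse of dumps_compressed."""
    return pickle.loads(gzip.decompress(blob))


toy_forest = [build_toy_tree(6) for _ in range(4)]
raw_bytes = pickled_size(toy_forest)
gz_bytes = len(dumps_compressed(toy_forest))
print(f"raw: {raw_bytes} bytes, gzip: {gz_bytes} bytes")
```

Compression only reduces the file on disk; the in-memory object is unchanged after loading.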

mgckind commented 4 years ago

Thanks,

I think that's expected, since eif_old is pure Python while sklearn is C-based. The ideal scenario would be to add pickle support to the Cython class, which requires a little development work.

lpryszcz commented 4 years ago

Hi @mgckind, I second this feature request. I've found eIF super useful, and the only feature I'm currently missing is model saving. Have you had any opportunity to work on that?

lpryszcz commented 4 years ago

I've rewritten the Python version as eif_new.py. This version has performance matching the C++ version (~40x faster than eif_old.py) and allows model saving, with model files 10x smaller than those from eif_old.py. #24