sahandha / eif

Extended Isolation Forest for Anomaly Detection

Optimised eif_new.py #24

Open lpryszcz opened 4 years ago

lpryszcz commented 4 years ago

I've optimised the Python version so that it matches the performance of the C++ version and allows saving models. A runtime example has been added in Notebooks/comparison_py_cxx.ipynb. The code was rewritten entirely, and some functions are optimised with numba. The iForest is now a numpy array, which allows fast computation and model dumps with a low storage footprint.
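To illustrate why a forest kept as a single numpy array is cheap to dump, here is a minimal sketch. The array layout (trees x nodes x columns) and all names are assumptions made up for this example; they are not taken from eif_new.py.

```python
import io
import numpy as np

rng = np.random.default_rng(0)
n_trees, n_nodes, n_features = 4, 7, 3

# Hypothetical layout: each node row holds a normal vector and an intercept
# point (n_features values each) plus one extra column, e.g. a node size.
forest = rng.normal(size=(n_trees, n_nodes, 2 * n_features + 1)).astype(np.float32)

# The whole model is one contiguous buffer, so its footprint is easy to reason about.
footprint_bytes = forest.nbytes  # 4 * 7 * 7 values * 4 bytes each

# Dump and restore with np.save / np.load (an in-memory buffer stands in for a file).
buf = io.BytesIO()
np.save(buf, forest)
buf.seek(0)
restored = np.load(buf)
```

With a real file, `np.save("model.npy", forest)` and `np.load("model.npy")` would work the same way.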

wundermahn commented 3 years ago

Is this still an active project?

lpryszcz commented 3 years ago

That's a good question, @wundermahn. If you want the optimised Python version, you can get it directly from my fork.

psmgeelen commented 2 years ago

Hi there, this would fix my problem as well, wouldn't it? I am currently trying to pickle the isolationForest model, and it fails due to some Cython issue:

File "stringsource", line 2, in eif.iForest.__reduce_cython__
TypeError: no default __reduce__ due to non-trivial __cinit__

lpryszcz commented 2 years ago

Hi @psmgeelen, yes, you can't save models from the Cython version. Try my fork: its performance is similar to the Cython version, but it is implemented in Python (with Numba optimisations).
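As a rough illustration of why a pure Python/numpy model avoids the `__reduce_cython__` error above: model state made of ordinary Python objects and numpy arrays pickles without any custom `__reduce__`. The dictionary keys below are hypothetical and do not reflect the actual attributes in the fork.

```python
import pickle
import numpy as np

# Hypothetical model state: plain Python values plus a numpy array of nodes.
model_state = {
    "sample_size": 256,
    "ntrees": 3,
    "nodes": np.arange(12, dtype=np.float64).reshape(3, 4),
}

# Round trip through pickle; numpy arrays implement the pickle protocol themselves.
blob = pickle.dumps(model_state)
clone = pickle.loads(blob)
```

A Cython extension type with a non-trivial `__cinit__` gets no default `__reduce__`, which is what the traceback reports.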

psmgeelen commented 2 years ago

@lpryszcz, you are the best! I will get on it now! So I really only need the eif_new.py file and that's it? Maybe it would be worthwhile to have your version integrated into scikit-learn. I recommended you anyhow: https://github.com/scikit-learn/scikit-learn/issues/16517

EDIT: It works out of the box, I love the script! One small question though: does it make sense to have a threshold that is always 0.5? Instead, you could just return the values directly.
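To make the question concrete, here is a small sketch of the standard isolation forest score, s = 2^(-E[h] / c(n)), with and without a fixed 0.5 cut-off. The function names are my own and are not taken from eif_new.py; c(n) uses the usual harmonic-number approximation.

```python
import numpy as np

def c_factor(n):
    """Average path length of an unsuccessful binary search tree lookup over n points."""
    return 2.0 * (np.log(n - 1) + np.euler_gamma) - 2.0 * (n - 1) / n

def anomaly_score(mean_path_length, sample_size):
    """Raw score in (0, 1]; higher means more anomalous."""
    return 2.0 ** (-np.asarray(mean_path_length) / c_factor(sample_size))

sample_size = 256
# Three illustrative mean path lengths: short, exactly average, long.
depths = np.array([3.0, c_factor(sample_size), 14.0])

scores = anomaly_score(depths, sample_size)  # continuous values
labels = scores > 0.5                        # fixed-threshold decision
```

A point whose mean path length equals c(n) scores exactly 0.5, so the fixed threshold only separates shorter-than-average from longer-than-average paths. Returning `scores` lets the caller pick a cut-off, for example a percentile.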

lpryszcz commented 2 years ago

I'm glad it works for you :) And thanks for the recommendation, @psmgeelen. I'd be more than happy to contribute to scikit-learn, provided there is interest from their side.