Open lpryszcz opened 4 years ago
Is this still an active project?
That's a good question @wundermahn. If you want an optimised Python version, you can get it directly from my fork.
Hi there, this would be the fix for my problem as well, wouldn't it? I am currently trying to pickle the isolationForest model and failing due to some Cython issue:
File "stringsource", line 2, in eif.iForest.__reduce_cython__
TypeError: no default __reduce__ due to non-trivial __cinit__
Hi @psmgeelen, yes, you can't save models from the Cython version. Try my fork: it has performance similar to the Cython version, but is implemented in Python (with Numba optimisations).
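For context on the error above: pickle rebuilds an object through its `__reduce__` method, and the traceback says the Cython class has no default one because of its `__cinit__`. The sketch below shows the mechanism in plain Python. `TinyModel` and its two parameters are hypothetical stand-ins, not the eif API.

```python
import pickle


class TinyModel:
    """Hypothetical stand-in for a model class; not the eif API."""

    def __init__(self, ntrees, sample_size):
        self.ntrees = ntrees
        self.sample_size = sample_size

    def __reduce__(self):
        # Tell pickle how to rebuild the object:
        # a callable plus the arguments to call it with.
        return (TinyModel, (self.ntrees, self.sample_size))


model = TinyModel(ntrees=200, sample_size=256)

# Round trip through bytes, as saving to and loading from disk would do.
restored = pickle.loads(pickle.dumps(model))
```

A plain Python class pickles even without an explicit `__reduce__`; it is spelled out here only to show the hook that the Cython class is missing.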
@lpryszcz, you are the best! I will get on it now! So I really only need the eif_new.py file and that's it? Maybe it would be worthwhile to have your version integrated into scikit-learn. I recommended you anyhow: https://github.com/scikit-learn/scikit-learn/issues/16517
EDIT: It works out of the box, I love the script! A small question though: does it make sense to have a threshold that is always 0.5? Instead, you could just push the values directly.
I'm glad it works for you :) And thanks for the recommendation @psmgeelen . I'd be more than happy to contribute to scikit-learn given there is interest from their side.
I've optimised the Python version so that it matches the performance of the C++ version and allows saving the models. A runtime example has been added to Notebooks/comparison_py_cxx.ipynb. The code was rewritten entirely. Some functions are optimised with numba. The iForest is now a numpy array, which allows fast computation and model dumps with a low storage footprint.
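To illustrate the idea of keeping a tree in a numpy array, here is a minimal sketch. The column layout is an assumption made for this example and is not the layout used in the fork; it also uses simple axis-parallel splits rather than the hyperplane splits of the extended isolation forest, and leaves out numba.

```python
import io

import numpy as np

# Hypothetical layout, not the fork's actual one: one row per node,
# columns = [left child index, right child index, split feature, split value].
# A left child index of -1 marks a leaf.
tree = np.array([
    [1, 2, 0, 0.5],    # root: x[0] < 0.5 goes left, otherwise right
    [-1, -1, 0, 0.0],  # leaf
    [3, 4, 1, 2.0],    # internal node: x[1] < 2.0 goes left, otherwise right
    [-1, -1, 0, 0.0],  # leaf
    [-1, -1, 0, 0.0],  # leaf
])


def path_length(tree, x):
    """Count the edges from the root to the leaf that x falls into."""
    node, depth = 0, 0
    while tree[node, 0] >= 0:
        feature = int(tree[node, 2])
        if x[feature] < tree[node, 3]:
            node = int(tree[node, 0])
        else:
            node = int(tree[node, 1])
        depth += 1
    return depth


# Saving the model then amounts to saving a plain numeric array.
buffer = io.BytesIO()
np.save(buffer, tree)
buffer.seek(0)
restored = np.load(buffer)
```

Because the whole structure is one numeric array, it can be written with `np.save` or pickled without any custom serialisation code, and a loop like `path_length` is the kind of function that numba can typically compile.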