sahandha / eif

Extended Isolation Forest for Anomaly Detection

Scoring takes too long #11

Closed by thedarklord310780 4 years ago

thedarklord310780 commented 4 years ago

My training and validation data are of similar size (about 1,500,000 rows and 11 features). Building the model took very little time, even with full extension. But scoring the validation data with compute_paths has been running for close to 15 hours and still has not finished. Is there some way to speed up the scoring process?

mgckind commented 4 years ago

Hi, as a matter of fact we are working on a C++ implementation with a Python wrapper that is identical to the current one: https://github.com/sahandha/eif/tree/cxx. It is much, much faster, and we are also adding parallelism to it.

You can get the code from the cxx branch. It works, but it is not finished yet and so has not been merged into master.
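
In the meantime, one way to judge whether a full scoring run is feasible is to time a small sample and extrapolate, then score in chunks so progress is visible. The sketch below is only an illustration: `score_fn` is a hypothetical stand-in for whatever scoring call is used (for example, a wrapper around compute_paths), the helper names are made up, and the estimate assumes scoring time grows roughly linearly with the number of rows. Chunking does not make scoring faster by itself.

```python
import time


def estimate_scoring_seconds(score_fn, rows, sample_size=1000):
    """Time score_fn on the first sample_size rows and extrapolate linearly.

    score_fn is a hypothetical callable: it takes a sequence of rows and
    returns one score per row.
    """
    sample = rows[:sample_size]
    if len(sample) == 0:
        raise ValueError("rows is empty")
    start = time.perf_counter()
    score_fn(sample)
    elapsed = time.perf_counter() - start
    # Assumption: cost per row is roughly constant.
    return elapsed * len(rows) / len(sample)


def score_in_chunks(score_fn, rows, chunk_size=100_000):
    """Yield (start_index, scores) for each chunk so progress can be logged."""
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        yield start, score_fn(chunk)
```

A caller could print `start` after each chunk, or write each chunk's scores to disk, instead of waiting on one long call with no feedback.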

thedarklord310780 commented 4 years ago

Thanks, that might be very helpful.

The scoring took so much time because when I dumped the model using pickle, the file was corrupted in some way. Is there any other way to save the model?
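
For reference, a corrupted dump can at least be detected early by recording a checksum when the model is saved and checking it on load. This is a minimal sketch using only the Python standard library; `save_model` and `load_model` are hypothetical helper names, not part of eif, and it does not explain why the original file was corrupted.

```python
import hashlib
import pickle


def save_model(model, path):
    """Pickle model to path and return the SHA-256 of the bytes written."""
    payload = pickle.dumps(model, protocol=pickle.HIGHEST_PROTOCOL)
    # Binary mode matters: text mode can alter the bytes on some platforms.
    with open(path, "wb") as fh:
        fh.write(payload)
    return hashlib.sha256(payload).hexdigest()


def load_model(path, expected_sha256=None):
    """Load a pickled model, optionally verifying its checksum first."""
    with open(path, "rb") as fh:
        payload = fh.read()
    if expected_sha256 is not None:
        actual = hashlib.sha256(payload).hexdigest()
        if actual != expected_sha256:
            raise ValueError("pickle file does not match the recorded checksum")
    return pickle.loads(payload)
```

Storing the returned digest next to the file and passing it to `load_model` makes a damaged file fail immediately rather than surfacing later as slow or incorrect scoring.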