vc1492a / PyNomaly

Anomaly detection using LoOP: Local Outlier Probabilities, a local density based outlier detection method providing an outlier score in the range of [0,1].

Implementation Speed #11

Closed ayushgupt closed 6 years ago

ayushgupt commented 6 years ago

I am running the LoOP algorithm on 37k unclustered two-dimensional points, and it's taking forever to run. Is that because of the implementation, or is the algorithm inherently slow?

vc1492a commented 6 years ago

@ayushgupt thanks for opening the issue. LoOP is a nearest-neighbor approach and is therefore inherently slow, especially with a large number of observations. With 37k observations, you can definitely expect it to take a while to return scores. The implementation's speed could probably be improved, but the approach itself remains computationally expensive at that scale.
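To make the cost concrete, here is a minimal sketch of an exhaustive nearest-neighbor search that counts distance evaluations. It assumes a brute-force search, which the thread does not confirm is what this package does; `knn_brute_force` is a hypothetical helper written for illustration, not part of PyNomaly.

```python
import math
import random


def knn_brute_force(points, k):
    """Exhaustive k-nearest-neighbor search (illustrative only).

    Returns the neighbor indices for every point and the number of
    distance evaluations performed.
    """
    evals = 0
    neighbors = []
    for i, p in enumerate(points):
        dists = []
        for j, q in enumerate(points):
            if i == j:
                continue  # a point is not its own neighbor
            dists.append((math.dist(p, q), j))
            evals += 1
        dists.sort()
        neighbors.append([j for _, j in dists[:k]])
    return neighbors, evals


if __name__ == "__main__":
    random.seed(0)
    for n in (100, 200, 400):
        pts = [(random.random(), random.random()) for _ in range(n)]
        _, evals = knn_brute_force(pts, k=10)
        # Each point is compared with every other point: n * (n - 1).
        print(n, evals)
```

Under this assumption the work grows as n * (n - 1), so doubling the number of observations roughly quadruples the distance evaluations. For 37,000 points that is 37,000 × 36,999, or about 1.37 billion evaluations, before any of the LoOP-specific statistics are computed.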

I'm not sure what your data looks like, but one option is to use Hamlet et al.'s modified implementation of LoOP, which is included in this package. Their approach lets you fit LoOP on "training" data and then score incoming observations against the original fit. It's not as accurate as fitting LoOP to all of the data outright, but it should help with speed if that is what you're looking for.
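The fit-then-score idea can be sketched in plain Python as follows. This is a simplified reading based on the standard LoOP definitions (probabilistic set distance, PLOF, and the normalizing nPLOF), not the package's code and not necessarily Hamlet et al.'s exact procedure. The names `fit`, `score`, `_knn`, and `_pdist` are hypothetical, and the sketch assumes the training data contains no duplicate points.

```python
import math
import random


def _knn(point, candidates, k, skip=None):
    """(distance, index) pairs for the k nearest candidates to point."""
    dists = [(math.dist(point, c), j)
             for j, c in enumerate(candidates) if j != skip]
    dists.sort()
    return dists[:k]


def _pdist(neighbor_dists, extent):
    """Probabilistic set distance: extent * sqrt(mean squared distance)."""
    mean_sq = sum(d * d for d, _ in neighbor_dists) / len(neighbor_dists)
    return extent * math.sqrt(mean_sq)


def fit(train, k=5, extent=2):
    """Do the expensive all-pairs work once, on the training data only."""
    knn = [_knn(p, train, k, skip=i) for i, p in enumerate(train)]
    pdists = [_pdist(nb, extent) for nb in knn]
    plofs = []
    for i, nb in enumerate(knn):
        # Expected pdist among this point's neighbors (nonzero if no duplicates).
        expected = sum(pdists[j] for _, j in nb) / len(nb)
        plofs.append(pdists[i] / expected - 1.0)
    nplof = extent * math.sqrt(sum(v * v for v in plofs) / len(plofs))
    return {"train": train, "k": k, "extent": extent,
            "pdists": pdists, "nplof": nplof}


def score(model, point):
    """Score one new observation against the stored fit.

    Costs one pass over the training data instead of a full refit.
    """
    nb = _knn(point, model["train"], model["k"])
    pd = _pdist(nb, model["extent"])
    expected = sum(model["pdists"][j] for _, j in nb) / len(nb)
    plof = pd / expected - 1.0
    return max(0.0, math.erf(plof / (model["nplof"] * math.sqrt(2.0))))


if __name__ == "__main__":
    random.seed(0)
    train = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
    model = fit(train)
    print(score(model, (0.0, 0.0)))    # a point inside the training cluster
    print(score(model, (10.0, 10.0)))  # a point far from the training cluster
```

The trade-off matches the one described above: the quadratic cost is paid once on the training subset and each new observation is scored with a single pass over that subset, but the neighborhoods and the normalization never see the new points, so the scores can differ from those of a full fit.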

Hope this helps. Feel free to comment further, but I'll be closing the issue, since the slowness you describe is an inherent trait of the algorithm and approach rather than of the implementation.