vc1492a / PyNomaly

Anomaly detection using LoOP: Local Outlier Probabilities, a local density-based outlier detection method that provides an outlier score in the range [0, 1].
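For reference, the [0, 1] range follows from how LoOP is defined in Kriegel et al. (CIKM 2009). The formulas below are restated from that definition; S(o) is the set of o's k nearest neighbors, d is the distance function, and λ is a significance multiplier (e.g. 3).

```latex
\sigma(o, S) = \sqrt{\frac{\sum_{s \in S} d(o, s)^2}{|S|}}, \qquad
\mathrm{pdist}(\lambda, o, S) = \lambda \, \sigma(o, S)

\mathrm{PLOF}_{\lambda,S}(o) = \frac{\mathrm{pdist}(\lambda, o, S(o))}{\mathbb{E}_{s \in S(o)}\left[\mathrm{pdist}(\lambda, s, S(s))\right]} - 1

\mathrm{nPLOF} = \lambda \sqrt{\mathbb{E}\left[\mathrm{PLOF}_{\lambda,S}^2\right]}

\mathrm{LoOP}_S(o) = \max\left\{0,\ \mathrm{erf}\left(\frac{\mathrm{PLOF}_{\lambda,S}(o)}{\mathrm{nPLOF}\,\sqrt{2}}\right)\right\}
```

Because erf maps into (-1, 1), clipping at 0 keeps the score within [0, 1].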

Improved performance #33

Closed: MichaelSchreier closed this 5 years ago

MichaelSchreier commented 5 years ago

To increase the impact of numba's JIT compilation, all of the heavy lifting in the _distances() method was moved into a separate method that numba can compile far more efficiently.

As a result, the runtime of fit() drops by about 30 to more than 90(!) percent, particularly on repeated calls with larger datasets.
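A minimal sketch of the kind of refactoring described above, assuming Euclidean distance and a brute-force k-nearest-neighbor search. The function name and signature are hypothetical, loosely modeled on the PR's _compute_distance_and_neighbor_matrix() rather than taken from it. numba compiles a free function that works only on NumPy arrays and scalars much more effectively than a method that touches self. Compilation happens on the first call with a given set of argument types, and later calls reuse the compiled code, which is consistent with the largest gains showing up on repeated calls. If numba is not installed, the decorator below falls back to a no-op so the same code runs as plain NumPy.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # numba missing: use a no-op decorator so the function runs as plain NumPy
    def njit(func):
        return func


@njit
def compute_distance_and_neighbor_matrix(data, n_neighbors):
    """Hypothetical standalone helper (not PyNomaly's actual code).

    Returns (distances, indices) of each row's n_neighbors nearest rows.
    It takes only a NumPy array and an int (no `self`), which is what
    lets numba compile it in nopython mode.
    """
    n = data.shape[0]
    distances = np.empty((n, n_neighbors))
    indices = np.empty((n, n_neighbors), dtype=np.int64)
    for i in range(n):
        # Euclidean distance from row i to every row
        d = np.sqrt(((data - data[i]) ** 2).sum(axis=1))
        d[i] = np.inf  # exclude the point itself from its own neighbors
        order = np.argsort(d)[:n_neighbors]
        distances[i] = d[order]
        indices[i] = order
    return distances, indices
```

A class method would then pass its stored arrays to this function and keep only the bookkeeping on self.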

The new _compute_distance_and_neighbor_matrix() method is compiled with nopython=False only because test_data_format behaved strangely otherwise, and I couldn't resolve that.

coveralls commented 5 years ago

Pull Request Test Coverage Report for Build 89

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| PyNomaly/loop.py | 24 | 28 | 85.71% |
| Total | 24 | 28 | 85.71% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 86 | -1.3% |
| Covered Lines | 301 |
| Relevant Lines | 305 |

💛 - Coveralls
MichaelSchreier commented 5 years ago

I've reduced the use of numba to the one function where it actually improves performance, and made numba entirely optional. Unfortunately, the conditional imports reduce coverage, and I couldn't find a way to prevent numba from being imported in a test case.
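On the coverage issue, one way to exercise the no-numba branch in a test (a suggestion, not something this PR does) is to temporarily set sys.modules["numba"] to None: Python then raises ModuleNotFoundError on import even when numba is installed. The load_jit() helper below is hypothetical. For a module-level conditional import, the equivalent would be to call importlib.reload() on the module under test inside the patch.

```python
import sys
from unittest import mock


def load_jit():
    """Hypothetical helper: return (decorator, numba_used).

    Doing the conditional import inside a function, rather than at
    module import time, makes both branches easy to cover from a test.
    """
    try:
        from numba import njit
    except ImportError:
        # numba unavailable: return a decorator that leaves functions unchanged
        return (lambda func: func), False
    return njit, True


# A None entry in sys.modules makes `import numba` fail even if it is
# installed; patch.dict restores the original sys.modules afterwards.
with mock.patch.dict(sys.modules, {"numba": None}):
    jit_without_numba, used_numba = load_jit()

print(used_numba)  # False
```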

vc1492a commented 5 years ago

@MichaelSchreier Thanks, this is great! I'll check out the PR and will include this in version 0.3.1 along with some other needed changes.