vc1492a / PyNomaly

Anomaly detection using LoOP: Local Outlier Probabilities, a local density based outlier detection method providing an outlier score in the range of [0,1].

parallelize #36

Open maxcw opened 4 years ago

maxcw commented 4 years ago

It would be great to have an option for embarrassingly parallel computation, especially when all N^2 pairwise distances are calculated.

vc1492a commented 4 years ago

@maxcw Thanks for opening the issue. I agree that parallelism would be a nice computation option to provide. I believe it is available via numba, the JIT-compilation library that PyNomaly can optionally use.

Since parallel computation is an option when using numba, it may be fairly straightforward to try out and test the following change. Take this line:

https://github.com/vc1492a/PyNomaly/blob/744fa57fde27f369f6265ffe57ecb3db3d3374ea/PyNomaly/loop.py#L560

Then pass in the parameter parallel in the following way:

# parallel is some boolean parameter set earlier, e.g.
parallel = True
if self.use_numba:
    # JIT-compile the distance routine, optionally with parallel execution
    compute = numba.jit(self._compute_distance_and_neighbor_matrix,
                        cache=True, parallel=parallel)
else:
    compute = self._compute_distance_and_neighbor_matrix

I'll mark this as an enhancement to take a look at for a future release (or please feel free to try it yourself and submit a PR). Thanks!
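
For anyone who wants to experiment outside of PyNomaly first, here is a minimal standalone sketch of the idea. It is not the code in loop.py: the names _pairwise_distances and make_distance_function are hypothetical and used only for illustration. It assumes the outer loop over points is independent per row, so it can be handed to numba's prange, and it falls back to plain Python when numba is not installed (numba being optional here is also an assumption of the sketch).

```python
import numpy as np

try:
    import numba
    from numba import prange
    HAVE_NUMBA = True
except ImportError:  # treat numba as an optional dependency
    HAVE_NUMBA = False
    prange = range


def _pairwise_distances(points):
    """Euclidean distance between every pair of rows (hypothetical helper)."""
    n = points.shape[0]
    dims = points.shape[1]
    dist = np.zeros((n, n))
    # Each row i writes only to dist[i, :], so iterations do not conflict
    # and the outer loop can be split across threads by prange.
    for i in prange(n):
        for j in range(n):
            total = 0.0
            for k in range(dims):
                diff = points[i, k] - points[j, k]
                total += diff * diff
            dist[i, j] = np.sqrt(total)
    return dist


def make_distance_function(use_numba=True, parallel=True):
    """Return a compiled version if numba is available, else the Python one."""
    if use_numba and HAVE_NUMBA:
        # parallel=True only has an effect in nopython mode, hence njit.
        return numba.njit(parallel=parallel)(_pairwise_distances)
    return _pairwise_distances


if __name__ == "__main__":
    pts = np.random.default_rng(0).normal(size=(200, 3))
    compute = make_distance_function(use_numba=True, parallel=True)
    print(compute(pts).shape)
```

Note that numba only parallelizes loops written with prange (plus supported array operations), so setting parallel=True on a function with ordinary range loops may change nothing. The first call also includes compilation time, so any timing comparison should exclude it.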

vc1492a commented 4 years ago

Work on this issue can now be tracked in #43.

vc1492a commented 4 years ago

It may be helpful to use a profiler like pyinstrument to gauge the effect of specific code changes.
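
A rough sketch of what that could look like, assuming pyinstrument's Profiler class with its start(), stop() and output_text() methods. The workload pairwise_distances and the wrapper profile_call are hypothetical stand-ins, not PyNomaly functions, and the cProfile fallback is only there so the snippet still runs when pyinstrument is not installed.

```python
import math


def pairwise_distances(points):
    """Pure-Python stand-in workload: all N^2 Euclidean distances."""
    return [[math.dist(p, q) for q in points] for p in points]


def profile_call(func, *args):
    """Run func(*args) under a profiler and return (result, report_text)."""
    try:
        from pyinstrument import Profiler
    except ImportError:
        # Fallback (an assumption of this sketch): the standard-library profiler.
        import cProfile
        import io
        import pstats

        profiler = cProfile.Profile()
        result = profiler.runcall(func, *args)
        buffer = io.StringIO()
        stats = pstats.Stats(profiler, stream=buffer)
        stats.sort_stats("cumulative").print_stats(10)
        return result, buffer.getvalue()

    profiler = Profiler()
    profiler.start()
    result = func(*args)
    profiler.stop()
    return result, profiler.output_text(unicode=False, color=False)


if __name__ == "__main__":
    points = [(float(i), float(i % 7), float(i % 3)) for i in range(300)]
    _, report = profile_call(pairwise_distances, points)
    print(report)
```

Running the same call before and after a change (for example with and without parallel=True) and comparing the two reports should show whether the time spent in the distance computation actually moved.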

vc1492a commented 2 years ago

Implemented in the branch feature/numba parallel, but performance did not improve.