vc1492a / PyNomaly

Anomaly detection using LoOP: Local Outlier Probabilities, a local density based outlier detection method providing an outlier score in the range of [0,1].

Predict the probability of testing data #45

Open zhaohan-xi opened 2 years ago

zhaohan-xi commented 2 years ago

Hi dear author, I wonder whether this package provides an API for scoring independent test data after fitting. For instance, something like:

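```python
from PyNomaly import loop

m = loop.LocalOutlierProbability(data).fit()

# hypothetical call: score unseen test points against the already-fitted model
scores_of_test_data = m.local_outlier_probabilities(test_data)
```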

Here `data` is used for training (fitting) and `test_data` is a separate np.array used only for testing. The goal is to find out whether the points in `test_data` are outliers with respect to the training `data` without fitting the two together, because refitting every time takes a long time.

Does this package have such an API?

vc1492a commented 2 years ago

Hi @HarrialX, good question.

The original Local Outlier Probability (LoOP) approach was designed purely as an unsupervised method over existing data: it is meant to be applied to an entire dataset, and rerun over the whole dataset each time new data is observed.
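To make the "entire dataset" point concrete, here are the LoOP definitions as given in the original LoOP paper (Kriegel et al., 2009), written with $D$ for the dataset, $S(o)$ for the $k$ nearest neighbours of $o$ in $D$, $d$ for the distance function, and $\lambda$ for the significance factor (exposed in PyNomaly as the `extent` parameter). The notation is a summary for illustration, not a transcription of PyNomaly's code.

```latex
\begin{aligned}
\mathrm{pdist}(\lambda, o, S(o)) &= \lambda \sqrt{\frac{\sum_{s \in S(o)} d(o, s)^2}{|S(o)|}} \\
\mathrm{PLOF}_{\lambda,S}(o) &= \frac{\mathrm{pdist}(\lambda, o, S(o))}{\mathbb{E}_{s \in S(o)}\left[\mathrm{pdist}(\lambda, s, S(s))\right]} - 1 \\
\mathrm{nPLOF} &= \lambda \sqrt{\mathbb{E}_{o \in D}\left[\mathrm{PLOF}_{\lambda,S}(o)^2\right]} \\
\mathrm{LoOP}_S(o) &= \max\left\{0,\ \mathrm{erf}\left(\frac{\mathrm{PLOF}_{\lambda,S}(o)}{\mathrm{nPLOF}\cdot\sqrt{2}}\right)\right\}
\end{aligned}
```

Both the context sets $S(o)$ and the normaliser nPLOF depend on every point in $D$, so new observations can change existing neighbourhoods and the normalisation. That is why the batch method is rerun over the full dataset.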

However, a separate section of readme.md describes how to use an alternative version of LoOP developed for exactly this use case, where "new" data is observed and scores are wanted for those observations. It is intended for streaming data, but I think that approach (and this package's API for it) could work well for your case too.
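As a rough illustration of scoring new observations without refitting, the sketch below fits LoOP once on training data and then scores unseen points against the frozen training neighbourhoods and the frozen nPLOF normaliser. It is a hypothetical standalone example, not PyNomaly's API and not necessarily how the readme's streaming approach works internally. The class `FrozenLoOP`, its `fit`/`score` methods and `train_scores` attribute, Euclidean distance, brute-force neighbour search, and the assumption that training data has no exact duplicate points are all choices made for this example.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class FrozenLoOP:
    """Hypothetical sketch (not PyNomaly's API): fit LoOP once on training
    data, then score new points against the frozen training statistics."""

    def __init__(self, n_neighbors: int = 10, extent: float = 3.0) -> None:
        self.k = n_neighbors
        self.extent = extent  # lambda in the LoOP definitions

    def _neighbors(self, x: Point, exclude: int = -1) -> List[int]:
        # brute-force k nearest training points (by index), skipping `exclude`
        candidates = [i for i in range(len(self.train)) if i != exclude]
        candidates.sort(key=lambda i: euclidean(x, self.train[i]))
        return candidates[: self.k]

    def _pdist(self, x: Point, nbrs: List[int]) -> float:
        # probabilistic set distance: lambda * quadratic mean of neighbour distances
        sq = sum(euclidean(x, self.train[i]) ** 2 for i in nbrs)
        return self.extent * math.sqrt(sq / len(nbrs))

    def _plof(self, p_dist: float, nbrs: List[int]) -> float:
        # ratio to the neighbours' training-time pdist, minus 1
        # (assumes no exact duplicates, so the expected value is > 0)
        expected = sum(self.pdists[i] for i in nbrs) / len(nbrs)
        return p_dist / expected - 1.0

    def _loop(self, plof: float) -> float:
        if self.nplof == 0.0:
            return 0.0  # degenerate case: every training PLOF was exactly 0
        return max(0.0, math.erf(plof / (self.nplof * math.sqrt(2.0))))

    def fit(self, train: Sequence[Point]) -> "FrozenLoOP":
        if len(train) <= self.k:
            raise ValueError("need more training points than n_neighbors")
        self.train = [list(p) for p in train]
        # each training point's context set excludes the point itself
        self.nbrs = [self._neighbors(p, exclude=i) for i, p in enumerate(self.train)]
        self.pdists = [self._pdist(p, n) for p, n in zip(self.train, self.nbrs)]
        plofs = [self._plof(d, n) for d, n in zip(self.pdists, self.nbrs)]
        # nPLOF is computed once over the training set and then frozen
        self.nplof = self.extent * math.sqrt(sum(v * v for v in plofs) / len(plofs))
        self.train_scores = [self._loop(v) for v in plofs]
        return self

    def score(self, x: Point) -> float:
        # context set and normaliser come from the training data only
        nbrs = self._neighbors(x)
        return self._loop(self._plof(self._pdist(x, nbrs), nbrs))


if __name__ == "__main__":
    # toy training set: a 5 x 5 grid; test points are scored without refitting
    grid = [[float(x), float(y)] for x in range(5) for y in range(5)]
    model = FrozenLoOP(n_neighbors=5, extent=3.0).fit(grid)
    for point in ([2.5, 2.5], [20.0, 20.0]):
        print(point, round(model.score(point), 3))
```

In the usage block, the point inside the grid should score near 0 and the distant point near 1. Scoring one new point here costs a single pass over the training set instead of a full refit; the trade-off is that new points never update the neighbourhoods or nPLOF, so a periodic refit may still be needed if the data drifts.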

Read the section of readme.md on streaming data and you should be all set for your use case. Otherwise, I suggest using an alternative outlier detection method.