unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0
8.11k stars · 884 forks

Does Darts provide methods for unsupervised anomaly detection models? #2355

Closed ETTAN93 closed 6 months ago

ETTAN93 commented 7 months ago

Based on the darts documentation on anomaly models, it seems like the two available ones, the filtering anomaly model and the forecasting anomaly model, both require the model to be initially fitted to a series without anomalies, i.e. a supervised anomaly detection approach.

Is my understanding correct? Does Darts offer any unsupervised models for anomaly detection?
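For context, the fit-on-clean-data pattern described above can be sketched without darts at all. This is purely an illustrative toy (a standardized-residual score with an arbitrary threshold), not any darts model:

```python
import numpy as np

rng = np.random.default_rng(0)

# fit on a series assumed to be anomaly-free...
train = rng.normal(loc=10.0, scale=1.0, size=500)

# ...then score a new series with an injected anomaly at index 40
test = rng.normal(loc=10.0, scale=1.0, size=100)
test[40] = 25.0

# trivial stand-in "model": the training mean; standardized residuals
# play the role of anomaly scores
mu, sigma = train.mean(), train.std()
scores = np.abs(test - mu) / sigma

# binarize the scores with a fixed threshold
flags = scores > 4.0
print(np.flatnonzero(flags))
```

The key point is that the threshold and the reference statistics both come from data assumed to contain no anomalies.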

madtoinou commented 6 months ago

Hi @ETTAN93,

At the moment, Darts does not offer any unsupervised models for anomaly detection but it can be added to the roadmap, especially if contributors propose architectures and open PRs.

ETTAN93 commented 6 months ago

@madtoinou thanks for that.

Another thing to clarify: I used `eval_accuracy` from darts' `QuantileDetector` class and compared its result to sklearn's `recall_score`.

I passed the same y_test and y_pred series to both:

from sklearn.metrics import recall_score
qd_recall = qd.eval_accuracy(y_test_series, qd_y_pred_series, metric='recall')
sklearn_recall = recall_score(y_test_series.pd_series(), qd_y_pred_series.pd_series())

For some reason, the two values are complementary: when I sum the two recall scores, I get 1.0. In this particular case, `qd_recall` from darts returns 0.9946808510638298 whereas sklearn's `recall_score` returns 0.005319148936170213.
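Summing to 1 is exactly what happens when one binary input is effectively the logical inverse of the other: recall is TP / (TP + FN), and flipping the predictions turns every true positive into a false negative and vice versa. A quick sklearn-only check on synthetic labels (illustrative, not this issue's data):

```python
import numpy as np
from sklearn.metrics import recall_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=200)
y_pred = rng.integers(0, 2, size=200)

# recall = TP / (TP + FN); flipping the predictions swaps TP and FN,
# so the two recalls sum to 1 whenever y_true contains positives
r = recall_score(y_true, y_pred)
r_flipped = recall_score(y_true, 1 - y_pred)
print(r, r_flipped, r + r_flipped)
```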

Am I passing the wrong parameters to the darts function? As far as I understand, the `anomaly_score` parameter should be the `y_pred_series` from the model? And what does the `window` parameter do?

[image attachment]

The same also happens when I evaluate the accuracy score: the two scores returned sum to 1.

from sklearn.metrics import accuracy_score
qd_accuracy= qd.eval_accuracy(y_test_series, qd_y_pred_series, metric='accuracy')
sklearn_accuracy = accuracy_score(y_test_series.pd_series(), qd_y_pred_series.pd_series())

dennisbader commented 6 months ago

Hi @ETTAN93, QuantileDetector.eval_accuracy() expects the predicted scores from the Scorer and not the output of QuantileDetector.detect().

The following should work:

# darts
qd = QuantileDetector(high_quantile=0.5)
anom_pred = qd.fit_detect(scores_pred)
qd_recall = qd.eval_accuracy(anom_true, scores_pred, metric="recall")

# sklearn
sl_recall = recall_score(
    anom_true.slice_intersect(anom_pred).pd_series(), 
    anom_pred.slice_intersect(anom_true).pd_series()
)

print(qd_recall, sl_recall)

outputs: (0.6923, 0.6923)

You could also use `eval_accuracy_from_binary_prediction()` from `darts.ad.utils` to compute the recall on the output of the `QuantileDetector`.

Note also that in 1-2 weeks we'll release the new Darts version with the refactored anomaly detection module (including an example notebook). So the API will change slightly (see the changes and PR here).

ETTAN93 commented 6 months ago

@dennisbader how is `scores_pred` defined?

dennisbader commented 6 months ago

It can be any numeric non-binary input series; the detector converts it to a binary series. In the example above it was the output of `KMeansScorer.score()`, but you can also use it on other series, as shown below:

from sklearn.metrics import recall_score

from darts import TimeSeries
from darts.ad import QuantileDetector
from darts.datasets import AirPassengersDataset

series = AirPassengersDataset().load()

# flag values above 400 as anomalies
anom_true = TimeSeries.from_dataframe(
    series.pd_dataframe() > 400
)

# darts
qd = QuantileDetector(high_quantile=0.95)
anom_pred = qd.fit_detect(series)
qd_recall = qd.eval_accuracy(anom_true, series, metric="recall")

# sklearn
sl_recall = recall_score(
    anom_true.slice_intersect(anom_pred).pd_series(),
    anom_pred.slice_intersect(anom_true).pd_series()
)

print(qd_recall, sl_recall)

gives (0.2857, 0.2857)
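The non-binary-to-binary conversion described above can be sketched in plain numpy. This is only a conceptual illustration of the quantile-thresholding idea, not darts' actual `QuantileDetector` implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(size=1000)  # any non-binary numeric series

# "fit": learn a threshold as the given quantile of the fitted series
high_quantile = 0.95
threshold = np.quantile(scores, high_quantile)

# "detect": binarize everything above the learned threshold
binary = (scores > threshold).astype(int)
print(binary.mean())  # roughly 1 - high_quantile
```

By construction, roughly a fraction `1 - high_quantile` of the fitted series ends up flagged, which is why the quantile choice directly controls the detector's sensitivity.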