Hi @ETTAN93,
At the moment, Darts does not offer any unsupervised models for anomaly detection, but this could be added to the roadmap, especially if contributors propose architectures and open PRs.
@madtoinou thanks for that.
Another thing to clarify: I used `eval_accuracy` from Darts' `QuantileDetector` class and compared its result to sklearn's `recall_score`. I passed the same `y_test` and `y_pred` series to both:
```python
from sklearn.metrics import recall_score

qd_recall = qd.eval_accuracy(y_test_series, qd_y_pred_series, metric='recall')
sklearn_recall = recall_score(y_test_series.pd_series(), qd_y_pred_series.pd_series())
```
For some reason, the two values are complementary: when I sum the two recall scores, I end up with 1.0. In this particular case, `qd_recall` from Darts returns 0.9946808510638298, whereas sklearn's `recall_score` returns 0.005319148936170213.
Am I passing the wrong parameters to the Darts function? As far as I understand, the `anomaly_score` parameter should be the `y_pred_series` from the model? And what does the `window` parameter do?
The same also happens when I evaluate the accuracy score: the two returned scores sum to 1.
```python
from sklearn.metrics import accuracy_score

qd_accuracy = qd.eval_accuracy(y_test_series, qd_y_pred_series, metric='accuracy')
sklearn_accuracy = accuracy_score(y_test_series.pd_series(), qd_y_pred_series.pd_series())
```
Hi @ETTAN93, `QuantileDetector.eval_accuracy()` expects the predicted scores from the Scorer, not the binary output of `QuantileDetector.detect()`.
The following should work:
```python
# darts
qd = QuantileDetector(high_quantile=0.5)
anom_pred = qd.fit_detect(scores_pred)
qd_recall = qd.eval_accuracy(anom_true, scores_pred, metric="recall")

# sklearn
sl_recall = recall_score(
    anom_true.slice_intersect(anom_pred).pd_series(),
    anom_pred.slice_intersect(anom_true).pd_series(),
)

print(qd_recall, sl_recall)
```
outputs: (0.6923, 0.6923)
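(Here `slice_intersect()` restricts each series to the time span it shares with the other, so the two pandas Series passed to sklearn are aligned on the same index and have equal length.)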
You could also use `eval_accuracy_from_binary_prediction()` from `darts.ad.utils` to compute the recall on the output of the `QuantileDetector`.
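For illustration, a minimal sketch of that approach, reusing `anom_true` and `anom_pred` from above (the exact argument names may differ between Darts versions):

```python
from darts.ad.utils import eval_accuracy_from_binary_prediction

# compare the ground-truth anomalies with the binary output of detect()/fit_detect()
qd_recall_binary = eval_accuracy_from_binary_prediction(
    anom_true, anom_pred, metric="recall"
)
```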
Note also that in 1-2 weeks we'll release the new Darts version with the refactored anomaly detection module (including an example notebook). So the API will change slightly (see the changes and PR here).
@dennisbader how is `scores_pred` defined?
It can be any numeric, non-binary input series; the detector converts the non-binary scores into binary anomaly flags. In the example above it was the output of `KMeansScorer.score()`, but you can also use it on other series, as shown below:
```python
from sklearn.metrics import recall_score

from darts import TimeSeries
from darts.ad import QuantileDetector
from darts.datasets import AirPassengersDataset

series = AirPassengersDataset().load()

# flag values above 400 as anomalies
anom_true = TimeSeries.from_dataframe(series.pd_dataframe() > 400)

# darts
qd = QuantileDetector(high_quantile=0.95)
anom_pred = qd.fit_detect(series)
qd_recall = qd.eval_accuracy(anom_true, series, metric="recall")

# sklearn
sl_recall = recall_score(
    anom_true.slice_intersect(anom_pred).pd_series(),
    anom_pred.slice_intersect(anom_true).pd_series(),
)

print(qd_recall, sl_recall)
```
gives (0.2857, 0.2857)
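For reference, a minimal sketch of how a `scores_pred` series like the one in the first example could be produced with `KMeansScorer` (the train/test split and variable names here are illustrative assumptions):

```python
from darts.ad import KMeansScorer

# a trainable scorer: fit on data assumed to be anomaly-free,
# then produce non-binary anomaly scores for new data
scorer = KMeansScorer(k=2, window=1)     # illustrative hyperparameters
scorer.fit(series_train)                 # series_train: anomaly-free TimeSeries
scores_pred = scorer.score(series_test)  # non-binary scores for the detector
```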
Based on the Darts documentation on anomaly models, it seems like the two available ones, the filtering anomaly model and the forecasting anomaly model, both require the model to first be fitted on a series without anomalies, i.e. a supervised anomaly detection setup.
Is my understanding correct? Does Darts offer any unsupervised models for anomaly detection?