TriAD

Welcome to the official repository for the paper "Unraveling the 'Anomaly' in Time Series Anomaly Detection: A Self-supervised Tri-domain Solution." This repository houses the implementation of the proposed solution, providing a self-supervised tri-domain approach for effective time series anomaly detection.

Unraveling the 'Anomaly' in Time Series Anomaly Detection: A Self-supervised Tri-domain Solution (ICDE 2024)

This paper addresses key challenges in time series anomaly detection (TSAD): (1) the scarcity of labels, and (2) the diversity in anomaly types and lengths. Furthermore, it contributes to the ongoing debate on the effectiveness of deep learning models in TSAD, highlighting the problems of flawed benchmarks and ill-posed evaluation metrics. This study stands out as the first to reassess the potential of deep learning in TSAD, employing both rigorously designed datasets (UCR Archive) and evaluation metrics (PA%K and affiliation). (paper)


Overview

  1. Download the ready-to-use UCR dataset. Next, run preprocess_data.py. This script partitions 10% of the training data as the validation set and saves the dataset to ./dataset/ucr_data.pt in the following format (a loading sketch follows this list):

    {'train_data': train_x,
     'valid_data': valid_x,
     'test_data': test_x,
     'test_labels': test_y}
  2. Simply run train.py to train TriAD over the whole dataset. The results are saved as tri_res.pt (a demo version is provided) and wrapped in a data frame.

  3. To get a summary of both tri-window and single-window detection accuracy (i.e., among the 250 datasets, how many are successfully detected by the tri/single window), simply run single_window_selection.py. The results will be saved as merlin_win.pt, from which discord_data_prep.py generates the Merlin-readable files. By restricting our focus to a single window, we force Merlin to scan around that window to find anomalies.

  4. To get a summary of the detection results on the shortest 62 datasets, simply run shortest_62.py.

  5. The visualizations of detection results and point-wise metrics are shown in the directory ./eval_demo. UCR 025 and UCR 150 are used as demo examples; each test_xxx.txt file contains the Merlin search results, where the columns represent search_length, start_index, end_index, and discord_distance (a parsing sketch follows this list). Install the affiliation metrics package, then run convert_pw.py:

    python convert_pw.py 150

    which produces output like the following:

    Dataset: UCR 150
    window magic correction !!
    UCR 150
    Traditional Metrics:
      F1 Score: 0.3947
    
    PA:
      F1 Score: 0.8619
    
    PA%K - AUC:
      Precision: 0.5859
      Recall: 0.5442
      F1 Score: 0.5466
    
    Affinity:
      Precision: 0.9922
      Recall: 0.9954

    *Please note that the experimental outcomes might vary between runs due to the randomness introduced during the augmentation process.
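
For reference, here is a minimal sketch of how the saved artifacts above can be inspected, assuming the dictionary layout from step 1 and the four-column Merlin output from step 5. The file name test_150.txt, the whitespace-delimited layout, and the use of torch.load and pandas are illustrative assumptions, not code from this repository.

    import torch
    import pandas as pd

    # Load the preprocessed UCR data written by preprocess_data.py (step 1).
    data = torch.load('./dataset/ucr_data.pt')
    for key in ('train_data', 'valid_data', 'test_data', 'test_labels'):
        print(key, data[key].shape)

    # Parse a Merlin search result file (step 5); the column names follow
    # the description in step 5.
    merlin = pd.read_csv(
        './eval_demo/test_150.txt',
        sep=r'\s+',
        header=None,
        names=['search_length', 'start_index', 'end_index', 'discord_distance'],
    )
    print(merlin.head())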

The TSAD datasets

You can access several widely used TSAD datasets from Data Smith. Additionally, we offer a comprehensive visualization of these datasets, including the UCR dataset. The preprocessed version of the UCR dataset used in this study is available for direct download here.

The TSAD evaluation metrics

You may also be interested in this blog, where we discuss why the popular evaluation metric, point adjustment (PA), can be tricky. Additionally, we provide a detailed explanation, along with calculation examples, of the two reliable evaluation metrics used in this study.
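
As a quick illustration of why PA can be tricky, below is a toy sketch of the point-adjustment step (not code from this repository): if any single point inside a ground-truth anomaly segment is flagged, the whole segment is counted as detected, so one lucky hit can inflate recall and F1.

    import numpy as np

    def point_adjust(labels: np.ndarray, preds: np.ndarray) -> np.ndarray:
        """If any point in a ground-truth anomaly segment is predicted
        positive, mark the entire segment as detected."""
        adjusted = preds.copy()
        in_segment, start = False, 0
        for i, y in enumerate(labels):
            if y == 1 and not in_segment:
                in_segment, start = True, i
            if in_segment and (y == 0 or i == len(labels) - 1):
                end = i if y == 0 else i + 1
                if adjusted[start:end].any():
                    adjusted[start:end] = 1  # one hit "detects" the segment
                in_segment = False
        return adjusted

    labels = np.array([0, 1, 1, 1, 1, 0, 0, 0])
    preds  = np.array([0, 0, 0, 1, 0, 0, 0, 0])  # a single lucky hit
    print(point_adjust(labels, preds))           # -> [0 1 1 1 1 0 0 0]

Point-wise, this prediction has recall 0.25; after PA, recall becomes 1.0. This inflation is exactly what PA%K and the affiliation metrics are designed to counteract.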