
Evaluating Real-time Anomaly Detection Algorithms – the Numenta Anomaly Benchmark #11

Open Gashongore opened 4 years ago

Gashongore commented 4 years ago

Title: Evaluating Real-time Anomaly Detection Algorithms – the Numenta Anomaly Benchmark

Authors: Alexander Lavin, Subutai Ahmad (Numenta, Inc., Redwood City, CA)

Paper: http://arxiv.org/abs/1510.03336

The authors propose the Numenta Anomaly Benchmark (NAB), which aims to provide a controlled and repeatable environment of open-source tools for testing and measuring anomaly detection algorithms on streaming data.

Tags: [data mining] [outlier detection] [Data Stream]

NAB evaluates detectors against a benchmark corpus of diverse time-series data. It provides real-world labeled data from multiple domains, and it accepts additional time-series corpora intended for real-time anomaly detection. The corpus also includes some artificially generated data files that test anomalous behaviors not yet represented in the real data, as well as several data files without any anomalies.

A key element of the NAB dataset is the inclusion of real-world data with anomalies whose causes are known to the authors. They propose the NAB dataset as a high-quality collection of time-series data with labeled anomalies that is well suited to serve as a standard benchmark for streaming applications.

They define the requirements of an ideal, real-world anomaly detector as follows:

  1. It detects all anomalies present in the streaming data.
  2. It detects anomalies as soon as possible, ideally before the anomaly becomes visible to a human.
  3. It triggers no false alarms (no false positives).
  4. It works with real-time data (no look-ahead).
  5. It is fully automated across all datasets (any data-specific parameter tuning must be done online without human intervention).

The NAB scoring algorithm aims to reward these characteristics.
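To make the no-look-ahead and online-tuning requirements concrete, here is a minimal sketch of a streaming detector. It is an illustrative baseline (a running z-score using Welford's online mean and variance), not one of the detectors evaluated in the paper. The class name, the `threshold` and `warmup` parameters, and their default values are assumptions made for this example.

```python
import math


class StreamingZScoreDetector:
    """Illustrative online detector: flags points far from the running mean.

    Hypothetical example, not a detector from the NAB paper.
    """

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold  # assumed z-score cutoff
        self.warmup = warmup        # points to observe before flagging anything
        self.n = 0                  # number of points absorbed so far
        self.mean = 0.0             # running mean
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, x):
        """Score x using only earlier points, then absorb x into the statistics."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True

        # Welford's online update: no past data is stored and no future data is read.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


# Usage: feed points one at a time, as they would arrive in a stream.
detector = StreamingZScoreDetector()
stream = [10.0, 10.1, 9.9, 10.2, 9.8] * 4 + [50.0]
flags = [detector.update(x) for x in stream]
```

Each call to `update` sees only the current point and statistics built from earlier points, and the statistics adapt as the stream evolves without a separate offline tuning pass.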

The NAB anomaly score, in turn, has the following three components:

  1. Anomaly window - An anomaly window is a sequence of data points centered around one or more true anomalies in a dataset.
  2. Application profile - NAB defines three application profiles: standard, reward low false positives, and reward low false negatives.
  3. Scoring function - Given an anomaly window and an application profile, NAB uses a sigmoidal scoring function to compute the weight of each anomaly detection.
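The three components can be sketched together as follows. This is a rough illustration under stated assumptions, not the NAB reference implementation: the scaled sigmoid, the profile weights in `PROFILES`, the handling of detections before the window, and all function names are assumptions made for this example. Consult the paper for the exact formula and weights.

```python
import math

# Hypothetical application profiles. The weights are illustrative placeholders.
PROFILES = {
    "standard":      {"tp": 1.0, "fp": 0.11, "fn": 1.0},
    "reward_low_fp": {"tp": 1.0, "fp": 0.22, "fn": 1.0},
    "reward_low_fn": {"tp": 1.0, "fp": 0.11, "fn": 2.0},
}


def make_window(center, half_width, n_points):
    """Anomaly window: index range centered on a labeled anomaly, clamped to the data."""
    return max(center - half_width, 0), min(center + half_width, n_points - 1)


def scaled_sigmoid(rel_pos):
    """Assumed sigmoid shape: near +1 for early positions, 0 at 0, near -1 for late ones."""
    return 2.0 / (1.0 + math.exp(5.0 * rel_pos)) - 1.0


def score_detection(t, window, profile):
    """Weight of a single detection at index t relative to one anomaly window."""
    start, end = window
    if t < start:
        return -profile["fp"]  # assumed: full false-positive penalty before the window
    rel_pos = (t - end) / max(end - start, 1)  # -1 at window start, 0 at window end
    raw = scaled_sigmoid(rel_pos)
    weight = profile["tp"] if t <= end else profile["fp"]
    return weight * raw


def score_detections(detections, window, profile):
    """Sum detection weights; only the earliest in-window detection earns credit."""
    start, end = window
    inside = [t for t in detections if start <= t <= end]
    outside = [t for t in detections if t < start or t > end]
    total = sum(score_detection(t, window, profile) for t in outside)
    if inside:
        total += score_detection(min(inside), window, profile)
    else:
        total -= profile["fn"]  # missed window counts as a false negative
    return total


window = make_window(center=50, half_width=5, n_points=100)
early = score_detection(46, window, PROFILES["standard"])
late = score_detection(54, window, PROFILES["standard"])
```

In this sketch an early detection inside the window scores higher than a late one, and the profile only changes how heavily false positives and missed windows are penalized.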

The NAB dataset is composed of 58 labeled data files spanning multiple domains, such as Twitter traffic, CPU utilization, New York taxi demand, and machine temperature readings with system failures. These files are manually labeled following a well-documented procedure at Numenta, and hence contain reliable ground truth that can be used for robust evaluation of different anomaly detection algorithms.
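As a rough illustration of working with a labeled data file, the sketch below joins a small time series with a list of labeled anomaly timestamps. The sample rows and labels are invented for this example, and the `timestamp` and `value` column names are an assumption about the file layout. The function name `load_labeled_series` is hypothetical.

```python
import csv
import io

# Made-up sample data for illustration only; not taken from the NAB corpus.
SAMPLE_CSV = """timestamp,value
2014-04-01 00:00:00,20.1
2014-04-01 00:05:00,19.8
2014-04-01 00:10:00,95.3
2014-04-01 00:15:00,20.4
"""
SAMPLE_LABELS = ["2014-04-01 00:10:00"]


def load_labeled_series(csv_text, anomaly_timestamps):
    """Return (timestamp, value, is_anomaly) tuples in file order."""
    labeled = set(anomaly_timestamps)
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        ts = record["timestamp"]  # assumed column name
        rows.append((ts, float(record["value"]), ts in labeled))
    return rows


rows = load_labeled_series(SAMPLE_CSV, SAMPLE_LABELS)
```

Keeping the labels separate from the values, as done here, lets the same series be fed to a detector unlabeled and compared against the ground truth afterwards.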