Demystifying Numenta Anomaly Benchmark

Title: Demystifying Numenta Anomaly Benchmark
Authors: Nidhi Singh and Craig Olinsky, Intel Security, Germany
Paper: https://ieeexplore.ieee.org/document/7966038
Tags: [data mining] [outlier detection] [Data Stream]

The authors provide an in-depth analysis of the key aspects of the Numenta Anomaly Benchmark (NAB) framework and highlight its inherent challenges, with the objective of identifying gaps in the current framework that must be addressed to make it more robust and easier to use.

They also provide an additional evaluation of five state-of-the-art anomaly detection algorithms (including those proposed by Numenta) on the NAB datasets. Based on the results, they argue that the performance of these algorithms is not yet sufficient for practical, industry-scale applications and must be improved before the algorithms are suitable for large-scale anomaly detection problems.

Main contributions:

1) The authors provide an in-depth analysis of the NAB scoring system and identify important gaps in it.
2) They illustrate the challenges faced in handling the NAB datasets, which hinder the predictive performance of conventional time-series-based and machine-learning-based models on these datasets.
3) They perform a comprehensive evaluation of five state-of-the-art anomaly detection algorithms (those specified in the NAB framework) using standard metrics such as precision, recall and false positive rate. This evaluation offers a different perspective on the predictive performance of these algorithms than the NAB scoring system does.
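For reference, the following is a minimal Python sketch of precision, recall and false positive rate computed from point-wise binary labels (1 = anomaly). It assumes every timestamp is compared independently; the exact matching protocol the authors used (for example, whether a detection anywhere inside a window counts as a hit) is not stated here, so treat this as illustrative only. The function names are hypothetical.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Count (TP, FP, TN, FN) for point-wise binary anomaly labels (1 = anomaly)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t and p:
            tp += 1  # anomaly correctly flagged
        elif p:
            fp += 1  # normal point flagged as anomalous
        elif t:
            fn += 1  # anomaly missed
        else:
            tn += 1  # normal point correctly ignored
    return tp, fp, tn, fn


def precision_recall_fpr(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float, float]:
    """Return (precision, recall, false positive rate); empty denominators give 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return precision, recall, fpr


if __name__ == "__main__":
    # Hypothetical labels, not data from the paper.
    y_true = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0]
    y_pred = [0, 1, 1, 0, 0, 0, 0, 1, 0, 1]
    p, r, f = precision_recall_fpr(y_true, y_pred)
    print(f"precision={p:.3f} recall={r:.3f} fpr={f:.3f}")
    # prints "precision=0.500 recall=0.667 fpr=0.286"
```

Unlike a single aggregate benchmark score, these three numbers show separately how many alarms are wrong and how many anomalies are missed.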

A. Challenges in the NAB Scoring System

1) Determining the anomaly window size: the NAB scoring system is based on anomaly windows, but there is no systematic way of determining the optimal window size.
2) Gaps in the scoring function: the scoring equation is not well-defined.
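To make the window-size issue concrete, here is a simplified Python sketch of window-based scoring in the style of NAB. It assumes (1) the total window length is a fixed 10% of the series length, split evenly across the labelled anomalies and centred on each one, and (2) a detection is weighted by a scaled sigmoid 2/(1 + e^{5y}) - 1 of its relative position y (-1 at the window start, 0 at the window end, positive afterwards), with positions far past the window clamped to -1. It omits the per-profile weights and score normalization of the full NAB scorer, and all function names are hypothetical.

```python
import math


def nab_style_windows(n_points: int, anomaly_indices: list[int],
                      fraction: float = 0.10) -> list[tuple[int, int]]:
    """Centre a window on each labelled anomaly (assumed NAB-style heuristic).

    The total window length is `fraction` of the series length, split evenly
    across anomalies; windows are clipped to [0, n_points - 1].
    """
    if not anomaly_indices:
        return []
    width = int(fraction * n_points / len(anomaly_indices))
    half = width // 2
    return [(max(0, a - half), min(n_points - 1, a + half)) for a in anomaly_indices]


def relative_position(index: int, window: tuple[int, int]) -> float:
    """Relative position of a detection: -1 at window start, 0 at window end."""
    start, end = window
    if end <= start:
        raise ValueError("window must span at least two points")
    return (index - end) / (end - start)


def scaled_sigmoid(y: float) -> float:
    """Map relative position y to a score in [-1, 1].

    Early detections score close to +1, detections at the window end score 0,
    and detections after the window are negative. Far-away positions are
    clamped to -1 (assumption), which also avoids overflow in math.exp.
    """
    if y > 3.0:
        return -1.0
    return 2.0 / (1.0 + math.exp(5.0 * y)) - 1.0


if __name__ == "__main__":
    # Hypothetical series of 1000 points with anomalies labelled at 300 and 700.
    windows = nab_style_windows(1000, [300, 700])
    print(windows)  # [(275, 325), (675, 725)]
    for detection in (280, 320, 340):
        y = relative_position(detection, windows[0])
        print(detection, round(y, 2), round(scaled_sigmoid(y), 3))
```

Note that `fraction` is a hard-coded constant rather than something derived from the data; changing it moves the window edges and therefore changes every detection's score, which illustrates the window-size challenge.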

B. Challenges in the NAB Datasets

1) Missing values in datasets
2) Difference in data distribution
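One simple way to surface missing values is to look for gaps in the timestamp column. The Python sketch below assumes a series that should be sampled at a fixed interval (five minutes in the example, a hypothetical value) with timestamps aligned to that interval; the `find_gaps` helper and the example data are hypothetical, and how to fill the gaps (for example, by interpolation) is left open.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps: list[datetime],
              expected_step: timedelta) -> list[tuple[datetime, datetime, int]]:
    """Return (previous, next, missing_count) for every jump larger than expected_step.

    Assumes timestamps are sorted and aligned to expected_step, so a jump of
    k * expected_step means k - 1 observations are missing.
    """
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        delta = curr - prev
        if delta > expected_step:
            missing = delta // expected_step - 1  # timedelta // timedelta -> int
            gaps.append((prev, curr, missing))
    return gaps


if __name__ == "__main__":
    start = datetime(2014, 1, 1, 0, 0)
    step = timedelta(minutes=5)
    # Hypothetical series with the 00:10 and 00:15 readings dropped.
    ts = [start + k * step for k in (0, 1, 4, 5)]
    for prev, curr, missing in find_gaps(ts, step):
        print(f"{prev:%H:%M} -> {curr:%H:%M}: {missing} missing point(s)")
    # prints "00:05 -> 00:20: 2 missing point(s)"
```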

The authors demonstrate valid shortcomings of the NAB framework (e.g., missing values in the datasets), but they do not consider the assumptions behind, and the definition of, the type of data distribution that NAB addresses. I therefore think the second dataset challenge (difference in data distribution) is not valid.
