Unsupervised online detection and prediction of outliers in streams of sensor data

• Paper title: Unsupervised online detection and prediction of outliers in streams of sensor data
• Authors/Affiliations: Niko Reunanen · Tomi Räty · Juho J. Jokinen · Tyler Hoyt · David Culler
 - VTT Technical Research Centre of Finland, Kaitoväylä 1, 90571 Oulu, Finland
 - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley
• Paper: https://link.springer.com/article/10.1007/s41060-019-00191-3
• Tags: [Outlier detection] [Outlier prediction] [Data streams] [Machine learning] [Unsupervised learning]
• What is it?
This article proposes novel methods for outlier detection and outlier prediction in streams of sensor data. The outlier detection is an independent, unsupervised process implemented with an autoencoder. It continuously evaluates whether the latest data point x_i from a stream is an inlier or an outlier. This decision is based on the autoencoder's reconstruction cost combined with Chebyshev's inequality and an EWMA (exponentially weighted moving average) model. The outlier prediction uses the results of the outlier detection to form its training data, and it uses LR (logistic regression), SGD (stochastic gradient descent) and the hidden representation provided by the autoencoder to predict outliers in streams.
• How is it great compared to the related works?
--- The approach uses little memory, because it processes the data one point at a time.
--- The method does not need separate, pre-labeled training data.
--- The proposed method is computationally more efficient than the compared methods (COD, MCOD and STORM2), because it does not use a sliding window for outlier detection.
• The key technical differentiators
--- Model calibration instead of training.
--- The hidden representation provided by the autoencoder is used to predict outliers in streams.
--- Outliers are detected with an unsupervised method, and the detected outliers are then used as labels for the prediction algorithm.
• How did they validate the advantages?
--- The evaluation has two phases. In the first phase, the outlier detection is tested and evaluated on its own. In the second phase, the integrated outlier detection and outlier prediction are tested and evaluated.
--- The outlier detection is evaluated on synthetic sensor data and on real-world sensor data. The real-world data consist of electrical sensor readings from the east passenger elevator in Cory Hall at the University of California, Berkeley.
• Are there any discussions around the proposal?
--- The paper meets almost all of the criteria for anomaly/outlier detection in data streams:

Predictions must be made online; i.e., the algorithm must identify state X_t as normal or anomalous before receiving the subsequent X_{t+1}. ---> Used
The algorithm must learn continuously without a requirement to store the entire stream. ---> Used
The algorithm must run in an unsupervised, automated fashion, i.e., without data labels or manual parameter tweaking. ---> Used
Algorithms must adapt to dynamic environments and concept drift, as the underlying statistics of the data stream are often non-stationary. ---> Partially: it underperforms on this criterion, labeling some inliers as outliers and vice versa.
Algorithms should make anomaly detections as early as possible. ---> Not as early as some other algorithms (e.g., anomaly detection using HTM).
Algorithms should minimize false positives and false negatives (this is true for batch scenarios as well). ---> It did.
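To make the detection step from "What is it?" more concrete, here is a minimal sketch of the general idea: an autoencoder scores each incoming point by its reconstruction cost, EWMA estimates track the mean and variance of that cost, and Chebyshev's inequality, P(|C - μ| ≥ kσ) ≤ 1/k², decides whether the cost is improbably high. This is not the paper's exact formulation. The tied-weight tanh autoencoder, the class names `OnlineAutoencoder` and `ChebyshevEWMADetector`, and the parameters `lam`, `alpha` and `warmup` are all hypothetical choices made for illustration. Each point is judged before either model learns from it, and nothing beyond the current point and a few running statistics is stored.

```python
import math
import numpy as np


class OnlineAutoencoder:
    """Hypothetical tied-weight autoencoder with one tanh hidden layer,
    updated by a single SGD step per incoming point (no stored history)."""

    def __init__(self, n_in: int, n_hidden: int, lr: float = 0.05, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, size=(n_hidden, n_in))
        self.b_h = np.zeros(n_hidden)
        self.b_o = np.zeros(n_in)
        self.lr = lr

    def encode(self, x: np.ndarray) -> np.ndarray:
        # Hidden representation h = tanh(W x + b_h)
        return np.tanh(self.W @ x + self.b_h)

    def cost(self, x: np.ndarray) -> float:
        # Reconstruction cost 0.5 * ||x_hat - x||^2, decoder uses W^T (tied weights)
        x_hat = self.W.T @ self.encode(x) + self.b_o
        return 0.5 * float(np.sum((x_hat - x) ** 2))

    def partial_fit(self, x: np.ndarray) -> None:
        # One SGD step on the reconstruction cost of x alone
        h = self.encode(x)
        err = self.W.T @ h + self.b_o - x          # dCost/dx_hat
        dh = (self.W @ err) * (1.0 - h ** 2)       # back through decoder and tanh
        self.W -= self.lr * (np.outer(h, err) + np.outer(dh, x))
        self.b_o -= self.lr * err
        self.b_h -= self.lr * dh


class ChebyshevEWMADetector:
    """Flags a cost as an outlier when Chebyshev's inequality bounds the
    probability of a deviation that large by at most alpha, using EWMA
    estimates of the cost's mean and variance (hypothetical parameters)."""

    def __init__(self, lam: float = 0.05, alpha: float = 0.01, warmup: int = 20):
        self.lam, self.alpha, self.warmup = lam, alpha, warmup
        self.mean, self.var, self.n = 0.0, 0.0, 0

    def update(self, c: float) -> bool:
        is_outlier = False
        if self.n >= self.warmup and self.var > 0.0:
            k = (c - self.mean) / math.sqrt(self.var)
            # Chebyshev: P(|C - mu| >= k*sigma) <= 1/k^2; only unusually high costs count
            is_outlier = k > 0 and 1.0 / (k * k) <= self.alpha
        # EWMA mean/variance update, applied to every point in this sketch
        if self.n == 0:
            self.mean = c
        else:
            diff = c - self.mean
            incr = self.lam * diff
            self.mean += incr
            self.var = (1.0 - self.lam) * (self.var + diff * incr)
        self.n += 1
        return is_outlier


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ae = OnlineAutoencoder(n_in=3, n_hidden=2)
    det = ChebyshevEWMADetector()
    flagged = []
    for i in range(300):
        x = rng.normal(0.0, 0.1, size=3)
        if i == 250:
            x = x + 5.0  # injected anomaly, for illustration only
        if det.update(ae.cost(x)):  # decide before learning from x
            flagged.append(i)
        ae.partial_fit(x)
    print("flagged indices:", flagged)
```

Because the statistics are updated with every point, a single large outlier temporarily inflates the variance estimate. Skipping the update for flagged points is an equally plausible design choice, but it risks locking in an outdated baseline under concept drift.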

NOTE: this paper fares better than the previous paper I read with respect to the anomaly/outlier detection criteria above.
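The prediction side described in the summary (LR, SGD and the autoencoder's hidden representation, with labels produced by the detector) can be sketched as an online logistic regression. This is a minimal sketch under stated assumptions. It assumes that SGD is the optimizer for the LR model, and that the hidden representation h_i of point i is labeled with the detector's verdict on the next point, x_{i+1}, so the model learns to warn one step ahead. The one-step horizon, the names `SGDLogisticRegression` and `online_predict`, and the learning rate are hypothetical and are not taken from the paper.

```python
import numpy as np


class SGDLogisticRegression:
    """Logistic regression trained online with one SGD step per labeled example."""

    def __init__(self, n_features: int, lr: float = 0.1):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, h: np.ndarray) -> float:
        # Clip the logit to keep exp() in a safe range
        z = float(np.clip(self.w @ h + self.b, -30.0, 30.0))
        return 1.0 / (1.0 + np.exp(-z))

    def partial_fit(self, h: np.ndarray, y: int) -> None:
        # Gradient of the log-loss with respect to the logit is (p - y)
        g = self.predict_proba(h) - y
        self.w -= self.lr * g * h
        self.b -= self.lr * g


def online_predict(hidden_stream, outlier_flags, lr=0.1):
    """Hypothetical pairing: hidden_stream[i] is labeled with outlier_flags[i + 1],
    the detector's verdict on the next point. Returns the probabilities emitted
    before each label was revealed, plus the trained model."""
    model = SGDLogisticRegression(len(hidden_stream[0]), lr)
    probs = []
    for i in range(len(hidden_stream) - 1):
        h = hidden_stream[i]
        probs.append(model.predict_proba(h))              # predict first (online)
        model.partial_fit(h, int(outlier_flags[i + 1]))   # then learn from the detector's label
    return probs, model
```

In a full pipeline, `hidden_stream` would come from the autoencoder's encoder and `outlier_flags` from the detector. No hand-labeled data is involved, which matches the summary's point that the detected outliers serve as labels for the prediction algorithm.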

NEXT PAPER: Unsupervised real-time anomaly detection for streaming data, DOI link: https://doi.org/10.1016/J.NEUCOM.2017.04.070
