Paper title: Deep Anomaly Detection with Deviation Networks.
Authors: Guansong Pang (The University of Adelaide), Chunhua Shen (The University of Adelaide) & Anton van den Hengel (The University of Adelaide).
Topic tags: [Anomaly Detection], [Deep Learning], [Representation Learning], [Neural Networks], [Outlier Detection]
landing page of paper
What is it? This paper addresses the still under-explored area of deep anomaly detection. Most deep anomaly detection methods struggle with two issues: i) the lack of large-scale labelled anomaly data and ii) the dynamic, heterogeneous nature of anomalies themselves. Some unsupervised deep methods address these issues with a two-step approach: first learn a new representation of the data, then define an anomaly score on top of it using a reconstruction error or a distance metric; GANs and autoencoders are typical examples of this technique. However, because representation learning and anomaly detection are kept separate, the learned representations can be suboptimal. Other works incorporate anomaly scoring into the feature representation learning, but they optimise the representation only through an indirect optimisation of the anomaly scores. Moreover, these methods do not leverage prior knowledge of anomalies (which often exists in real-world application domains) and, as a result, tend to flag noise or completely uninteresting data instances. A way to mitigate this is to incorporate prior knowledge of anomalies, which is the focus of this work: a framework is proposed that learns a scoring function whose job is to assign an anomaly score to each data object. The framework also uses a prior over anomaly scores to produce a reference score that guides the learning process. The score from the scoring function, the reference score and a reference standard deviation are fed into a loss function, the Z-score-based deviation loss. The loss is designed to push the anomaly scores of anomalies far above the reference score $\mu_R$ while keeping the scores of normal objects close to it. The framework is instantiated into a method called DevNet (deviation networks).
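A minimal sketch of this deviation loss (not the authors' implementation), assuming the reference scores are drawn from a standard Gaussian prior and using an illustrative margin hyperparameter `margin` for the Z-score confidence term, might look as follows in PyTorch:

```python
import torch

def deviation_loss(scores, labels, n_ref=5000, margin=5.0):
    """Sketch of a Z-score-based deviation loss.

    scores: (batch,) anomaly scores phi(x; Theta) from the scoring network
    labels: (batch,) float tensor, 1.0 for labelled anomalies, 0.0 for normal/unlabelled objects
    """
    ref = torch.randn(n_ref)                 # reference scores drawn from the Gaussian prior
    dev = (scores - ref.mean()) / ref.std()  # Z-score-style deviation from the reference score mu_R
    # Normal objects: pull their deviation towards 0 (i.e. towards mu_R).
    # Anomalies: push their deviation at least `margin` standard deviations above mu_R.
    loss = (1 - labels) * dev.abs() + labels * torch.clamp(margin - dev, min=0)
    return loss.mean()
```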
How is it great compared to the related works? As suggested in the previous paragraph, DevNet uses prior knowledge of anomalies to guide the learning process. It also employs a Z-score-based deviation loss, which yields anomaly scores that are easy to interpret and enables data-efficient learning. Related methods such as REPEN and deep SVDD, in contrast, learn the new feature representation by optimising the anomaly scores only indirectly, which can lead to suboptimal representations.
What are the key technical differentiators? In particular, DevNet's architecture provides a way to interpret anomalous data directly through the score prior and the Z-score: one can pick a threshold at a desired confidence level to decide which objects are anomalous, as illustrated below. This makes anomalies much easier to interpret than in other methods, where a separate score-unification step has to be applied independently, raising questions about trustworthiness.
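As a hypothetical illustration (assuming the standard Gaussian prior over normal-object scores used in the sketch above), a chosen confidence level maps directly to a score threshold:

```python
from scipy.stats import norm

# Illustrative only: with a standard Gaussian prior on normal-object scores,
# a one-sided confidence level translates into a Z-score threshold.
confidence = 0.95
threshold = norm.ppf(confidence)  # ~1.645 standard deviations above the reference score
print(f"Flag objects whose deviation exceeds {threshold:.3f} as anomalies")
```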
How did they validate the advantages? In the experiments, DevNet is compared to four state-of-the-art (SOTA) methods: REPEN, an adapted deep SVDD, prototypical networks (denoted FSNet) and the ensemble method iForest. The methods were tested on eight different datasets, with AUC-ROC and AUC-PR as evaluation metrics. As pointed out earlier, DevNet makes use of prior knowledge of anomalies, and the results show it outperforming the competing methods in most cases. In addition, an ablation study was conducted to better understand the network. The full DevNet (denoted Def) and three of its variants were trained and tested on the same datasets. The first variant, DevNet-Rep, removes the output layer of Def. The second, DevNet-Linear, removes the non-linear hidden layer of Def and learns the mapping from the input data to the anomaly score directly. The last variant, DevNet-3HL, uses three hidden layers with 1000, 250 and 20 ReLU units. The findings show that Def was more stable than its variants and achieved better average AUC-ROC and AUC-PR scores.
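To make the ablation variants concrete, here is a hypothetical PyTorch sketch (not the authors' code) of the DevNet-3HL scoring network described above: three ReLU hidden layers of 1000, 250 and 20 units feeding a linear layer that outputs the scalar anomaly score. DevNet-Linear would drop the hidden layers entirely, and DevNet-Rep would drop the final scoring layer.

```python
import torch.nn as nn

class DevNet3HL(nn.Module):
    """Sketch of the DevNet-3HL ablation variant (1000-250-20 ReLU units + linear score)."""
    def __init__(self, n_features):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(n_features, 1000), nn.ReLU(),
            nn.Linear(1000, 250), nn.ReLU(),
            nn.Linear(250, 20), nn.ReLU(),
            nn.Linear(20, 1),  # linear output layer producing the anomaly score
        )

    def forward(self, x):
        return self.scorer(x).squeeze(-1)
```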
Are there any discussions around the proposal? Yes. Work is in progress to incorporate DevNet into convolutional and recurrent neural networks so it can be applied to image and sequence data in challenging real-world applications.
What are the next papers to read? The next paper to read will be related to data matching/entity resolution.