[36] Zero-Shot Anomaly Detection via Batch Normalization

Abstract

Zero-shot AD settings have been developed for situations where the normal training data drifts, especially when no training data is available for the new normal.
The paper adapts off-the-shelf deep anomaly detectors (such as Deep SVDD) by training them on inter-related training data distributions in combination with batch normalization, which gives zero-shot generalization to unseen AD tasks.

Introduction

We propose Adaptive Centered Representations (ACR), a lightweight zero-shot AD method that combines two simple ideas: batch normalization and meta-training.
- Assumption: since "normal" samples are assumed to be the majority, a randomly sampled batch contains more normal samples than abnormal ones.
- Batch normalization then moves these normal samples toward the center and pushes abnormal samples outward.
- This scaling and centering is robust to domain shift in the input, so a self-supervised anomaly detector can generalize even to distributions it did not see during training.
- A meta-training scheme is also proposed.
- The authors claim that ACR is theoretically grounded, simple, domain-independent, and compatible with various backbone models.
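A minimal pure-Python sketch of the robustness claim in the bullets above, under the simplifying assumption that a domain shift acts as a per-feature affine transform (scale and offset). The helper `batch_normalize` is a hypothetical name; it standardizes each feature with the batch's own statistics, as batch normalization does without its learned affine parameters. The data are made up.

```python
import math


def batch_normalize(batch, eps=1e-5):
    """Standardize each feature with the batch's own mean and variance (no learned affine)."""
    n, dims = len(batch), len(batch[0])
    means = [sum(x[d] for x in batch) / n for d in range(dims)]
    variances = [sum((x[d] - means[d]) ** 2 for x in batch) / n for d in range(dims)]
    return [
        [(x[d] - means[d]) / math.sqrt(variances[d] + eps) for d in range(dims)]
        for x in batch
    ]


# Toy batch (assumed data): mostly "normal" points near (1, 2) plus one outlier.
source_batch = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.0], [5.0, 8.0]]
# Same batch after a hypothetical domain shift: each feature rescaled and offset.
shifted_batch = [[3.0 * a + 10.0, 0.5 * b - 4.0] for a, b in source_batch]

z_src = batch_normalize(source_batch)
z_shift = batch_normalize(shifted_batch)
# Largest elementwise difference; only eps keeps it from being exactly zero.
max_gap = max(abs(u - v) for r1, r2 in zip(z_src, z_shift) for u, v in zip(r1, r2))
print(f"max difference after batch normalization: {max_gap:.2e}")
```

Because the normalized outputs barely change, any score computed on them ignores this kind of shift. Real domain shifts are not purely affine, so this only illustrates the intuition.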

Method

Adaptive batch-level AD 1) enables discrimination between normal and abnormal samples, and 2) generalizes to unseen distributions, because data drawn from different distributions are brought into a common frame of reference.

As in Fig. 1, the task is to find geese, never seen during training, among lions. DSVDD maps data toward a pre-specified point in the embedding space and scores each sample by its distance from that point.
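The DSVDD scoring rule described above fits in a few lines. The function name `dsvdd_score`, the center, and the embeddings are hypothetical; in DSVDD the embeddings would come from a trained encoder.

```python
def dsvdd_score(embedding, center):
    """Deep SVDD-style anomaly score: squared Euclidean distance to a pre-specified center."""
    return sum((e - c) ** 2 for e, c in zip(embedding, center))


# Hypothetical 2-D embeddings f(x) from some trained encoder.
center = [0.0, 0.0]
lion_embedding = [0.1, -0.2]  # near the center: scored as normal
goose_embedding = [2.0, 1.5]  # far from the center: scored as anomalous

print(dsvdd_score(lion_embedding, center), dsvdd_score(goose_embedding, center))
```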

The meta-training set is built from inter-related data distributions, as is common in zero-shot learning and meta-learning.
Here, inter-relatedness means that there are k training distributions and that the unseen test distribution P_* is sampled from the same meta-distribution. Conditioning the anomaly score on the batch in this way lets the score use the batch context as a frame of reference and exploit distributional information. (For example, a cat can be normal in the context of a batch of cats but anomalous within a batch of dog images.)
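The cat/dog example can be sketched with a batch-conditioned score. As an assumption for illustration, the score here is the squared z-score of a 1-D feature under the batch's own mean and variance, and the feature values are invented; `batch_scores` is a hypothetical helper, not the paper's code.

```python
import statistics


def batch_scores(batch, eps=1e-5):
    """Score every sample against its own batch: squared z-score under batch mean and variance."""
    mu = statistics.fmean(batch)
    var = statistics.pvariance(batch, mu)
    return [(x - mu) ** 2 / (var + eps) for x in batch]


# Hypothetical 1-D embedding in which cats sit near 0 and dogs near 5.
cat = 0.0
cat_batch = [0.1, -0.2, 0.0, 0.3, -0.1, cat]  # the cat among cats
dog_batch = [5.1, 4.8, 5.0, 5.2, 4.9, cat]    # the same cat among dogs

cat_score_among_cats = batch_scores(cat_batch)[-1]
cat_score_among_dogs = batch_scores(dog_batch)[-1]
print(f"{cat_score_among_cats:.3f} vs {cat_score_among_dogs:.3f}")
```

The same input receives a low score in one batch and the highest score in the other, which is the context dependence the paragraph describes.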

So how can batch-level information be incorporated into anomaly detection?

A few assumptions are needed:

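- Assumption 1, availability of a meta-training set: a meta-training set is always available (training on it lets the model adapt to new distributions without re-training).
- Assumption 2, batch-level anomaly detection: predictions at test time are made at the batch level.
- Assumption 3, majority of normal data: every i.i.d.-sampled test batch consists mostly of normal data.

Since no anomaly labels are available at test time, accurate inference is impossible without Assumptions 2 and 3.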
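A toy sketch with made-up 1-D features shows why Assumption 3 matters: the same batch-level rule flags the true anomaly when normals are the majority, but flags the lone normal sample when anomalies are the majority. The helpers `batch_scores` and `flagged_index` are hypothetical names.

```python
import statistics


def batch_scores(batch, eps=1e-5):
    """Squared z-score of each sample relative to its batch's mean and variance."""
    mu = statistics.fmean(batch)
    var = statistics.pvariance(batch, mu)
    return [(x - mu) ** 2 / (var + eps) for x in batch]


def flagged_index(batch):
    """Index of the sample the batch-level detector considers most anomalous."""
    scores = batch_scores(batch)
    return scores.index(max(scores))


# Hypothetical 1-D features: normal data near 0, anomalies near 5.
mostly_normal = [0.0, 0.1, -0.1, 0.2, -0.2, 0.0, 5.0]   # Assumption 3 holds
mostly_anomalous = [5.0, 5.1, 4.9, 5.2, 4.8, 5.0, 0.0]  # Assumption 3 violated

print(flagged_index(mostly_normal))     # 6: the anomaly at 5.0
print(flagged_index(mostly_anomalous))  # 6: the lone normal at 0.0
```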

Adaptively Centered Representations

Batch Normalization as Adaptation module
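- Batch normalization is used as a parameter-free, zero-shot, batch-level anomaly detector (normalizing to mean 0 and variance 1).
- By Assumption 3, normal data greatly outnumber anomalies in most batches, so the batch mean is dominated by normal samples. If x lies in an informative feature space, anomalies end up farther from the mean than normal samples, which gives a simple detector.
- In general, however, normal samples are not concentrated around the mean in the original data space.
- The idea is therefore integrated into a DNN, which learns zero-shot adaptively centered representations.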
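The ring example below is a hypothetical illustration of the bullets above. In raw input space the normal points do not surround the batch mean, so the parameter-free detector (squared norm after batch normalization) ranks the anomaly as the most normal point; on an informative feature (here the radius, standing in for what a DNN would learn) the same rule flags it. All names and data are assumptions, not the paper's code.

```python
import math


def batch_normalize(batch, eps=1e-5):
    """Per-feature standardization with the batch's own statistics (BN without affine)."""
    n, dims = len(batch), len(batch[0])
    means = [sum(x[d] for x in batch) / n for d in range(dims)]
    variances = [sum((x[d] - means[d]) ** 2 for x in batch) / n for d in range(dims)]
    return [[(x[d] - means[d]) / math.sqrt(variances[d] + eps) for d in range(dims)] for x in batch]


def zero_shot_scores(features):
    """Parameter-free batch-level detector: squared norm of the batch-normalized features."""
    return [sum(v * v for v in z) for z in batch_normalize(features)]


# Raw inputs: 8 normal points on the unit circle, plus one anomaly at the origin.
normals = [[math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)] for k in range(8)]
batch = normals + [[0.0, 0.0]]

raw_scores = zero_shot_scores(batch)
# Hypothetical "informative" feature map standing in for a learned encoder: the radius.
radius_scores = zero_shot_scores([[math.hypot(x, y)] for x, y in batch])

print(raw_scores.index(min(raw_scores)))        # 8: in raw space the anomaly looks most normal
print(radius_scores.index(max(radius_scores)))  # 8: in feature space it is flagged
```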

Training Objective

During training, the network is optimized with batch normalization layers in place, which also helps the optimization converge.
The loss being optimized is the anomaly score itself, averaged over the batch.
Typical choices for this loss are DSVDD and neural transformation learning (NTL).
This works because batch normalization re-calibrates batches of data drawn from different distributions, so that normal data gather around the origin.
This calibration carries over from low-level to high-level features, giving powerful feature learning and adaptation ability.
Therefore, optimizing Eq. 5 lets the model adapt to all k different training distributions and generalize to unseen ones.
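A forward-pass sketch of how batches from k training distributions could enter an Eq. 5-style objective, assuming a DSVDD-style loss and a deliberately tiny "network": one batch-normalization step (no affine) followed by one linear layer `weights` and the squared distance to `center`. The real method places BN inside a deeper backbone; all names here are hypothetical, and gradient-based training of `weights` (which would need an autodiff framework) is omitted.

```python
import math
import random


def batch_normalize(batch, eps=1e-5):
    """Per-feature standardization with the batch's own statistics (BN without affine)."""
    n, dims = len(batch), len(batch[0])
    means = [sum(x[d] for x in batch) / n for d in range(dims)]
    variances = [sum((x[d] - means[d]) ** 2 for x in batch) / n for d in range(dims)]
    return [[(x[d] - means[d]) / math.sqrt(variances[d] + eps) for d in range(dims)] for x in batch]


def acr_loss(batches, weights, center):
    """Batch-conditioned DSVDD score, averaged within each batch and then over the batches."""
    total = 0.0
    for batch in batches:  # one batch per training distribution P_j
        z = batch_normalize(batch)
        h = [[sum(w * v for w, v in zip(row, zi)) for row in weights] for zi in z]
        total += sum(sum((a - c) ** 2 for a, c in zip(hi, center)) for hi in h) / len(h)
    return total / len(batches)


rng = random.Random(0)
# k = 3 hypothetical training distributions with very different locations and scales.
batches = [
    [[rng.gauss(mu, sigma), rng.gauss(-mu, sigma)] for _ in range(32)]
    for mu, sigma in [(0.0, 1.0), (10.0, 0.1), (-50.0, 5.0)]
]
weights = [[0.5, -0.2], [0.1, 0.3]]  # hypothetical trainable parameters
center = [0.0, 0.0]

loss = acr_loss(batches, weights, center)
# Shifting one distribution leaves the loss unchanged because BN re-centers each batch.
shifted = [[[a + 100.0, b - 100.0] for a, b in batches[0]]] + batches[1:]
print(loss, acr_loss(shifted, weights, center))
```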

Meta Outlier Exposure

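To improve on Eq. 5, data drawn from distributions other than the one being modeled are used as anomalies.
Such synthetic anomalies help build a tighter decision boundary around the normal data during training.
For each distribution P_j, a mixture distribution is formed by admixing samples from the other distributions.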
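A minimal sketch of building one admixed batch for P_j. Assumptions: the mixing fraction is a hypothetical parameter `contamination`, the anomalies are drawn uniformly from the pooled data of the other training distributions, and `make_admixed_batch` is an illustrative name.

```python
import random


def make_admixed_batch(datasets, j, batch_size, contamination, rng):
    """Draw a batch for distribution j with a fraction `contamination` taken from the others.

    Returns the samples and binary labels y (0 = from P_j, 1 = admixed pseudo anomaly).
    """
    n_anomalies = round(contamination * batch_size)
    others = [x for k, data in enumerate(datasets) if k != j for x in data]
    labeled = [(x, 0) for x in rng.sample(datasets[j], batch_size - n_anomalies)]
    labeled += [(x, 1) for x in rng.sample(others, n_anomalies)]
    rng.shuffle(labeled)
    return [x for x, _ in labeled], [y for _, y in labeled]


# Three hypothetical training distributions, tagged by class name.
datasets = [[(name, i) for i in range(50)] for name in ("cat", "dog", "bird")]
xs, ys = make_admixed_batch(datasets, j=0, batch_size=16, contamination=0.25, rng=random.Random(0))
print(sum(ys), "admixed anomalies out of", len(ys))
```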
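An additional loss term is defined for the admixed anomalies:
- In general, many anomaly scores have an inversely constructed counterpart that is large for normal samples and small for anomalies.
- Note that the two terms share the same parameters.
- DSVDD defines the anomaly-side term as a reciprocal (1/A); here a binary indicator variable y_ij selects the term, where y_ij marks whether the i-th sample drawn for distribution P_j is normal (0) or an admixed anomaly (1).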
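One hedged reading of the combined loss for DSVDD: for y = 0 the term is the squared distance to the center, and for y = 1 it is the reciprocal of that distance, so the loss drops as pseudo anomalies move away. The function name and the `eps` guard against division by zero are assumptions added for this sketch.

```python
def meta_oe_loss(distances_sq, labels, eps=1e-6):
    """Batch-averaged loss with binary labels y: normal term for y=0, inverse term for y=1.

    distances_sq: squared distances ||f(x) - c||^2 from a DSVDD-style model (after BN).
    eps is an assumed safeguard against division by zero.
    """
    total = 0.0
    for d2, y in zip(distances_sq, labels):
        total += (1 - y) * d2 + y / (d2 + eps)
    return total / len(labels)


labels = [0, 0, 1]  # last sample is an admixed pseudo anomaly
print(meta_oe_loss([0.5, 0.2, 4.0], labels))   # anomaly at squared distance 4
print(meta_oe_loss([0.5, 0.2, 10.0], labels))  # pushing it farther lowers the loss
```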

This loss is similar to the outlier exposure loss, but instead of using generated samples at the image level, the pseudo anomalies come from the distributions seen during training.

Batch-level Prediction

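At inference, predictions are made at the batch level. Again, most samples are assumed to be normal, and the anomaly scores are thresholded. Because deep learning frameworks process a batch in parallel, the time complexity with respect to batch size is effectively O(1), as long as the batch fits on the parallel hardware.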
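A minimal sketch of batch-level prediction on made-up 1-D features: every sample in the test batch is scored against the batch's own statistics, then thresholded. The threshold value `3.0` and the name `predict_batch` are hypothetical; the note does not say how the threshold is chosen.

```python
import statistics


def predict_batch(batch, threshold, eps=1e-5):
    """Batch-level prediction: score all samples against batch statistics, then threshold."""
    mu = statistics.fmean(batch)
    var = statistics.pvariance(batch, mu)
    scores = [(x - mu) ** 2 / (var + eps) for x in batch]
    return [s > threshold for s in scores]


# Hypothetical 1-D features of one test batch: mostly normal, one anomaly at 5.0.
test_batch = [0.0, 0.1, -0.1, 0.2, -0.2, 0.0, 5.0]
print(predict_batch(test_batch, threshold=3.0))
# [False, False, False, False, False, False, True]
```

In a deep learning framework the per-sample scoring is vectorized over the batch, which is where the roughly constant cost in batch size comes from.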

The meta outlier exposure experiments, the tabular-data experiments, and comparisons with other normalization methods are all in the supplementary material.

sy00n / DL_paper_review
