[33] Prototypical Residual Networks for Anomaly Detection and Localization

Performs supervised anomaly classification. To avoid overfitting caused by data imbalance, it takes a multiple instance learning approach, updating clusters as prototypes over patches of several scales, and also applies a variety of augmentation schemes.
It is trained with anomaly images together with their GT masks, and uses a decoder directly.

Prototype initialization
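- Randomly sample K features from a random layer of the ResNet feature maps and update them with k-means clustering.
- L2 distance is used to compare two feature maps.
- Since the number of normal samples differs per dataset, the number of prototypes is set to a fixed ratio of the normal sample count (so the prototype count differs per dataset; the ratio is set to 10%).
Residual Representation
- The anomalous residual representation is defined from the nearest prototype, which is found among the cluster prototypes by L2 norm. Prototypes are learned separately for each scale, so an input sample can be matched to a prototype at each of the different scales.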
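The prototype and residual steps above can be sketched roughly as follows. This is a minimal NumPy sketch, not the paper's implementation: it assumes each feature map has already been flattened to a vector, uses a plain k-means loop, and assumes the residual is the element-wise difference between a feature and its nearest prototype. The names `init_prototypes` and `residual_representation` are hypothetical.

```python
import numpy as np

def init_prototypes(normal_feats, ratio=0.10, iters=10, seed=0):
    """Cluster normal features with plain k-means; returns (K, D) prototypes."""
    rng = np.random.default_rng(seed)
    n = normal_feats.shape[0]
    k = max(1, int(round(n * ratio)))   # prototype count = ratio of normal samples
    protos = normal_feats[rng.choice(n, size=k, replace=False)].copy()
    for _ in range(iters):
        # L2 distance from every feature to every prototype: shape (N, K)
        d = np.linalg.norm(normal_feats[:, None, :] - protos[None, :, :], axis=-1)
        assign = d.argmin(axis=1)
        for j in range(k):
            members = normal_feats[assign == j]
            if len(members) > 0:        # keep the old centre if a cluster is empty
                protos[j] = members.mean(axis=0)
    return protos

def residual_representation(feats, protos):
    """Assumed form: subtract the nearest (L2) prototype from each feature."""
    d = np.linalg.norm(feats[:, None, :] - protos[None, :, :], axis=-1)
    nearest = d.argmin(axis=1)
    return feats - protos[nearest], nearest

rng = np.random.default_rng(0)
normal = rng.normal(size=(50, 8))       # 50 normal feature vectors of dim 8 (toy data)
protos = init_prototypes(normal)        # 10% of 50 samples -> 5 prototypes
query = rng.normal(size=(4, 8))
res, idx = residual_representation(query, protos)
```

In the multi-scale setting described above, one would run this once per scale, each with its own prototype set.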
Multi-scale Fusion
- Multi-scale fusion (MF) blocks are added to combine information across the multi-scale representations.
- As in Fig. 4, each output is the sum of the transformed representations of the three inputs. The transform function depends on the input feature map index r and the output feature map index j.
- If r = j, the input and output feature maps are identical.
- If r < j, the input feature map is downsampled, i.e. its spatial resolution (the feature map size) is reduced. This is done with a depth-wise separable convolution.
- If r > j, the input feature map is upsampled.
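The index rule above can be illustrated with a small NumPy sketch. It makes several assumptions that are not in the notes: index 0 is the highest resolution, each successive scale halves the resolution, and all scales share the same channel count. It also swaps in 2x2 average pooling for the depth-wise separable convolution and nearest-neighbour repetition for upsampling, so only the r/j routing and the summation are shown.

```python
import numpy as np

def downsample(x, times):
    """Halve H and W `times` times with 2x2 average pooling (stand-in for the conv)."""
    for _ in range(times):
        c, h, w = x.shape
        x = x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
    return x

def upsample(x, times):
    """Double H and W `times` times by nearest-neighbour repetition."""
    f = 2 ** times
    return x.repeat(f, axis=1).repeat(f, axis=2)

def transform(x, r, j):
    """Bring input scale r to output scale j."""
    if r == j:
        return x                        # same scale: pass through
    if r < j:
        return downsample(x, j - r)     # finer input -> coarser output
    return upsample(x, r - j)           # coarser input -> finer output

def fuse(feature_maps):
    """Each output j is the sum of all inputs transformed to scale j."""
    n = len(feature_maps)
    return [sum(transform(feature_maps[r], r, j) for r in range(n)) for j in range(n)]

maps = [np.ones((4, 16, 16)), np.ones((4, 8, 8)), np.ones((4, 4, 4))]
fused = fuse(maps)
```

Each fused output keeps the shape of the input at the same index, so the block can be stacked.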

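The feature map is split into patches of different sizes (shrinking by 2x2 each time), and patch-wise self-attention is performed in each head.
First, patches of size 2c_j × ps × ps are extracted and flattened into 1-D vectors. FC layers are then trained to produce the query, key, and value embeddings, and the result is resized back to the original spatial resolution.
Finally, these features are concatenated and passed through a 2D residual block to produce the final output.
A total of three MSA blocks are stacked.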
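A rough NumPy sketch of one patch-wise attention head follows, to make the split, flatten, attend, and fold-back steps concrete. It is an illustration under assumptions: the projection matrices are square so that the attended vectors can be folded straight back into patches, the patch sizes 4 and 2 are hypothetical, and the final 2D residual block is omitted. `patch_self_attention` is a hypothetical helper name.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def patch_self_attention(x, ps, wq, wk, wv):
    """x: (C, H, W); ps: patch size; wq/wk/wv: (C*ps*ps, C*ps*ps) projections."""
    c, h, w = x.shape
    gh, gw = h // ps, w // ps
    # split into non-overlapping ps x ps patches and flatten each to a 1-D vector
    patches = (x.reshape(c, gh, ps, gw, ps)
                .transpose(1, 3, 0, 2, 4)
                .reshape(gh * gw, c * ps * ps))
    q, k, v = patches @ wq, patches @ wk, patches @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]))   # (num_patches, num_patches)
    out = attn @ v                                  # (num_patches, C*ps*ps)
    # fold the patch vectors back to the original spatial resolution
    return (out.reshape(gh, gw, c, ps, ps)
               .transpose(2, 0, 3, 1, 4)
               .reshape(c, h, w))

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8, 8))
heads = []
for ps in (4, 2):                                   # hypothetical patch size per head
    d = x.shape[0] * ps * ps
    wq, wk, wv = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    heads.append(patch_self_attention(x, ps, wq, wk, wv))
multi = np.concatenate(heads, axis=0)               # concatenate heads along channels
```

Larger patches give each head a coarser view of the map, smaller patches a finer one, which is presumably the point of mixing sizes.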

Extended anomalies (EA): augmented anomalous regions taken from seen anomalies are placed onto normal images (in-distribution anomalies).
Simulated anomalies (SA): generated from normal samples, without any knowledge of the seen anomalies.

Rather than simply augmenting the whole image, the specific anomalous region of a seen anomaly is augmented and pasted at a random location in a normal image.
Augmentations (Fig. 5, Aug1) are applied to a randomly selected anomaly from the seen anomalies in order to generate color variety.
Two are chosen at random from { equalize, solarize, posterize, sharpness, autocontrast, invert, gamma-contrast }, and a random choice from { rotate, shear, shift } is also applied to set the position.
To make this more realistic, a soft position constraint is imposed so that the target areas are roughly specified.
The randomly sampled target areas are turned into a binary mask marking the actual anomaly location, which provides the ground truth mask.
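The cut-and-paste step and its ground truth mask can be sketched as below. This is a simplified illustration: the color and geometric augmentations are left out, the soft position constraint is reduced to "the top-left corner of the pasted crop must fall inside a given target area", and at least one valid position is assumed to exist. `paste_anomaly` and its arguments are hypothetical names.

```python
import numpy as np

def paste_anomaly(normal_img, anomaly_img, anomaly_mask, target_area, rng):
    """Paste the masked anomaly region at a random position inside target_area.

    normal_img, anomaly_img: (H, W, 3) float arrays.
    anomaly_mask, target_area: (H, W) bool arrays.
    Returns the augmented image and its binary ground truth mask.
    """
    ys, xs = np.nonzero(anomaly_mask)
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    crop = anomaly_img[y0:y1, x0:x1]            # bounding box of the anomaly
    crop_mask = anomaly_mask[y0:y1, x0:x1]
    ph, pw = crop_mask.shape
    h, w = target_area.shape
    # candidate top-left corners: target-area pixels where the crop still fits
    cand = np.argwhere(target_area[: h - ph + 1, : w - pw + 1])
    ty, tx = cand[rng.integers(len(cand))]
    out = normal_img.copy()
    gt = np.zeros((h, w), dtype=bool)
    region = out[ty:ty + ph, tx:tx + pw]        # a view, so writes go into `out`
    region[crop_mask] = crop[crop_mask]         # copy only the anomalous pixels
    gt[ty:ty + ph, tx:tx + pw] = crop_mask
    return out, gt

rng = np.random.default_rng(0)
normal = np.zeros((32, 32, 3))
anomaly = np.ones((32, 32, 3))
mask = np.zeros((32, 32), dtype=bool)
mask[10:14, 5:11] = True                        # toy 4x6 anomalous region
target = np.zeros((32, 32), dtype=bool)
target[8:24, 8:24] = True                       # toy target area
aug, gt = paste_anomaly(normal, anomaly, mask, target, rng)
```

Because the mask is built from the same crop that is pasted, the image and its ground truth stay aligned by construction.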
Simulated Anomalies

Similar to DRAEM, Perlin noise multiplied with a random texture from the DTD dataset is applied to the normal image.
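A rough sketch of this kind of mask-and-texture blending is shown below. Several parts are assumptions and not taken from the notes: block noise from an upsampled random grid stands in for real Perlin noise, a random array stands in for a DTD texture, and the threshold `thresh`, cell size `cell`, and blending factor `beta` are hypothetical values.

```python
import numpy as np

def block_noise(h, w, cell, rng):
    """Low-resolution random grid upsampled to (h, w); crude stand-in for Perlin noise."""
    gh, gw = -(-h // cell), -(-w // cell)               # ceil division
    grid = rng.random((gh, gw))
    return np.kron(grid, np.ones((cell, cell)))[:h, :w]

def simulate_anomaly(normal_img, texture, rng, cell=8, thresh=0.7, beta=0.5):
    """Blend `texture` into `normal_img` wherever the thresholded noise is on."""
    h, w, _ = normal_img.shape
    mask = block_noise(h, w, cell, rng) > thresh        # binary anomaly mask
    m = mask[..., None].astype(normal_img.dtype)
    blended = (1 - beta) * normal_img + beta * texture  # mix texture into the image
    out = (1 - m) * normal_img + m * blended            # untouched outside the mask
    return out, mask

rng = np.random.default_rng(0)
normal = np.zeros((32, 32, 3))
texture = np.ones((32, 32, 3))                          # placeholder for a DTD texture
sim, sim_mask = simulate_anomaly(normal, texture, rng)
```

The returned mask doubles as the ground truth for the simulated sample, so no manual annotation is needed.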

The decoder produces an anomaly score map, which is optimized with focal loss and smooth L1 loss.
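For reference, the two losses can be written out in NumPy as below. This is a sketch of the standard binary focal loss and smooth L1 forms; the notes do not say how the two terms are combined, so the plain weighted sum in `total_loss` and the values of `gamma`, `beta`, and `weight` are assumptions.

```python
import numpy as np

def focal_loss(prob, target, gamma=2.0, eps=1e-7):
    """Binary focal loss averaged over pixels; prob and target share a shape."""
    prob = np.clip(prob, eps, 1 - eps)
    pt = np.where(target == 1, prob, 1 - prob)   # probability given to the true class
    return float(np.mean(-((1 - pt) ** gamma) * np.log(pt)))

def smooth_l1_loss(pred, target, beta=1.0):
    """Quadratic for errors below beta, linear above."""
    diff = np.abs(pred - target)
    loss = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return float(np.mean(loss))

def total_loss(score_map, gt_mask, weight=1.0):
    """Assumed combination: focal loss plus a weighted smooth L1 term."""
    return focal_loss(score_map, gt_mask) + weight * smooth_l1_loss(score_map, gt_mask)

gt_mask = np.zeros((8, 8))
gt_mask[2:4, 2:4] = 1.0                          # toy ground truth mask
score_map = np.full((8, 8), 0.5)                 # toy uninformative prediction
loss = total_loss(score_map, gt_mask)
```

The focal term down-weights easy pixels, which matters here because anomalous pixels are a small fraction of each mask.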

sy00n / DL_paper_review