[23] A Zero-/Few-Shot Anomaly Classification and Segmentation Method for CVPR 2023 VAND Workshop Challenge

Abstract

This work proposes an anomaly detection solution that uses the CLIP model with extra linear layers added.

Introduction

Text prompts are built through state and template ensembles.
For abnormal region localization, extra linear layers are trained to map the image features into the linear space where the text features lie.
In the few-shot case, the extra linear layers from the zero-shot phase are retained and their weights are used as-is. In addition, the image encoder extracts features from the reference images, which are stored in a memory bank and compared against the test images.
To take full advantage of both shallow and deep features, features from different stages are used in both the zero-shot and few-shot settings.

Method

2.1 Zero-shot AD setting

Anomaly classification
- text prompt ensemble strategy: At the state level, generic text (e.g. flawless, damaged) is used rather than excessively detailed descriptions. At the template level, templates unsuited to anomaly detection are removed from the 85 templates that CLIP uses for ImageNet. The text features from the last layer of the text encoder are averaged and multiplied with the image feature to obtain the anomaly probability.
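The prompt ensemble and the resulting anomaly probability can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' code: the state phrases, the two templates, the function names, the `scale` value, and the softmax over the normal/abnormal pair are all assumptions made for the example, and random vectors stand in for real CLIP text and image features.

```python
import numpy as np

# Hypothetical state phrases and templates, for illustration only.
NORMAL_STATES = ["flawless {}", "perfect {}"]
ABNORMAL_STATES = ["damaged {}", "{} with defect"]
TEMPLATES = ["a photo of a {}.", "a cropped photo of the {}."]


def build_prompts(states, templates, class_name):
    """Combine every state phrase with every template for one object class."""
    return [t.format(s.format(class_name)) for s in states for t in templates]


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def ensemble_text_feature(prompt_features):
    """Average per-prompt text features (P, C) into one unit-length vector (C,)."""
    return l2_normalize(l2_normalize(prompt_features).mean(axis=0))


def anomaly_probability(image_feature, normal_feat, abnormal_feat, scale=100.0):
    """Softmax over similarities to the [normal, abnormal] text features.

    The softmax and the scale factor are assumptions of this sketch.
    """
    text = np.stack([normal_feat, abnormal_feat])           # (2, C)
    logits = scale * (l2_normalize(image_feature) @ text.T)  # (2,)
    logits = logits - logits.max()                           # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(probs[1])                                   # abnormal probability
```

In practice the prompt strings would be passed through the CLIP text encoder, and `ensemble_text_feature` would be applied once to the normal prompts and once to the abnormal prompts.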
Anomaly Segmentation
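- An anomaly map is usually built by computing the similarity between image and text features, but because CLIP was designed for classification, its image features are not well mapped into the joint space. In other words, they cannot be compared directly with the text features in that space.
- The paper therefore proposes training additional linear layers that map these image features into the joint embedding space so that they can be compared with the text features.
- Distinct features from the shallow and deep layers of the image encoder are used.
- All ViT layers are divided into 4 stages, and a linear layer at each stage maps that stage's output features into the joint embedding space. In Eq. (2), F denotes the patch token features, and k and b denote the weight and bias of the corresponding linear layer. An anomaly map is obtained for each stage, and the maps from all stages are summed.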
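The stage-wise mapping and summation described above might look like the sketch below. It is a NumPy illustration under stated assumptions: the function names are hypothetical, the per-patch score is taken as a softmax over the [normal, abnormal] text features (an assumption of this sketch), and the patch tokens would come from the CLIP image encoder rather than from arrays built by hand.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def stage_anomaly_map(patch_tokens, k, b, text_features):
    """Map one stage's patch tokens into the joint space and score each patch.

    patch_tokens:  (N, D) patch token features F of one stage
    k, b:          weight (D, C) and bias (C,) of that stage's linear layer
    text_features: (2, C), rows are the [normal, abnormal] text features
    """
    projected = l2_normalize(patch_tokens @ k + b)          # (N, C)
    probs = softmax(projected @ text_features.T, axis=-1)   # (N, 2)
    return probs[:, 1]                                      # abnormal score per patch


def zero_shot_anomaly_map(stage_tokens, stage_layers, text_features, grid):
    """Sum the per-stage maps and reshape them to the patch grid (H, W)."""
    total = np.zeros(grid[0] * grid[1])
    for tokens, (k, b) in zip(stage_tokens, stage_layers):
        total += stage_anomaly_map(tokens, k, b, text_features)
    return total.reshape(grid)
```

The resulting low-resolution map would normally be upsampled to the input image size before evaluation; that step is left out here.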
Losses: The linear layers are trained to predict the anomaly map under supervision from a linear combination of focal loss and dice loss.
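A minimal sketch of such a combined loss is shown below. The binary form of the focal loss, the `gamma` and `smooth` values, and the equal weights `w_focal` and `w_dice` are assumptions chosen for illustration; the notes do not state the values used in the paper.

```python
import numpy as np


def focal_loss(pred, target, gamma=2.0, eps=1e-7):
    """Binary focal loss averaged over pixels; pred holds abnormal probabilities."""
    pred = np.clip(pred, eps, 1.0 - eps)
    # Probability assigned to the true class of each pixel.
    p_t = np.where(target == 1, pred, 1.0 - pred)
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))


def dice_loss(pred, target, smooth=1.0):
    """One minus the Dice coefficient between the predicted map and the mask."""
    intersection = np.sum(pred * target)
    dice = (2.0 * intersection + smooth) / (np.sum(pred) + np.sum(target) + smooth)
    return float(1.0 - dice)


def segmentation_loss(pred, target, w_focal=1.0, w_dice=1.0):
    """Linear combination of the two losses (weights are assumptions)."""
    return w_focal * focal_loss(pred, target) + w_dice * dice_loss(pred, target)
```

Focal loss down-weights the many easy normal pixels, while dice loss measures overlap with the small anomalous region, which is one common motivation for pairing them in segmentation.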

2.2 Few-shot AD setting

Anomaly Classification
1. The anomaly score guided by the text prompts, as in the zero-shot setting.
2. The maximum value of the anomaly map.
Both parts are used together as the final anomaly score.
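As a small illustration, the two parts could be combined as below. The notes only say that both parts are used, so combining them by simple addition is an assumption of this sketch, and the function name is hypothetical.

```python
import numpy as np


def few_shot_anomaly_score(text_guided_score, anomaly_map):
    """Combine the text-guided score with the peak of the anomaly map.

    Summing the two parts is an assumption made for this example.
    """
    return float(text_guided_score + np.max(anomaly_map))
```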
Anomaly Segmentation
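Training: A few reference images are fed to the CLIP image encoder, and the extracted features are stored in a memory bank per stage. Multi-layer features from the encoder are stored at this point.
Test: The test image is also passed through the image encoder, and at each stage the cosine similarity to the reference features in the memory bank is computed to obtain an anomaly map.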

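All the per-stage anomaly maps are then combined. The anomaly map from Eq. (4) obtained this way is added to the anomaly map from the zero-shot branch. The key point is that, in the few-shot case, the linear layers are not fine-tuned on the reference images; the weights obtained in the zero-shot setting are used as-is.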
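The memory-bank comparison can be sketched as follows. This is a NumPy illustration with hypothetical function names; scoring each test patch by one minus its highest cosine similarity to any stored reference patch, and merging the two branches by addition, are assumptions about how the similarity is turned into an anomaly map.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def build_memory_banks(reference_stage_tokens):
    """Build one memory bank per stage from the reference images.

    reference_stage_tokens: list over reference images; each item is a list
    over stages of (N, D) patch token arrays. Returns a list with one
    (K * N, D) array of unit-length features per stage.
    """
    n_stages = len(reference_stage_tokens[0])
    return [
        l2_normalize(np.concatenate([img[s] for img in reference_stage_tokens]))
        for s in range(n_stages)
    ]


def few_shot_anomaly_map(test_stage_tokens, memory_banks, grid):
    """Sum, over stages, each patch's distance to its nearest reference patch."""
    total = np.zeros(grid[0] * grid[1])
    for tokens, bank in zip(test_stage_tokens, memory_banks):
        sim = l2_normalize(tokens) @ bank.T   # (N, M) cosine similarities
        total += 1.0 - sim.max(axis=1)        # low similarity means high score
    return total.reshape(grid)


def combined_anomaly_map(zero_shot_map, few_shot_map):
    """Add the zero-shot and few-shot maps (same grid shape assumed)."""
    return zero_shot_map + few_shot_map
```

No parameters are trained in this branch: the banks are filled once from the reference images, which matches the note that the linear layers keep their zero-shot weights.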
