[19] Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation

Link: https://openaccess.thecvf.com/content/CVPR2023/papers/Kang_Distilling_Self-Supervised_Vision_Transformers_for_Weakly-Supervised_Few-Shot_Classification__Segmentation_CVPR_2023_paper.pdf

Abstract

This paper addresses weakly-supervised few-shot image classification and segmentation using a self-supervised pretrained ViT.
It uses self-attention over the token representations of the self-supervised ViT to capture correlation, and attaches two separate heads, one for classification and one for segmentation.
The method can perform both classification and segmentation using only image-level labels, without any pixel-level labels.
- To do so, it takes the attention maps computed from the tokens of the pretrained ViT backbone and uses them as pseudo-labels.
It also considers a more practical "mixed" supervision setup, in which a small number of training images have ground-truth labels and the rest have only image-level labels.
- In the mixed setup, a learned pseudo-label enhancer is used to refine the pseudo-labels.
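
As a rough illustration of how an attention map could be turned into a pseudo-label, here is a minimal sketch in plain Python. It is not the paper's implementation: the function name `attention_to_pseudo_mask`, averaging over heads, min-max normalization, and the 0.5 threshold are all assumptions made for this example. The input stands in for per-head attention scores over the patch tokens of one image.

```python
def attention_to_pseudo_mask(attn_per_head, grid_h, grid_w, threshold=0.5):
    """Turn per-head attention scores over patch tokens into a binary mask.

    attn_per_head: list of heads, each a list of grid_h * grid_w scores.
    All processing choices here (mean over heads, min-max scaling,
    fixed threshold) are illustrative assumptions, not the paper's recipe.
    """
    num_heads = len(attn_per_head)
    num_tokens = grid_h * grid_w
    if num_heads == 0 or any(len(head) != num_tokens for head in attn_per_head):
        raise ValueError("each head must hold grid_h * grid_w attention scores")

    # Average the attention scores over heads, token by token.
    mean_attn = [
        sum(head[i] for head in attn_per_head) / num_heads
        for i in range(num_tokens)
    ]

    # Min-max normalize to [0, 1]; a constant map becomes all zeros.
    lo, hi = min(mean_attn), max(mean_attn)
    span = hi - lo
    norm = [(a - lo) / span if span > 0 else 0.0 for a in mean_attn]

    # Threshold into foreground (1) / background (0) and reshape to the grid.
    flat = [1 if a >= threshold else 0 for a in norm]
    return [flat[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]


# Toy example: 2 heads over a 2x2 patch grid.
toy_attn = [
    [0.1, 0.4, 0.1, 0.4],
    [0.3, 0.2, 0.1, 0.4],
]
print(attention_to_pseudo_mask(toy_attn, grid_h=2, grid_w=2))
```

In a real pipeline the patch-level mask would still need to be upsampled to image resolution before it could serve as a segmentation target.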

Introduction

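This work focuses on unifying few-shot classification and image segmentation. Conventional weakly supervised learning has been studied extensively, but weakly supervised few-shot learning has received little attention: because the object classes differ between training and testing, a model tends to overfit to the training classes. (This is all the more damaging in the few-shot setting.)
This work therefore addresses a weakly supervised FS-CS (few-shot classification and segmentation) scenario in which only image-level labels (class labels) are accessible.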

