[20] Forward Compatible Few-Shot Class-Incremental Learning

Written by the same authors as PROSER.

Abstract

We suggest learning prospectively to prepare for future updates, and propose ForwArd Compatible Training (FACT) for FSCIL.
We seek to realize it by reserving embedding space for future new classes.
We assign virtual prototypes to squeeze the embedding of known classes and reserve for new ones.
We forecast possible new classes and prepare for the updating process.

Introduction

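As in Fig. 1, the model must be able to incorporate new classes from only a few samples without harming its discriminability on old classes. Apart from the forgetting problem, there is also the problem of overfitting to the few-shot instances. The paper calls a model's ability to overcome forgetting backward compatibility: if the updated model still classifies the previous classes well without forgetting them, it can be said to be compatible.
Current methods mostly focus on backward compatibility, i.e. on keeping the later model from forgetting, but if the earlier model does not work well, the later model's performance will suffer too. A better approach is therefore to develop the earlier model with future extension in mind. The paper calls this kind of compatibility forward compatibility. (In short, it argues for the importance of improving the feature extractor starting from the base session.)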

FACT, the method proposed in this paper, is a model that prepares for future classes.
To make the model growable, multiple virtual prototypes are pre-assigned in the embedding space.
Optimizing these virtual prototypes pushes instances of the same class closer together and reserves more room for new classes.
Generating virtual instances via instance mixture lets the embedding space be reserved with explicit supervision.

Method

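In this equation, the mask erases the ground-truth logit and matches the remaining part to the pseudo label ŷ. In other words, it is the same as PROSER, only expressed differently. In Eq. 5, f(x) denotes the l2-normalized cosine similarity and V denotes the number of virtual classes. Optimizing this loss pushes all non-target classes toward the reserved virtual prototype. The embeddings of the other classes thus become more compact while the virtual classes stay reserved, which makes the model more growable and strengthens forward compatibility. (The number of virtual classes defaults to V = N_B, i.e. the number of base-session classes.)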
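The loss described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names, the `scale` temperature, the `gamma` weight, choosing the pseudo label as the nearest virtual prototype, and masking with a large negative constant are all assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity, i.e. the dot product of the l2-normalized vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def virtual_prototype_loss(embedding, real_protos, virtual_protos, label,
                           gamma=1.0, scale=16.0):
    """Return (loss, pseudo_label) for one embedded training sample.

    real_protos:    prototypes of the N_B known classes
    virtual_protos: the V reserved virtual prototypes
    gamma, scale:   hypothetical hyperparameters for this sketch
    """
    n_real = len(real_protos)
    logits = [scale * cosine(embedding, p)
              for p in real_protos + virtual_protos]

    # Term 1: ordinary classification against the ground-truth class.
    loss_real = cross_entropy(logits, label)

    # Pseudo label (assumption): index of the most similar virtual prototype.
    virtual_logits = logits[n_real:]
    best_virtual = max(range(len(virtual_logits)),
                       key=virtual_logits.__getitem__)
    pseudo_label = n_real + best_virtual

    # Term 2: erase the ground-truth logit, then match the rest to the
    # pseudo label. A large negative value stands in for "erased" here.
    masked = list(logits)
    masked[label] = -1e9
    loss_virtual = cross_entropy(masked, pseudo_label)

    return loss_real + gamma * loss_virtual, pseudo_label
```

For example, with `embedding = [1.0, 0.0]`, two real prototypes and two virtual prototypes (so V = N_B = 2), the call `virtual_prototype_loss(embedding, real, virtual, label=0)` returns the combined loss together with the index of the virtual prototype the sample is pulled toward once its own class is masked out.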

Forecasting Virtual Instances

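For the model to be future-proof, having already seen novel patterns in an earlier stage should make it better suited to the new classes in later steps. To that end, new classes are synthesized via instance mixture, and the generated instances are used to reserve the embedding space.
Inspired by the observation that interpolating between two different clusters often yields low-confidence predictions, the paper fuses instances with manifold mixup, decoupling the embedding into two parts at a middle layer.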
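The mixing step can be sketched as follows, writing the embedding as g(h(x)), where h is the part of the network before the middle layer and g the part after it. The toy layers `h` and `g`, the Beta(alpha, alpha) sampling of the mixing coefficient, and the restriction to pairs from different classes are assumptions for illustration only, not details taken from these notes.

```python
import random

def h(x):
    """Hypothetical 'early layers': a toy affine map followed by ReLU."""
    return [max(0.0, 2.0 * v - 0.5) for v in x]

def g(z):
    """Hypothetical 'later layers': a toy linear map on a 2-d hidden vector."""
    return [z[0] + z[1], z[0] - z[1]]

def manifold_mixup(x_i, x_j, lam):
    """Fuse two instances at the middle layer: g(lam*h(x_i) + (1-lam)*h(x_j))."""
    h_i, h_j = h(x_i), h(x_j)
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(h_i, h_j)]
    return g(mixed)

def sample_virtual_embeddings(batch, labels, alpha=2.0, rng=random):
    """Build virtual-instance embeddings from every pair with different labels.

    alpha is a hypothetical Beta-distribution parameter for this sketch.
    """
    virtual = []
    for i in range(len(batch)):
        for j in range(i + 1, len(batch)):
            if labels[i] != labels[j]:  # only mix across classes (assumption)
                lam = rng.betavariate(alpha, alpha)
                virtual.append(manifold_mixup(batch[i], batch[j], lam))
    return virtual
```

Setting `lam` to 1.0 or 0.0 recovers the ordinary embedding of the first or second instance, so the virtual instances lie between the two original embeddings in the hidden space rather than in pixel space.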
