Keywords
Contrastive Learning, Dynamic Dictionary, Momentum Update, InfoNCE Loss, Instance Discrimination Task
TL;DR
MoCo builds a queue-based dictionary that supplies a large set of negative pairs and uses a momentum update to keep key representations consistent, thereby improving contrastive learning performance.
Abstract
We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder. This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning. MoCo provides competitive results under the common linear protocol on ImageNet classification. More importantly, the representations learned by MoCo transfer well to downstream tasks. MoCo can outperform its supervised pre-training counterpart in 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets, sometimes surpassing it by large margins. This suggests that the gap between unsupervised and supervised representation learning has been largely closed in many vision tasks.
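The three mechanisms the abstract names (a FIFO queue of keys, a momentum-updated key encoder, and a contrastive look-up against the queue) can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's actual PyTorch implementation; the function names, the momentum value 0.999, and the temperature 0.07 follow the paper's pseudocode, but everything else is a simplifying assumption.

```python
import numpy as np

def momentum_update(key_params, query_params, m=0.999):
    # Key-encoder weights are an exponential moving average of the
    # query-encoder weights; only the query encoder receives gradients.
    return {k: m * key_params[k] + (1 - m) * query_params[k] for k in key_params}

def info_nce(q, k_pos, queue, t=0.07):
    # q: (d,) query; k_pos: (d,) positive key; queue: (K, d) negative keys.
    # Returns the InfoNCE loss: cross-entropy with the positive at index 0.
    q = q / np.linalg.norm(q)
    k_pos = k_pos / np.linalg.norm(k_pos)
    queue = queue / np.linalg.norm(queue, axis=1, keepdims=True)
    logits = np.concatenate([[q @ k_pos], queue @ q]) / t
    logits -= logits.max()  # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[0])

def enqueue_dequeue(queue, new_keys):
    # FIFO dictionary: append the newest mini-batch of keys, drop the oldest,
    # so the dictionary size is decoupled from the mini-batch size.
    return np.concatenate([queue[len(new_keys):], new_keys])
```

The FIFO queue is what lets the dictionary hold far more negatives than one mini-batch, while the slow momentum update keeps the keys already in the queue consistent with the keys being added.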
Paper link
http://openaccess.thecvf.com/content_CVPR_2020/html/He_Momentum_Contrast_for_Unsupervised_Visual_Representation_Learning_CVPR_2020_paper.html
Presentation link
https://drive.google.com/file/d/1t6vWM37Walz2S2Hojche_RG_A8KvN1Mx/view?usp=sharing
Video link
https://youtu.be/S-FsnPxAEFA