lisosia / cv_knowledge

Some notes for Computer Vision (in Japanese)

Supervised Contrastive Learning #5

lisosia opened this issue 4 years ago (status: Open)

lisosia commented 4 years ago

In short

Learn a sphere embedding in a supervised manner (phase 1). After phase 1, train a linear classifier on top of the frozen embedding (phase 2).
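A minimal numpy sketch of the two-phase setup described above, under loud assumptions: a fixed random projection stands in for the phase-1 trained encoder (only the frozen-encoder / trainable-linear-head split is the point), and `W_enc`, `encode`, `ce_loss` are illustrative names, not from the paper or repo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Phase 1 is assumed already done: a random projection stands in for the
# trained encoder, mapping inputs to L2-normalized sphere embeddings.
W_enc = rng.normal(size=(32, 8))

def encode(x):
    z = x @ W_enc
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def ce_loss(Z, W, y):
    # numerically stable softmax cross-entropy
    logits = Z @ W - (Z @ W).max(axis=1, keepdims=True)
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(y)), y].mean()

# Phase 2: only the linear classifier W is updated; the encoder is frozen.
X = rng.normal(size=(100, 32))
y = rng.integers(0, 3, size=100)
Z = encode(X)                      # frozen features
W = np.zeros((8, 3))
for _ in range(200):
    logits = Z @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W -= 0.5 * (Z.T @ (p - np.eye(3)[y]) / len(y))  # gradient step on CE
```

Because gradients only ever touch `W`, the sphere embedding learned in phase 1 is untouched by phase 2.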

Screenshot 2020-07-08 23-53-05

Screenshot 2020-07-08 23-53-34

Screenshot 2020-07-09 00-26-34

Paper link

https://arxiv.org/pdf/2004.11362.pdf

Authors / Affiliation

Google Research

Submission date (yyyy/MM/dd)

2020/4/23

Overview

Novelty / Contributions

  1. We propose a novel extension to the contrastive loss function that allows for multiple positives per anchor. We thus adapt contrastive learning to the fully supervised setting.
  2. We show that this loss allows us to learn state of the art representations compared to cross-entropy, giving significant boosts in top-1 accuracy and robustness.
  3. Our loss is less sensitive to a range of hyperparameters than cross-entropy. This is an important practical consideration. We believe this is due to the more natural formulation of our loss, which pulls representations of samples from the same class closer together, rather than forcing them to be pulled towards a specific target as done in cross-entropy.
  4. We show analytically that the gradient of our loss function encourages learning from hard positives and hard negatives. We also show that triplet loss [48] is a special case of our loss when only a single positive and negative are used.
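Contribution 1 above (multiple positives per anchor) can be sketched as a numpy version of the supervised contrastive loss, the L_out-style average over positives; this is an illustrative reimplementation under my reading of the paper's Eq. for SupCon, not the authors' code (the reference PyTorch implementation is in HobbitLong/SupContrast).

```python
import numpy as np

def supcon_loss(features, labels, temperature=0.07):
    """Supervised contrastive loss for a batch of L2-normalized features.

    features: (N, D) array, rows assumed L2-normalized.
    labels:   (N,) int array; all same-label samples are positives.
    """
    n = features.shape[0]
    sim = features @ features.T / temperature
    sim -= sim.max(axis=1, keepdims=True)        # numerical stability
    self_mask = 1.0 - np.eye(n)                  # exclude self-similarity
    exp_sim = np.exp(sim) * self_mask
    # log p(i, a) = sim(i, a) - log sum over all a != i
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))
    # positives: same label, excluding the anchor itself
    pos_mask = (labels[:, None] == labels[None, :]).astype(float) * self_mask
    # average log-prob over all positives of each anchor, then over anchors
    mean_log_prob_pos = (pos_mask * log_prob).sum(axis=1) \
        / np.maximum(pos_mask.sum(axis=1), 1.0)
    return -mean_log_prob_pos.mean()
```

With one positive and one negative per anchor, the per-anchor term reduces to a triplet-style comparison, matching contribution 4; with perfectly clustered classes the loss bottoms out near log(#positives other than the anchor's class cohort in the denominator).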

Method

Results

Screenshot 2020-07-08 23-54-10

Comments

lisosia commented 4 years ago

Checked the implementation (a clean one): https://github.com/HobbitLong/SupContrast

Both CrossEntropyLoss and the SupContrastive loss are trained with the same batch_size=1024 (that is what you get if you follow the "Running" section of the README). In that case, the latter presumably ends up doing forward/backward over 2048 images, since each image contributes two augmented views. Memory usage grows accordingly, so be careful in practice.
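The doubling can be illustrated with a toy two-view batch in numpy; the tiny `batch_size` and the flip-only `augment` are stand-ins (the actual repo uses a two-crop transform over its torchvision augmentation pipeline), chosen only to show where the factor of 2 comes from.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size = 4  # stand-in for the README's batch_size=1024
images = rng.random((batch_size, 3, 32, 32))

def augment(x, rng):
    # placeholder augmentation: independent random horizontal flip per image
    out = x.copy()
    flip = rng.random(len(x)) < 0.5
    out[flip] = out[flip][..., ::-1]
    return out

# Two-crop style batch: each image yields two independently augmented views,
# concatenated along the batch dimension, so the encoder's forward/backward
# pass actually processes 2 * batch_size images.
views = np.concatenate([augment(images, rng), augment(images, rng)], axis=0)
```

So with batch_size=1024 the contrastive branch pushes 2048 images through the network per step, which is where the extra memory goes.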