SSL has also been used as an auxiliary task to improve the performance on the main task, such as generative model learning, semi-supervised learning, and improving robustness and uncertainty.
Recently, contrastive learning has been very strong, but it relies on domain-specific inductive biases.
However, while the concept of contrastive learning is applicable to any domain, the quality of learned representations relies on the domain-specific inductive bias: as anchors and positive samples are obtained from the same data instance, data augmentation introduces semantically meaningful variance for better generalization. [...] Hence, contrastive representation learning in different domains requires an effort to develop effective data augmentations.
Sometimes even large-scale unlabeled data is unavailable.
Furthermore, while recent works have focused on large-scale settings where millions of unlabeled examples are available, this is not always practical in real-world applications. For example, in lithography, acquiring data is very expensive in terms of both time and cost due to the complexity of the manufacturing process.
Meanwhile, MixUp data augmentation has proven useful across different domains and tasks.
Meanwhile, MixUp has been shown to be a successful data augmentation for supervised learning in various domains and tasks, including image classification, generative model learning, and natural language processing.
So a natural question follows: does MixUp work well for contrastive representation learning? (??? the logic feels like it jumps a bit fast here)
In this paper, we explore the following natural, yet important question: is the idea of MixUp useful for unsupervised, self-supervised, or contrastive representation learning across different domains?
InputMix: only the inputs are mixed, which introduces noise. This method can be viewed as introducing structured noise, driven by the auxiliary data, into the principal data with the largest mixing coefficient λ; the label of the principal data is assigned to the mixed data (Shen et al., 2020; Verma et al., 2020; Zhou et al., 2020).
MixUp is a vicinal risk minimization method (Chapelle et al., 2001) that augments data and their labels in a data-driven manner. It not only improves generalization on the supervised task but also improves adversarial robustness (Pang et al., 2019) and confidence calibration (Thulasidasan et al., 2019).
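As a quick illustration of the MixUp operation itself, here is a minimal numpy sketch (the Beta(α, α) mixing coefficient and one-hot label interpolation follow the standard formulation; the function name and toy data are illustrative):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=1.0, rng=np.random.default_rng(0)):
    """Standard MixUp: convex-combine two inputs and their one-hot
    labels with a coefficient drawn from Beta(alpha, alpha)."""
    lam = float(rng.beta(alpha, alpha))
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam

# Toy example: mix two 4-dim inputs with one-hot labels.
x, y, lam = mixup(np.ones(4), np.array([1.0, 0.0]),
                  np.zeros(4), np.array([0.0, 1.0]))
# The mixed label stays a valid distribution: y.sum() == 1.
```

Because the labels are mixed along with the inputs, the model is trained against soft targets in the vicinity of the data, which is where the regularization effect comes from.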
The model is a CNN/MLP backbone plus a two-layer MLP projection head; for downstream tasks, the projection head is replaced with a task-specific head.
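A toy sketch of that architecture shape, with random placeholder weights standing in for a trained backbone (all dimensions and names here are illustrative assumptions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x):
    """Stand-in for the CNN/MLP backbone (hypothetical: one random
    linear layer with ReLU; a real backbone would be trained)."""
    W = rng.standard_normal((x.shape[-1], 32)) * 0.1
    return np.maximum(x @ W, 0.0)

def projection_head(h, dim_hidden=64, dim_out=16):
    """Two-layer MLP projection head (Linear -> ReLU -> Linear), used
    during pre-training; downstream tasks swap it for a task head.
    Weights are random placeholders, re-sampled per call for brevity."""
    W1 = rng.standard_normal((h.shape[-1], dim_hidden)) * 0.1
    W2 = rng.standard_normal((dim_hidden, dim_out)) * 0.1
    return np.maximum(h @ W1, 0.0) @ W2

z = projection_head(encoder(rng.standard_normal((4, 8))))  # shape (4, 16)
```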
The performance gains mainly come from the data augmentation.
Gains on small datasets are noticeable, around 3 points.
Even when the baseline uses no data augmentation, adding this method still yields a significant improvement.
4 Within your knowledge, what other tasks could this be tried on?
Probably none; the method is built specifically for contrastive learning and there is not much else new to transfer.
5 Good words, sentences, or paragraphs
Self-supervised representation learning (SSL) has been successfully applied in several domains, including image recognition, natural language processing, robotics, speech recognition, and video understanding.
which outperforms its supervised pre-training counterpart (He et al., 2016) on downstream tasks.
In particular, the discriminative performance of representations learned with i-Mix is on par with fully supervised learning on CIFAR-10/100 and Speech Commands.
Content
Uses MixUp as data augmentation for contrastive learning. It can be applied on top of SimCLR, MoCo, and BYOL, and improves results on image, speech, and tabular data (why do papers of this kind never try text?). The paper also verifies the method's regularization effect: it clearly improves contrastive learning when (1) data is scarce and (2) domain knowledge for designing data augmentations is limited.
Concretely, each instance in a batch is assigned a virtual label (the one-hot vector of its batch index); MixUp is then applied to generate new inputs for contrastive learning, and finally the loss is interpolated accordingly. Code:
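A minimal numpy sketch of this procedure on a SimCLR-style N-pair loss (hedged: it mixes raw input vectors and skips the encoder; the temperature `tau`, Beta `alpha`, and random-permutation pairing are simplifying assumptions, not the paper's exact pseudocode):

```python
import numpy as np

def i_mix_loss(anchors, keys, alpha=1.0, tau=0.2, seed=0):
    """Sketch of the i-Mix idea on a contrastive loss: instance i gets
    virtual label e_i (one-hot of its batch index), anchors are mixed
    pairwise, and the contrastive cross-entropy is interpolated."""
    rng = np.random.default_rng(seed)
    n = anchors.shape[0]
    lam = float(rng.beta(alpha, alpha))
    perm = rng.permutation(n)
    mixed = lam * anchors + (1.0 - lam) * anchors[perm]  # mix the inputs
    # Cosine-similarity logits of mixed anchors against the keys.
    a = mixed / np.linalg.norm(mixed, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = a @ k.T / tau
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Interpolate the loss between virtual labels i and perm[i].
    idx = np.arange(n)
    return float(-(lam * logp[idx, idx] + (1.0 - lam) * logp[idx, perm]).mean())

rng = np.random.default_rng(1)
batch = rng.standard_normal((8, 16))                # toy anchors
keys = batch + 0.1 * rng.standard_normal((8, 16))   # toy positives
loss = i_mix_loss(batch, keys)
```

In the paper's actual setting, the mixed inputs are passed through the encoder and projection head before computing similarities; the key idea carried over here is interpolating the loss between the two virtual labels.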
Information
0 Summary of the intro
Representation learning is important, and self-supervised representation learning has been very successful; many learning tasks can be constructed for it (context prediction, contrastive learning, etc.).
Proposes instance Mix (i-Mix), which can be applied on top of SimCLR, MoCo, and BYOL.
1 New things learned:
2 Knowledge gained from the Related Work
3 Experimental validation tasks; briefly describe them if unfamiliar