jiangmengli / metaug


Question about the regularization term. #2

Closed hhc1997 closed 1 year ago

hhc1997 commented 1 year ago

The regularization term is proposed to keep the augmented features from becoming too similar to the original features. This goal is achieved by Eq. (4) and Eq. (5). For example, if $\min(\{d(z^+_k)\}_{k^+}) > \max(\{d(z^-_k)\}_{k^-})$, the encoder is well learned and makes all positive-pair features more similar than the negative ones. In Eq. (5), the first term is then $[d(z^+_k) - \max(\{d(z^-_k)\}_{k^-})]_+$, which encourages the similarity of the positive augmented features to be smaller than that of the negative original features. It seems strange to me that the augmented positive features are constrained to be less similar to the anchor than even the original negatives. Is this too strict a limit on the augmented features?
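
To make my reading of the term concrete, here is a rough PyTorch sketch of the hinge penalty above (using cosine similarity for d(.) and made-up tensor shapes; this is only my assumption, not the repository's actual code):

```python
import torch.nn.functional as F

def hinge_term(anchor, z_aug_pos, z_neg):
    """Sketch of [d(z^+_k) - max({d(z^-_k)})]_+ as I read Eq. (5).

    anchor:    (dim,)        anchor feature
    z_aug_pos: (n_pos, dim)  augmented positive features
    z_neg:     (n_neg, dim)  original negative features
    d(.) is assumed to be cosine similarity here.
    """
    d_pos = F.cosine_similarity(z_aug_pos, anchor.unsqueeze(0), dim=-1)  # {d(z^+_k)}
    d_neg = F.cosine_similarity(z_neg, anchor.unsqueeze(0), dim=-1)      # {d(z^-_k)}
    # Penalize augmented positives that remain more similar to the anchor
    # than the most similar original negative.
    return F.relu(d_pos - d_neg.max()).mean()
```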

jiangmengli commented 1 year ago

Thanks for your interest. We were also concerned about the issue "is this too strict a limit on the augmented features?", so we propose three variants of the regularization term: Large, Medium, and Small. In addition, the hyperparameter \alpha intuitively tunes the impact of the regularization term. In practice, we find that, after careful tuning, the Large variant of the regularization term improves our method the most. We reckon the reason is that the Large variant can continuously impact the training of the model, both at the beginning and when approaching the end, since it can hardly (essentially never) reach convergence, precisely because, as you mentioned, such a regularization is very strict.

Also, please note that contrastive learning is an "instance-level" learning paradigm, so there must exist certain false negative samples, i.e., samples that actually belong to the same category as the anchor but are incorrectly treated as negatives. According to the paper "Chaos is a Ladder: A New Theoretical Understanding of Contrastive Learning via Augmentation Overlap" and our new paper "MetaMask: Revisiting Dimensional Confounder for Self-Supervised Learning", at the late stages of training, the representation learned by contrastive methods naturally places samples of the same category close together and samples of different categories far apart. Therefore, in the later stages, the Medium and Small variants of the regularization cannot provide enough gradient signal for model training, since some specific samples (the false negatives) may drive these regularization terms to trivial values. Concretely, the ablation study in the Appendix supports this statement.
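
To illustrate how \alpha and the choice of reference statistic over the negative similarities affect the strictness of such a hinge penalty, here is a rough sketch (not the actual implementation in this repository, and the mapping of the statistics below to the Large/Medium/Small variants is left open; please refer to the paper for the exact definitions):

```python
import torch.nn.functional as F

def hinge_regularizer(d_pos, d_neg, reduce="max", alpha=1.0):
    """Illustrative hinge regularizer; `reduce` and `alpha` names are for exposition only.

    d_pos: (n_pos,) similarities of augmented positives to the anchor
    d_neg: (n_neg,) similarities of original negatives to the anchor
    """
    # The reference statistic controls how strict the constraint is:
    # a lower reference (e.g. min) is satisfied only when every augmented
    # positive is less similar than every negative, so the penalty almost
    # never reaches zero; a higher reference (e.g. max) is easier to satisfy.
    ref = {"max": d_neg.max(), "mean": d_neg.mean(), "min": d_neg.min()}[reduce]
    # alpha scales the regularizer against the main contrastive loss.
    return alpha * F.relu(d_pos - ref).mean()
```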

hhc1997 commented 1 year ago

Thanks for your reply. I will close this issue.