Open wusuoweima opened 4 years ago
What is the advantage of SNTG loss over triplet loss? What is their difference? @xinmei9322

@wusuoweima Thanks for your interest in our work. SNTG uses a pairwise contrastive loss, so your question comes down to the difference between the pairwise contrastive loss and the triplet loss. Both were proposed for metric learning and differ little in practice. The triplet loss can sometimes perform slightly better because it does not push embeddings of the same label to collapse into very tight clusters. For further comparison and explanation, refer to the blog post and this. As we mention in the paper, the choice of $\ell_G$ is quite flexible, so you could also replace the Siamese contrastive loss with the triplet loss. I think the novelty of SNTG lies in how positive and negative pairs are defined for unlabeled data, not in the concrete form of the loss: since the labels of unlabeled examples are unknown, we use the target predictions given by the teacher model to select positive pairs.
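To make the comparison concrete, here is a minimal NumPy sketch of the two losses and of SNTG-style pair selection from teacher predictions. This is an illustrative reimplementation, not the authors' code; the function names (`pairwise_contrastive_loss`, `triplet_loss`, `sntg_pairs`) and the margin value are my own choices.

```python
import numpy as np

def pairwise_contrastive_loss(x1, x2, same, margin=1.0):
    """Siamese contrastive loss: pull same-label pairs together
    (any nonzero distance is penalized), push different-label
    pairs apart until their distance exceeds the margin."""
    d = np.linalg.norm(x1 - x2)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2

def triplet_loss(anchor, pos, neg, margin=1.0):
    """Triplet loss: only requires the positive to be closer to the
    anchor than the negative by the margin, so same-label embeddings
    need not collapse to a single point."""
    dp = np.linalg.norm(anchor - pos)
    dn = np.linalg.norm(anchor - neg)
    return max(0.0, dp - dn + margin)

def sntg_pairs(teacher_probs):
    """SNTG-style pair selection for unlabeled data: pseudo-label each
    example with the teacher model's argmax prediction; pairs sharing
    a pseudo-label are positives, all other pairs are negatives."""
    pseudo = np.argmax(teacher_probs, axis=1)
    n = len(pseudo)
    return [(i, j, bool(pseudo[i] == pseudo[j]))
            for i in range(n) for j in range(i + 1, n)]
```

Note the asymmetry the answer above points out: the contrastive loss penalizes a positive pair for any nonzero distance, while the triplet loss is zero as soon as the relative ordering (plus margin) is satisfied, which is why it avoids collapsing same-label clusters.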