Could you tell us which RoBERTa model was used in your research? We reproduced your results with all the other provided models successfully, but RoBERTa seems to pose some problems. The screenshot below shows the results reported in your paper (first row) and our results (second row). We used the RoBERTa-large model from https://github.com/facebookresearch/fairseq/blob/main/examples/roberta/README.md
We have yet to understand why the scores differ so much.
Apologies for the confusion: in Table 4 of the paper, RoBERTa refers to princeton-nlp/sup-simcse-roberta-large from Gao et al. The model can be found in their repo.
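In case it helps with reproduction, here is a minimal sketch of loading that checkpoint via Hugging Face `transformers` (this assumes the Hub checkpoint name above; the exact pooling and evaluation pipeline used in the paper may differ):

```python
from transformers import AutoModel, AutoTokenizer

# Load the supervised SimCSE RoBERTa-large checkpoint from Gao et al.
name = "princeton-nlp/sup-simcse-roberta-large"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

# Example: encode one sentence and take the first-token embedding.
# Note: the pooling strategy here is an assumption, not necessarily
# the one used for the paper's Table 4 numbers.
inputs = tokenizer("A sentence to embed.", return_tensors="pt")
outputs = model(**inputs)
embedding = outputs.last_hidden_state[:, 0]
```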