Hello
Maybe back-translation does not work for sentence embeddings with contrastive learning. You could also try back-translation with SimCSE to see whether it works. For example, ConSERT uses many data augmentation methods to produce positive examples, but it still underperforms SimCSE, even though ConSERT explicitly considers back-translation among its augmentations.
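For concreteness, back-translation here means a pivot-language round trip that yields a paraphrase-like positive view of each sentence. A minimal sketch, assuming the public Helsinki-NLP MarianMT checkpoints (not ConSERT's exact pipeline):

    # Back-translation positives via an en -> de -> en round trip (illustrative only).
    from transformers import MarianMTModel, MarianTokenizer

    def load(name):
        return MarianTokenizer.from_pretrained(name), MarianMTModel.from_pretrained(name)

    en_de_tok, en_de = load("Helsinki-NLP/opus-mt-en-de")
    de_en_tok, de_en = load("Helsinki-NLP/opus-mt-de-en")

    def translate(texts, tok, model):
        batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
        return tok.batch_decode(model.generate(**batch), skip_special_tokens=True)

    def back_translate(texts):
        return translate(translate(texts, en_de_tok, en_de), de_en_tok, de_en)

    sentences = ["A man is playing guitar."]
    positives = back_translate(sentences)  # (sentences[i], positives[i]) form a positive pair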
I think the reason is not the different lengths. In the supervised setting, we can directly use different sentences as the positive pair, which works better than positive pairs built from different prompts.
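Either way, the positive pairs feed the same contrastive objective. A minimal sketch of the standard SimCSE-style InfoNCE loss with in-batch negatives (textbook formulation, not code from either repository):

    import torch
    import torch.nn.functional as F

    def info_nce(z1, z2, temperature=0.05):
        # z1[i] and z2[i] embed a positive pair; every other row in the
        # batch acts as an in-batch negative.
        z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
        logits = z1 @ z2.t() / temperature   # (batch, batch) cosine similarities
        labels = torch.arange(z1.size(0))    # positives sit on the diagonal
        return F.cross_entropy(logits, labels)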
Thanks for your rapid reply. Have you tried using different prompts in the supervised setting? I can reproduce the result reported in your code (about 82.5%), but if I use different prompts, the performance drops to 79%. I am really confused about this.
I have tried it with different prompts; the performance is 82.03. You can reproduce this result by adding the following case to run.sh and running bash run.sh sup-roberta-dp:
"sup-roberta-dp")
BC=(python -m torch.distributed.launch --nproc_per_node 4 train.py)
TRAIN_FILE=data/nli_for_simcse.csv
BATCH=128
EPOCH=3
LR=5e-5
MODEL=roberta-base
TEMPLATE="*cls*_This_sentence_:_'_*sent_0*_'_means*mask*.*sep+*"
TEMPLATE2="*cls*_The_sentence_:_'_*sent_0*_'_means*mask*.*sep+*"
args=(--mask_embedding_sentence\
--mask_embedding_sentence_template $TEMPLATE\
--mask_embedding_sentence_different_template $TEMPLATE2\
--mask_embedding_sentence_delta)
eargs=(--mask_embedding_sentence_use_pooler\
--mask_embedding_sentence_delta \
--mask_embedding_sentence \
--mask_embedding_sentence_template $TEMPLATE )
;;
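For reference, the placeholders in these template strings are expanded into plain text before tokenization. A minimal Python sketch of the substitution as I understand it (assumed behavior, not the repository's exact code: "_" encodes a space, *sent_0* the input sentence, *mask* the mask token, and *cls*/*sep+* the tokenizer's special tokens):

    def apply_template(template, sentence, mask_token="<mask>"):
        # *cls* and *sep+* are assumed to be handled by the tokenizer's
        # special tokens, so they are simply stripped here.
        text = template.replace("*cls*", "").replace("*sep+*", "")
        text = text.replace("*sent_0*", sentence).replace("*mask*", mask_token)
        return text.replace("_", " ").strip()

    TEMPLATE = "*cls*_This_sentence_:_'_*sent_0*_'_means*mask*.*sep+*"
    print(apply_template(TEMPLATE, "A man is playing guitar."))
    # -> This sentence : ' A man is playing guitar. ' means<mask>.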
I tried to augment the positive examples using back-translation with different prompts (the prompts used for unsupervised RoBERTa). Specifically, I obtained two views of each positive example via back-translation and fed them into PromptBERT, but the average Spearman score for roberta-base is only 75 (vs. ~79.2 in your paper). I also tried using back-translation alone and got an average Spearman score of 77. I am confused about why the prompts do not help with augmented positive data. Do you have any ideas?
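(For clarity, the Spearman scores above come from the standard STS evaluation: correlate the cosine similarity of each sentence pair's embeddings with the human gold score. A minimal sketch of that metric, assuming precomputed embeddings rather than the SentEval pipeline:)

    import numpy as np
    from scipy.stats import spearmanr

    def sts_spearman(emb_a, emb_b, gold_scores):
        # Cosine similarity of each sentence pair, correlated with gold labels.
        a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
        b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
        return spearmanr((a * b).sum(axis=1), gold_scores).correlation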
I also found that in the supervised setting, using different prompts hurts performance. Does this mean that your method only works for positive pairs with the same length? Thank you!