princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
MIT License

How to use my own datasets? #237

Closed Hhx1999 closed 1 year ago

Hhx1999 commented 1 year ago

Hi, if I want to use biomedical literature as a training corpus, do I need to change --metric_for_best_model stsb_spearman? Thank you!

jeongwoopark0514 commented 1 year ago

You can check out the training section of the README. It says you can point the train_file argument at your own data.
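For anyone preparing their own corpus, here is a minimal sketch of one way to build such a file. It assumes the unsupervised setup takes a plain-text file with one sentence per line; please verify the expected format against the README before training. The helper name write_unsup_corpus and the output path mentioned below are hypothetical and not part of the SimCSE repo.

```python
from pathlib import Path


def write_unsup_corpus(sentences, out_path):
    """Write one sentence per line to out_path and return the number of lines written.

    Hypothetical helper (not part of SimCSE). Assumes the training script
    accepts a plain-text file with one sentence per line.
    """
    out_path = Path(out_path)
    written = 0
    with out_path.open("w", encoding="utf-8") as f:
        for sentence in sentences:
            # Collapse internal whitespace (including newlines) so that each
            # sentence occupies exactly one line in the output file.
            line = " ".join(sentence.split())
            if line:  # skip empty or whitespace-only entries
                f.write(line + "\n")
                written += 1
    return written
```

Calling write_unsup_corpus(my_sentences, "biomed_for_simcse.txt") would produce a file that could then be passed as the train_file argument, with the rest of the training command left as the README describes it.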
