princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
MIT License

Question about bad results from trained model #240

Closed · Alison-starbeat closed this issue 1 year ago

Alison-starbeat commented 1 year ago

Sorry to bother you! I'm new to NLP. I tried to train unsupervised SimCSE on my own data, with the goal of achieving the best recall and precision scores on my own test dataset. I trained on 10,000 to 90,000 examples for 1-2 epochs, with a learning rate of 1e-5 and a batch size of 64, starting from a base model (the Chinese version of roformer-sim). But I found that the results from the trained model were worse than those of the base model.
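
For reference, my comparison of the two checkpoints looks roughly like the minimal sketch below. It assumes both checkpoints can be loaded with the `simcse` tool from this repo; `queries`, `gold_docs`, and `corpus` are placeholders for my own test data.

```python
# Minimal sketch: compare base vs. trained checkpoint by recall@k on a
# retrieval test set, assuming the checkpoints load with the simcse tool.
from simcse import SimCSE

def recall_at_k(model_path, queries, gold_docs, corpus, k=10):
    """Fraction of queries whose gold sentence appears in the top-k results."""
    model = SimCSE(model_path)
    model.build_index(corpus)
    hits = 0
    for query, gold in zip(queries, gold_docs):
        # threshold=0 so that low-scoring results are not filtered out
        results = model.search(query, threshold=0, top_k=k)
        if gold in {sentence for sentence, score in results}:
            hits += 1
    return hits / len(queries)

# e.g. recall_at_k("path/to/base-model", ...)     # placeholder paths
#  vs. recall_at_k("path/to/trained-model", ...)
```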

My guess is that the problem lies in my dataset: it may naturally contain many similar sentence pairs, and since unsupervised SimCSE treats the other sentences in a batch as negatives, such pairs would act as false negatives and hurt the contrastive learning step. Could this be true? What could I do to improve the results?
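
If that hypothesis is right, one thing I'm considering is filtering near-duplicate lines out of the training file before training. A rough sketch of my own preprocessing idea (not part of the SimCSE codebase; file paths are placeholders):

```python
# Rough sketch: drop near-duplicate training lines so that in-batch
# negatives are less likely to be semantically identical "false negatives".

def char_ngrams(text, n=3):
    text = "".join(text.split())  # ignore whitespace; suits Chinese text
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def dedupe(lines, threshold=0.8):
    """Keep a line only if its character 3-gram Jaccard similarity to every
    previously kept line is below `threshold`. O(n^2); fine as a sketch for
    small corpora, but larger ones would need MinHash/LSH."""
    kept, kept_grams = [], []
    for line in lines:
        grams = char_ngrams(line)
        if not grams:
            continue
        is_dup = False
        for g in kept_grams:
            inter = len(grams & g)
            if inter and inter / len(grams | g) >= threshold:
                is_dup = True
                break
        if not is_dup:
            kept.append(line)
            kept_grams.append(grams)
    return kept

with open("train.txt", encoding="utf-8") as f:  # placeholder path
    lines = [l.strip() for l in f if l.strip()]
with open("train_dedup.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(dedupe(lines)))
```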

Thank you for your patience and hope for your reply!

gaotianyu1350 commented 1 year ago

Hi, can you elaborate more on the issue? For example, what this dataset is about, what the baseline model is, etc.

github-actions[bot] commented 1 year ago

Stale issue message