Sorry to bother you! I'm new to NLP. I tried unsupervised SimCSE on my own data, with the goal of achieving the best recall and precision scores on a held-out test set. I trained on 10,000–90,000 examples for 1–2 epochs, with a learning rate of 1e-5 and a batch size of 64, starting from a base model (the Chinese version of roformer-sim). However, I found that the trained model performed worse than the base model.
My guess is that the problem lies in my dataset: it may naturally contain many highly similar sentence pairs, which could hurt the contrastive learning step (similar sentences landing in the same batch would be treated as negatives even though they are semantically close). Could this be true? What could I do to improve the results?
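If it helps, here is a rough stdlib-only sketch of how I could estimate how many near-duplicate pairs my corpus contains before training. The 0.9 threshold and the 1,000-sentence sample size are arbitrary guesses, and `SequenceMatcher` is only a cheap surface-similarity proxy, not a semantic one:

```python
import difflib
import itertools

def near_duplicate_ratio(sentences, threshold=0.9, sample=1000):
    """Estimate the fraction of sentence pairs that are near-duplicates.

    Compares all pairs within a sample of the corpus using difflib's
    SequenceMatcher ratio; `threshold` and `sample` are arbitrary
    choices for illustration.
    """
    sents = sentences[:sample]
    pairs = dups = 0
    for a, b in itertools.combinations(sents, 2):
        pairs += 1
        if difflib.SequenceMatcher(None, a, b).ratio() >= threshold:
            dups += 1
    return dups / pairs if pairs else 0.0
```

If this ratio came back high, that would support the false-negative explanation, and deduplicating the training set before running SimCSE might be worth trying.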
Thank you for your patience and hope for your reply!