Open anar2706 opened 1 year ago
Is your goal to produce a text classification model or an embedding model? If the latter, then you might not want to use SetFit. However, if you want to do text classification using the text classification datasets from MTEB, then you could try combining them into one larger dataset (e.g. with `concatenate_datasets`) and training on that. Do note, however, that (too) large datasets don't strictly improve model performance.
So, in short, calling `train` once with a combined training dataset will work better than calling it X times with X different training datasets.
Hi @tomaarsen, I want to train bge base en 1.5 on multiple datasets from MTEB. But when I train it on one dataset, save it, and then fine-tune the saved model on another dataset, accuracy decreases a lot. Can you please give an example of how to do this correctly?