Open brianmvk opened 1 year ago
I'm happy to be corrected here, but I don't think SetFit currently supports passing two sentences to the model during training, except when they are combined into a single input string.
Would a cross encoder be a better solution for this task?
This is indeed currently not supported. This task has previously been called sentence pair classification, although it's usually used for Natural Language Inference. #91 is a related issue.
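The single-string workaround mentioned above can be sketched roughly as follows. This is only an illustration and not part of SetFit: the helper `combine_pair`, the `" [SEP] "` separator, and the new `text` column are assumptions made for the example, while `sentence1Title`, `sentence2Title`, and `duplicate` are the column names from the dataset in the question. Whether a concatenated input gives good results for duplicate detection is not something this thread confirms.

```python
def combine_pair(example, sep=" [SEP] "):
    """Return a copy of `example` with both titles joined into one `text` field.

    `sep` is an arbitrary, hypothetical separator; any string could be used.
    """
    combined = dict(example)
    combined["text"] = f"{example['sentence1Title']}{sep}{example['sentence2Title']}"
    return combined


# Toy rows using the column names from the question (the values are made up).
rows = [
    {
        "sentence1Title": "How do I reverse a list?",
        "sentence2Title": "Reversing a list in Python",
        "duplicate": 1,
    },
    {
        "sentence1Title": "How do I reverse a list?",
        "sentence2Title": "What is a cross encoder?",
        "duplicate": 0,
    },
]
combined_rows = [combine_pair(row) for row in rows]

# With a single text column, the mapping needs one text entry and one label entry.
# Every key must be a column that actually exists in the dataset.
column_mapping = {"text": "text", "duplicate": "label"}
```

If the data lives in a Hugging Face `datasets.Dataset`, the same function could probably be applied with `train_dataset.map(combine_pair)` before the dataset is handed to the trainer.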
I eventually saw that this was not yet possible. Thank you!
Hi all, I'm trying to train SBERT to classify two sentences as duplicates or not using SetFit. How do I make `column_mapping` accept two sentences instead of one?
Below is the code I tried.
    # Create trainer
    trainer = SetFitTrainer(
        model=model,  # SBERT model
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        loss_class=CosineSimilarityLoss,
        metric="accuracy",
        batch_size=32,  # 2X num_samples
        num_iterations=60,
        num_epochs=3,
        column_mapping={
            "sentence1Title": "text1",
            "sentence2Title": "text2",
            "duplicate": "label",
            "text": "text",
        },
    )
This is the error I get: ValueError: The column mapping expected the columns ['duplicate', 'sentence1Title', 'sentence2Title', 'text'] in the dataset, but the dataset had the columns ['Unnamed: 0', 'duplicate', 'sentence1Body', 'sentence1Title', 'sentence2Body', 'sentence2Title'].
Thank you in advance!
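Reading the error message, the keys of `column_mapping` are treated as column names that must already exist in the dataset, and `text` is not one of them. The comparison the message implies can be reproduced with a small sketch. The helper `missing_mapping_keys` is hypothetical and is not SetFit's internal code; the column names and mapping are copied from the post above.

```python
# Columns reported by the error message.
dataset_columns = [
    "Unnamed: 0",
    "duplicate",
    "sentence1Body",
    "sentence1Title",
    "sentence2Body",
    "sentence2Title",
]

# Mapping passed to the trainer in the post.
column_mapping = {
    "sentence1Title": "text1",
    "sentence2Title": "text2",
    "duplicate": "label",
    "text": "text",
}


def missing_mapping_keys(mapping, columns):
    """Return the mapping keys that are not columns in the dataset, sorted."""
    return sorted(set(mapping) - set(columns))


missing = missing_mapping_keys(column_mapping, dataset_columns)
print(missing)  # ['text']
```

Dropping the `"text": "text"` entry would satisfy this particular check, but as noted elsewhere in this thread, mapping two separate sentence columns is not currently supported, so `text1` and `text2` would still not be usable targets.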