Closed: tomaarsen closed this issue 1 year ago
Hi @tomaarsen,
Thanks for raising this! I should test the differentiable head for binary classification.
This issue is also similar to one I had in mind: maybe we should use `CrossEntropyLoss` for both binary and multi-class classification, so we won't forget to test either in the future when other features are added. It would also solve the issue in PR #187 about different data types for different loss functions, and make #179 easier. 😃
Does that sound good? I can fix it with a PR.
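As a quick sanity check of that idea, here is a minimal PyTorch sketch (illustrative only, not SetFit code; the logits and labels are made up). It uses `BCEWithLogitsLoss` as the logit-space form of `BCELoss`: binary cross-entropy over a single logit `z` matches `CrossEntropyLoss` over the two logits `[0, z]`, since `softmax([0, z])[1] == sigmoid(z)`:

```python
import torch
import torch.nn as nn

# Made-up single-logit binary predictions for N = 3 samples, shape (N, 1)
z = torch.tensor([[0.7], [-1.2], [2.3]])
targets = torch.tensor([1, 0, 1])

# Binary cross-entropy over one logit per sample
bce = nn.BCEWithLogitsLoss()(z.squeeze(1), targets.float())

# Cross-entropy over two logits per sample: [0, z]
two_class_logits = torch.cat([torch.zeros_like(z), z], dim=1)  # shape (N, 2)
ce = nn.CrossEntropyLoss()(two_class_logits, targets)

print(torch.allclose(bce, ce))  # True: the two losses coincide
```

So for binary classification the switch shouldn't change the quantity being optimized, only the head's output layout (two logits instead of one).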
cc: @lewtun for your comments and advice
That sounds reasonable! Assuming that we can expect similar performance from `CrossEntropyLoss` as from `BCELoss`, as I believe they are a bit different (sigmoid vs softmax).
Yeah, good point! I will run some experiments to test whether they perform similarly. If so, I will open a PR for it.
Solved via #203
Hello!

### Bug overview

The output shape of `__call__` or `predict` (they're aliases) is `(N, 1)`, with `N` as the number of inputs, rather than the expected `(N,)`.

### How to Reproduce

**For a differentiable head**
Copy-pasteable reproducing script (Script 1)

```python
from datasets import load_dataset
from sentence_transformers.losses import CosineSimilarityLoss

from setfit import SetFitModel, SetFitTrainer, sample_dataset

# Load a dataset from the Hugging Face Hub
dataset = load_dataset("sst2")

# Simulate the few-shot regime by sampling 8 examples per class
train_dataset = sample_dataset(dataset["train"], label_column="label", num_samples=8)
eval_dataset = dataset["validation"]

# Load a SetFit model from Hub
model: SetFitModel = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
    use_differentiable_head=True,
)

# Create trainer
trainer = SetFitTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss_class=CosineSimilarityLoss,
    metric="accuracy",
    batch_size=16,
    num_iterations=20,  # The number of text pairs to generate for contrastive learning
    num_epochs=1,  # The number of epochs to use for contrastive learning
    column_mapping={"sentence": "text", "label": "label"}  # Map dataset columns to text/label expected by trainer
)

# Train and evaluate
trainer.train()

# Run inference
preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])
print(preds.shape)
```

Note that we don't need to train this model to show the bug.
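For intuition on where the extra dimension comes from, here is a stripped-down sketch using a plain `torch.nn.Linear` as a stand-in for the head (not SetFit's actual `SetFitHead`; the embedding dimension 768 is assumed): a binary head with a single output unit yields predictions of shape `(N, 1)`, and squeezing the trailing dimension restores the expected `(N,)`:

```python
import torch

torch.manual_seed(0)

# Stand-in for a binary head with one output unit (embedding dim 768 assumed)
head = torch.nn.Linear(768, 1)
embeddings = torch.randn(2, 768)  # N = 2 "sentence embeddings"

probs = torch.sigmoid(head(embeddings))  # shape (2, 1)
preds = (probs > 0.5).long()
print(preds.shape)  # torch.Size([2, 1]): the unexpected shape

# Squeezing the trailing dimension gives the expected output
print(preds.squeeze(-1).shape)  # torch.Size([2])
```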
**For a logistic regression head**
Copy-pasteable reproducing script (Script 2)

```python
from datasets import load_dataset
from sentence_transformers.losses import CosineSimilarityLoss

from setfit import SetFitModel, SetFitTrainer, sample_dataset

# Load a dataset from the Hugging Face Hub
dataset = load_dataset("sst2")

# Simulate the few-shot regime by sampling 8 examples per class
train_dataset = sample_dataset(dataset["train"], label_column="label", num_samples=8)
eval_dataset = dataset["validation"]

# Load a SetFit model from Hub
model: SetFitModel = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
)

# Create trainer
trainer = SetFitTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss_class=CosineSimilarityLoss,
    metric="accuracy",
    batch_size=16,
    num_iterations=20,  # The number of text pairs to generate for contrastive learning
    num_epochs=1,  # The number of epochs to use for contrastive learning
    column_mapping={"sentence": "text", "label": "label"}  # Map dataset columns to text/label expected by trainer
)

# Train and evaluate
trainer.train()

# Run inference
preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])
print(preds.shape)
```

Note that the `LogisticRegression` head must be fitted before we can call `__call__`, hence the larger example. This example is taken nearly directly from the `README.md`.

### Expected behaviour

I would expect the output shape to be `(2,)` in both situations.

### Consequences
This is a fairly easy fix, but I found it curious that it was not discovered earlier. After all, the predictions from `SetFitHead` should be used in a loss function during training, right? A shape issue would have surfaced there. Through some testing, I found that the following fitting loop for the differentiable head is never called by the script above:
https://github.com/huggingface/setfit/blob/4a613b08267690dad0840444fc3e2caf60f29a44/src/setfit/modeling.py#L214-L240

If I instead use the trainer as recommended in the following training script, with `keep_body_frozen=True`, then I do get an error:
https://github.com/huggingface/setfit/blob/4a613b08267690dad0840444fc3e2caf60f29a44/scripts/setfit/run_fewshot.py#L132-L142
Copy-pasteable reproducing script (Script 3)

```python
from datasets import load_dataset
from sentence_transformers.losses import CosineSimilarityLoss

from setfit import SetFitModel, SetFitTrainer, sample_dataset

# Load a dataset from the Hugging Face Hub
dataset = load_dataset("sst2")

# Simulate the few-shot regime by sampling 8 examples per class
train_dataset = sample_dataset(dataset["train"], label_column="label", num_samples=8)
eval_dataset = dataset["validation"]

# Load a SetFit model from Hub
model: SetFitModel = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
    use_differentiable_head=True
)

# Create trainer
trainer = SetFitTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss_class=CosineSimilarityLoss,
    metric="accuracy",
    batch_size=16,
    num_iterations=20,  # The number of text pairs to generate for contrastive learning
    num_epochs=1,  # The number of epochs to use for contrastive learning
    column_mapping={"sentence": "text", "label": "label"}  # Map dataset columns to text/label expected by trainer
)

# Freeze head
trainer.freeze()

# Do contrastive training
trainer.train()

# Unfreeze head and freeze body
trainer.unfreeze(keep_body_frozen=True)

# Train end-to-end
trainer.train(
    num_epochs=25,
    body_learning_rate=1e-5,
    learning_rate=1e-2,
    l2_weight=0.0,
)

# Run inference
preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])
print(preds)
```

The error is the following:
Triggering training on the differentiable head is very confusing right now, and so I would like to continue the discussion at https://github.com/huggingface/setfit/issues/179#issuecomment-1316710169 on a new class structure.
I'll submit a PR for this shortly.