[Bug] `__call__`/`predict` has different shape using differentiable head

tomaarsen commented 1 year ago

Hello!

Bug overview

When using a differentiable head, the output shape of calling `__call__` or `predict` (they're aliases) is (N, 1), with N being the number of input sentences.
When using the logistic regression head, the output of that same call is (N,).

How to Reproduce

For a differentiable head

Copy-pasteable reproducing script (Script 1)

```python
from datasets import load_dataset
from sentence_transformers.losses import CosineSimilarityLoss

from setfit import SetFitModel, SetFitTrainer, sample_dataset

# Load a dataset from the Hugging Face Hub
dataset = load_dataset("sst2")

# Simulate the few-shot regime by sampling 8 examples per class
train_dataset = sample_dataset(dataset["train"], label_column="label", num_samples=8)
eval_dataset = dataset["validation"]

# Load a SetFit model from Hub
model: SetFitModel = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
    use_differentiable_head=True,
)

# Create trainer
trainer = SetFitTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss_class=CosineSimilarityLoss,
    metric="accuracy",
    batch_size=16,
    num_iterations=20,  # The number of text pairs to generate for contrastive learning
    num_epochs=1,  # The number of epochs to use for contrastive learning
    column_mapping={"sentence": "text", "label": "label"}  # Map dataset columns to text/label expected by trainer
)

# Train and evaluate
trainer.train()

# Run inference
preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])
print(preds.shape)
```

(2, 1)

Note that we don't need to train this model to show the bug.

For a logistic regression head

Copy-pasteable reproducing script (Script 2)

```python
from datasets import load_dataset
from sentence_transformers.losses import CosineSimilarityLoss

from setfit import SetFitModel, SetFitTrainer, sample_dataset

# Load a dataset from the Hugging Face Hub
dataset = load_dataset("sst2")

# Simulate the few-shot regime by sampling 8 examples per class
train_dataset = sample_dataset(dataset["train"], label_column="label", num_samples=8)
eval_dataset = dataset["validation"]

# Load a SetFit model from Hub
model: SetFitModel = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
)

# Create trainer
trainer = SetFitTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss_class=CosineSimilarityLoss,
    metric="accuracy",
    batch_size=16,
    num_iterations=20,  # The number of text pairs to generate for contrastive learning
    num_epochs=1,  # The number of epochs to use for contrastive learning
    column_mapping={"sentence": "text", "label": "label"}  # Map dataset columns to text/label expected by trainer
)

# Train and evaluate
trainer.train()

# Run inference
preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])
print(preds.shape)
```

(2,)

Note that the LogisticRegression head must be fitted before we can call `__call__`, hence the larger example. This example is taken almost directly from the README.md.

Expected behaviour

I would expect the output shape to be (2,) in both situations.

Consequences

This is a fairly easy fix, but I found it curious that this was not discovered earlier. After all, the predictions from SetFitHead should be used in a loss function when training, right? A shape issue would be discovered there. Through some testing, I found that the following fitting loop for the differentiable head is never called by Script 1 above: https://github.com/huggingface/setfit/blob/4a613b08267690dad0840444fc3e2caf60f29a44/src/setfit/modeling.py#L214-L240

If I instead use the Trainer as recommended in the following training script, with keep_body_frozen=True, then I do get an error.

https://github.com/huggingface/setfit/blob/4a613b08267690dad0840444fc3e2caf60f29a44/scripts/setfit/run_fewshot.py#L132-L142

Copy-pasteable reproducing script (Script 3)

```python
from datasets import load_dataset
from sentence_transformers.losses import CosineSimilarityLoss

from setfit import SetFitModel, SetFitTrainer, sample_dataset

# Load a dataset from the Hugging Face Hub
dataset = load_dataset("sst2")

# Simulate the few-shot regime by sampling 8 examples per class
train_dataset = sample_dataset(dataset["train"], label_column="label", num_samples=8)
eval_dataset = dataset["validation"]

# Load a SetFit model from Hub
model: SetFitModel = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
    use_differentiable_head=True
)

# Create trainer
trainer = SetFitTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss_class=CosineSimilarityLoss,
    metric="accuracy",
    batch_size=16,
    num_iterations=20,  # The number of text pairs to generate for contrastive learning
    num_epochs=1,  # The number of epochs to use for contrastive learning
    column_mapping={"sentence": "text", "label": "label"}  # Map dataset columns to text/label expected by trainer
)

# Freeze head
trainer.freeze()

# Do contrastive training
trainer.train()

# Unfreeze head and freeze body
trainer.unfreeze(keep_body_frozen=True)

# Train end-to-end
trainer.train(
    num_epochs=25,
    body_learning_rate=1e-5,
    learning_rate=1e-2,
    l2_weight=0.0,
)

# Run inference
preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])
print(preds)
```

The error is the following:

Traceback (most recent call last):
  File "[sic]\setfit\demo_diff_head_freeze.py", line 42, in <module>
    trainer.train(
  File "[sic]\setfit\src\setfit\trainer.py", line 376, in train
    self.model.fit(
  File "[sic]\setfit\src\setfit\modeling.py", line 236, in fit
    loss = criterion(predictions, labels)
  File "[sic]\envs\setfit\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "[sic]\envs\setfit\lib\site-packages\torch\nn\modules\loss.py", line 619, in forward
    return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
  File "[sic]\envs\setfit\lib\site-packages\torch\nn\functional.py", line 3086, in binary_cross_entropy
    raise ValueError(
ValueError: Using a target size (torch.Size([16])) that is different to the input size (torch.Size([16, 1])) is deprecated. Please ensure they have the same size.
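The shape mismatch in the traceback above can be reproduced without SetFit at all. The following is a minimal sketch that assumes only PyTorch: the tensors `probs_2d` and `labels` are made-up stand-ins for the head's predictions and the batch labels, and the `squeeze(-1)` call is just one way to align the shapes, not necessarily the fix that ends up in the library.

```python
import torch
from torch import nn

criterion = nn.BCELoss()

# Hypothetical stand-ins: labels have shape (4,), predictions have shape (4, 1)
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])
probs_2d = torch.tensor([[0.9], [0.2], [0.7], [0.4]])

# BCELoss requires input and target to have the same size
try:
    criterion(probs_2d, labels)
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
print(mismatch_error)

# Dropping the trailing dimension makes the shapes agree
probs_1d = probs_2d.squeeze(-1)
loss = criterion(probs_1d, labels)
print(tuple(probs_1d.shape), loss.item())
```

With the trailing dimension removed, the predictions have the (N,) shape that the logistic regression head already returns.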

Triggering training on the differentiable head is very confusing right now, and so I would like to continue the discussion at https://github.com/huggingface/setfit/issues/179#issuecomment-1316710169 on a new class structure.

I'll submit a PR for this shortly.

Tom Aarsen

blakechi commented 1 year ago

Hi @tomaarsen,

Thanks for raising this! I should have tested the differentiable head for binary classification.

This issue is related to something I've had in mind: maybe we should use CrossEntropyLoss for both binary and multi-class classification, so that we won't forget to test either case when other features are added in the future. It would also solve the issue in PR #187 about different data types for different loss functions, and make #179 easier. 😃

Does that sound good? I can fix it with a PR.

cc: @lewtun for your comments and advice

tomaarsen commented 1 year ago

That sounds reasonable! Assuming that we can expect similar performance for CrossEntropyLoss rather than BCELoss, as I believe they are a bit different (sigmoid vs softmax).
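To make the sigmoid vs softmax comparison concrete, here is a dependency-free sketch. It assumes a two-class head with logits `z0` and `z1` (hypothetical values) and shows that the softmax probability of class 1 equals the sigmoid of `z1 - z0`, so binary cross-entropy and two-class cross-entropy give the same loss value for the same probabilities. The helper functions are written for illustration and are not SetFit or PyTorch APIs.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    # Subtract the max for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_cross_entropy(p, y):
    # p is the predicted probability of class 1, y is 0 or 1
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def cross_entropy(logits, y):
    # y is the index of the true class
    return -math.log(softmax(logits)[y])


# Hypothetical logits for a two-class head
z0, z1 = 0.3, 1.5

p_softmax = softmax([z0, z1])[1]
p_sigmoid = sigmoid(z1 - z0)
print(math.isclose(p_softmax, p_sigmoid))

for label in (0, 1):
    bce = binary_cross_entropy(p_sigmoid, label)
    ce = cross_entropy([z0, z1], label)
    print(label, math.isclose(bce, ce))
```

The equivalence only covers the loss value for a given probability. A head trained with CrossEntropyLoss has two outputs instead of one, so its parameterization and training dynamics can still differ, which is why comparing performance empirically is still worthwhile.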

blakechi commented 1 year ago

Yeah, good point! I will run some experiments to test whether they perform similarly. If so, I will open a PR for it.

tomaarsen commented 1 year ago

Solved via #203
