huggingface / setfit

Efficient few-shot learning with Sentence Transformers
https://hf.co/docs/setfit
Apache License 2.0

Multilabel Classification with setfit #428

Closed (swtive closed this issue 1 year ago)

swtive commented 1 year ago

Hi, I am running the multilabel SetFit example as given in this link, with one change: I added freeze and unfreeze statements for the model, similar to the text classification example, as shown below:


from datasets import load_dataset

model_id = "sentence-transformers/paraphrase-mpnet-base-v2"
dataset = load_dataset("ethos", "multilabel")

import numpy as np

features = dataset["train"].column_names
features.remove("text")

num_samples = 8
samples = np.concatenate(
    [np.random.choice(np.where(dataset["train"][f])[0], num_samples) for f in features]
)

train_dataset = dataset["train"].select(samples)
eval_dataset = dataset["train"].select(
    np.setdiff1d(np.arange(len(dataset["train"])), samples)
)

from setfit import SetFitModel

model = SetFitModel.from_pretrained(
    model_id,
    multi_target_strategy="one-vs-rest",
    use_differentiable_head=True,
)

from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitTrainer

trainer = SetFitTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss_class=CosineSimilarityLoss,
    num_iterations=20,
    column_mapping={"text": "text", "labels": "label"},
)

trainer.freeze()  # freeze the head so that only the body is trained
trainer.train()

trainer.unfreeze(keep_body_frozen=True)  # unfreeze the head, keep the body frozen

trainer.train()  # train the head only

I am getting the error "ValueError: Target size (torch.Size([16, 8])) must be the same as input size (torch.Size([16, 2]))". How can I change the target size for the second trainer.train() call?

swtive commented 1 year ago

Resolved this by adding head_params={"out_features": 8} to the SetFitModel.from_pretrained call.
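
For anyone landing here later, a minimal sketch of that fix follows. The error message suggests the differentiable head was producing 2 outputs per sample while the targets have 8 label columns, so the head needs one output per label. The helper names `differentiable_head_kwargs` and `load_multilabel_model` and the placeholder label names are hypothetical, introduced only for illustration. The `from_pretrained` arguments are the ones used in this thread, and the sketch assumes the same SetFit version as the snippet above.

```python
def differentiable_head_kwargs(label_columns):
    """Build keyword arguments for SetFitModel.from_pretrained so that the
    differentiable head has one output per label column."""
    return {
        "multi_target_strategy": "one-vs-rest",
        "use_differentiable_head": True,
        # One output per label; in this thread the head otherwise produced 2.
        "head_params": {"out_features": len(label_columns)},
    }


def load_multilabel_model(model_id, label_columns):
    """Hypothetical wrapper: load a SetFit model with a correctly sized head."""
    # Imported lazily so the helper above works without setfit installed.
    from setfit import SetFitModel

    return SetFitModel.from_pretrained(
        model_id, **differentiable_head_kwargs(label_columns)
    )


# Placeholder names standing in for the 8 label columns (`features` in the
# snippet above); the real names come from the dataset's column_names.
features = [f"label_{i}" for i in range(8)]
kwargs = differentiable_head_kwargs(features)
print(kwargs["head_params"])  # {'out_features': 8}
```

Deriving `out_features` from `len(features)` instead of hard-coding 8 should keep the head in sync if the label set changes. Calling `load_multilabel_model(model_id, features)` would then replace the original `SetFitModel.from_pretrained` line.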