isaldiviagonzatti opened this issue 8 months ago
Hello, I have the same problem. Have you found a solution? Thanks
@amina8annane Sorry, honestly I don't remember whether or how I solved it. I know I did get results with setfit, but they were quite poor for my use case, so I didn't pursue it further. See if either of these resources helps: https://www.reddit.com/r/learnmachinelearning/comments/r7ki6k/how_to_fix_multioutput_target_data_is_not/ and https://stackoverflow.com/questions/58171410/multioutput-target-data-is-not-supported-with-label-binarization
In case you wanna compare to my code:
I checked and have the following:
```python
import numpy as np
from setfit import SetFitModel, Trainer, TrainingArguments

# `ds` (the loaded DatasetDict), `features` (the list of label column names) and
# `samples` (the row indices used for training) are defined earlier and not shown here.

def encode_labels(record):
    # Gather the per-label 0/1 columns into a single "labels" list
    return {"labels": [record[feature] for feature in features]}

dataset = ds["train"].map(encode_labels)
train_dataset = dataset.select(samples)
# Every row that is not in `samples` goes to the evaluation set
eval_dataset = dataset.select(
    np.setdiff1d(np.arange(len(dataset)), samples)
)

model_id = "sentence-transformers/paraphrase-mpnet-base-v2"
model = SetFitModel.from_pretrained(model_id, multi_target_strategy="multi-output")
model.model_head  # inspect the classification head

args = TrainingArguments(
    head_learning_rate=0.0006155918397454662,
    batch_size=1,
    num_epochs=1,
    # max_steps=2350,  # overrides num_epochs
    # eval_max_steps=10,
    # num_iterations=20,
    max_length=1000,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    metric="accuracy",
    column_mapping={"abstract": "text", "labels": "label"},
)
trainer.train()
```
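For anyone comparing, here is a small pure-Python sketch of what `encode_labels` and the `np.setdiff1d` split do to each record. The label names, the records and the `samples` indices are hypothetical stand-ins for the `features`, `ds` and `samples` objects not shown in the snippet, and plain lists of dicts stand in for a Hugging Face `Dataset`.

```python
# Hypothetical label columns (the real `features` list isn't shown in the thread)
features = ["physics", "biology", "chemistry"]

# Hypothetical records: one text column plus one 0/1 column per label
records = [
    {"abstract": "Quantum dots in cells", "physics": 1, "biology": 0, "chemistry": 1},
    {"abstract": "Gene expression atlas", "physics": 0, "biology": 1, "chemistry": 0},
    {"abstract": "Biophysics of membranes", "physics": 1, "biology": 1, "chemistry": 0},
    {"abstract": "Catalyst screening", "physics": 0, "biology": 0, "chemistry": 1},
]


def encode_labels(record):
    # Same function as above: collect the per-label columns into one list
    return {"labels": [record[feature] for feature in features]}


# Dataset.map merges the returned dict into each record as a new column
encoded = [{**record, **encode_labels(record)} for record in records]

samples = [0, 2]  # hypothetical training indices
train = [encoded[i] for i in samples]

# Sorted complement of `samples`, playing the role of np.setdiff1d(np.arange(n), samples)
eval_idx = sorted(set(range(len(encoded))) - set(samples))
evaluation = [encoded[i] for i in eval_idx]

print([row["labels"] for row in train])
print(eval_idx, [row["labels"] for row in evaluation])
```

Each row ends up with a `labels` list whose length equals the number of label columns, which is the shape the `column_mapping={"labels": "label"}` entry hands to setfit.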
After the trainer has been running for more than 5 hours, I get `ValueError: Multioutput target data is not supported with label binarization`.
My `train_dataset` and `eval_dataset` have one text column, one `labels` column (binary) and one column for each label, so they are structured the same way as in the `text-classification_multilabel.ipynb` example.
Any idea what could be going on? Thanks
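For context: scikit-learn raises this exact message from `LabelBinarizer.fit` / `label_binarize` when `sklearn.utils.multiclass.type_of_target` classifies the target as a `*-multioutput` type, i.e. a 2D target that does not look like a binary `multilabel-indicator` matrix (for example, one containing a value such as 2). Which setfit or evaluation step hands the labels to it here isn't clear from the thread, so this is only one thing worth ruling out cheaply before another multi-hour run. Below is a minimal sketch of such a check; `find_label_problems` is a hypothetical helper, not part of setfit or scikit-learn, and it is deliberately stricter than scikit-learn (it requires exactly 0/1 values and equal-length rows).

```python
def find_label_problems(rows):
    """Hypothetical helper: list reasons why `rows` (a list of label lists)
    is not a rectangular 0/1 multi-label matrix. Returns [] if it looks fine."""
    if not rows:
        return ["no label rows"]
    problems = []
    width = len(rows[0])
    if width < 2:
        problems.append(f"only {width} label column(s); multi-label needs at least 2")
    for i, row in enumerate(rows):
        # Every row should have one entry per label column
        if len(row) != width:
            problems.append(f"row {i}: {len(row)} labels, expected {width}")
        # Anything other than 0/1 (e.g. 2, 0.5, "1", None) is flagged
        non_binary = [value for value in row if value not in (0, 1)]
        if non_binary:
            problems.append(f"row {i}: non-binary values {non_binary}")
    return problems


print(find_label_problems([[0, 1, 0], [1, 0, 1]]))  # []
print(find_label_problems([[0, 2, 0], [1, 0]]))
```

With the datasets from the question, something like `find_label_problems(list(train_dataset["labels"]))` (and the same for `eval_dataset`) should return an empty list; if it doesn't, the reported rows are the first place to look.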