ireneisdoomed / stopReasons

Analysis on stop reasons
Apache License 2.0

Transformers `Trainer` is not compatible with multi-label tasks #2

Closed ireneisdoomed closed 1 year ago

ireneisdoomed commented 1 year ago

Bug report

I have rewritten the fine-tuning process to use the much more convenient `Trainer` class, which uses PyTorch by default, to tackle #1. However, when training the model on the multi-label task, I get this error at the end, indicating that evaluation is not possible because the predictions and/or references are not in the expected format:

```
ValueError: Predictions and/or references don't match the expected format
```

In general, multi-label tasks are poorly documented. I keep running into problems, and it is difficult to tell whether the `Dataset` is not prepared correctly, the hyperparameters are wrong, or I am evaluating the model incorrectly.
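For context on the data preparation: with `problem_type="multi_label_classification"`, each example needs a float multi-hot vector of length `num_labels` in its `labels` field. A minimal sketch of that step (the dataset, column, and label names here are made up, not the ones from the script):

```python
from datasets import Dataset

# Hypothetical label set; the real one comes from the stop-reasons data
labels = ["Safety", "Efficacy", "Recruitment"]
label2id = {label: i for i, label in enumerate(labels)}

def to_multi_hot(example):
    # BCEWithLogitsLoss expects float targets, one slot per label
    vector = [0.0] * len(labels)
    for name in example["reasons"]:  # hypothetical column with the assigned labels
        vector[label2id[name]] = 1.0
    example["labels"] = vector
    return example

dataset = Dataset.from_dict({
    "text": ["Study halted early", "Slow enrollment"],
    "reasons": [["Safety"], ["Recruitment"]],
}).map(to_multi_hot)
```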

Code to reproduce

These are the current params (apart from the data loading and preparation, which are available in the script):

```python
from transformers import AutoModelForSequenceClassification, TrainingArguments

# `metric`, `labels`, `id2label`, `label2id`, the datasets, the tokenizer and
# the data collator are all defined earlier in the script
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    return metric.compute(predictions=predictions, references=labels)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    problem_type="multi_label_classification",
    num_labels=len(labels),
    id2label=id2label,
    label2id=label2id,
)
args = TrainingArguments(
    output_dir="stop_reasons_classificator_multilabel_pt_500n_3epochs",
    evaluation_strategy="epoch",
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    data_seed=42,
    num_train_epochs=1,
    metric_for_best_model="accuracy",
    save_total_limit=2,
    save_strategy="no",
    load_best_model_at_end=False,
    push_to_hub=False,
)

# MultilabelTrainer is the custom Trainer subclass defined in the comment below
trainer = MultilabelTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
```
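For the record, the `ValueError` above most likely comes from `compute_metrics`: at evaluation time `predictions` are raw logits, while `metric.compute` expects integer class decisions. A minimal sketch of a fix, assuming `metric` is something like `evaluate.load("accuracy")` (the 0.5 threshold is an arbitrary choice):

```python
import numpy as np

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    # Turn logits into independent 0/1 decisions, one per label
    probs = 1 / (1 + np.exp(-logits))  # sigmoid
    predictions = (probs >= 0.5).astype(int)
    # Flatten so each (example, label) pair is scored as a binary decision
    return metric.compute(predictions=predictions.flatten(),
                          references=labels.astype(int).flatten())
```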
ireneisdoomed commented 1 year ago

It seems like it is solved! By following the steps indicated in this notebook, I could extend the `Trainer` class to add a custom loss function. I have been able to train and push directly to the Hub! This is an example. See how nicely the metadata is integrated. I now need to extend the custom metrics to calculate F1 micro and macro (see the sketch after the snippet below).

```python
import torch
from transformers import Trainer

class MultilabelTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.pop("labels")
        # forward pass
        outputs = model(**inputs)
        logits = outputs.logits
        # compute custom loss: binary cross-entropy with one sigmoid per label
        loss_fct = torch.nn.BCEWithLogitsLoss()
        loss = loss_fct(logits.view(-1, self.model.config.num_labels),
                        labels.float().view(-1, self.model.config.num_labels))
        return (loss, outputs) if return_outputs else loss
```
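On the F1 point, a possible way to extend `compute_metrics` (a sketch using `scikit-learn`, not yet what the script does):

```python
import numpy as np
from sklearn.metrics import f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    # Same sigmoid + 0.5 threshold as in the sketch above
    predictions = (1 / (1 + np.exp(-logits)) >= 0.5).astype(int)
    labels = labels.astype(int)
    return {
        "f1_micro": f1_score(labels, predictions, average="micro"),
        "f1_macro": f1_score(labels, predictions, average="macro"),
    }
```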