MT5 data padding not working #24567

Closed hexie1995 closed 8 months ago

hexie1995 commented 1 year ago

System Info


I am using the latest version of transformers.

I ran into this issue recently and would appreciate some help with it. I am fine-tuning MT5 ("google/mt5-base") on my own dataset. While processing the data, I keep getting a dimension-mismatch error, even after padding and truncating as suggested in the example:

I tried the exact same code with XLMProphetNet, XLM Roberta, and XLNet, and all of them worked. Only MT5 gives me this error. It almost always occurs at the first step, when the trainer tries to evaluate on the validation data. I suspect it has something to do with the evaluation loop, but so far I have found nothing that helps me resolve it.

RuntimeError: output with shape [4, 12, 1, 1] doesn't match the broadcast shape [4, 12, 1, 128]

@alexayalamcs tagging Alex here.

from transformers import AutoTokenizer, XLMProphetNetDecoder,DataCollatorWithPadding
from transformers import DataCollatorForLanguageModeling
from datasets import concatenate_datasets, load_dataset
from transformers import MT5ForConditionalGeneration, MT5Tokenizer, MT5Config, MT5Model,T5Tokenizer
import torch
from torch.utils.data import DataLoader
from transformers import Trainer
import nltk
import random
from accelerate import Accelerator
accelerator = Accelerator()
import datasets
rouge = datasets.load_metric("rouge")
import evaluate
accuracy_metric = evaluate.load("accuracy")

train = load_dataset("cnn_dailymail", "3.0.0", split = "train")
valid = load_dataset("cnn_dailymail", "3.0.0", split = "validation")
test = load_dataset("cnn_dailymail", "3.0.0", split = "test")

model = MT5ForConditionalGeneration.from_pretrained("google/mt5-base")
tokenizer = T5Tokenizer.from_pretrained("google/mt5-base")


def process_data_to_model_inputs(batch):
    # tokenize the inputs and labels
    inputs = tokenizer(batch["article"], padding="max_length", truncation=True, max_length=encoder_max_length)
    outputs = tokenizer(batch["highlights"], padding="max_length", truncation=True, max_length=decoder_max_length)

    batch["input_ids"] = inputs.input_ids
    batch["attention_mask"] = inputs.attention_mask
    batch["decoder_input_ids"] = outputs.input_ids
    batch["decoder_attention_mask"] = outputs.attention_mask
    batch["labels"] = outputs.input_ids.copy()

    return batch

train_data =
#train_data = train_init
#batch_size = 16

train_data =
    remove_columns=["article", "highlights", "id"]
    type="torch", columns=["input_ids", "attention_mask", "decoder_input_ids", "decoder_attention_mask", "labels"],

val_data =
#val_data = valid
val_data =
    remove_columns=["article", "highlights", "id"]
    type="torch", columns=["input_ids", "attention_mask", "decoder_input_ids", "decoder_attention_mask", "labels"],

from transformers import Seq2SeqTrainer,Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    num_train_epochs = 3, 
    # logging_steps=1000,
    # save_steps=500,
    # eval_steps=7500,
    # warmup_steps=2000,
    # save_total_limit=3,

def compute_metrics(pred):
    labels_ids = pred.label_ids
    pred_ids = pred.predictions

    pred_str = tokenizer.batch_decode(pred_ids)
    label_str = tokenizer.batch_decode(labels_ids)

    rouge_output = rouge.compute(predictions=pred_str, references=label_str, rouge_types=["rouge2"])["rouge2"].mid

    return {
        "rouge2_precision": round(rouge_output.precision, 4),
        "rouge2_recall": round(rouge_output.recall, 4),
        "rouge2_fmeasure": round(rouge_output.fmeasure, 4),
    }

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
trainer = Seq2SeqTrainer(

Expected behavior

I would expect this to run through just fine like XLMPropheNet, XLM Roberta, and XLNet, but it does not.

sgugger commented 1 year ago

cc @ArthurZucker

hexie1995 commented 1 year ago

Thank you. One additional piece of information: I tried following the official text summarization tutorial step by step here: but the same error occurred. Thanks a lot!

ArthurZucker commented 1 year ago

Hey! Thanks for reporting. Could you share the entire traceback of the error? 😉

hexie1995 commented 1 year ago

Sure, here's the whole error message. Thanks a lot!

RuntimeError: output with shape [4, 12, 1, 1] doesn't match the broadcast shape [4, 12, 1, 32]
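For context, PyTorch produces a message of this form when an in-place operation (such as `x += y`) would have to broadcast its output tensor to a larger shape, which in-place operations cannot do. The thread does not say which tensor inside the model triggers it. The following is a minimal pure-Python sketch of that shape check, not the transformers code path; `broadcast_shape` and `check_inplace` are hypothetical helper names used only for illustration.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast shape of two shapes (NumPy/PyTorch-style rules)."""
    result = []
    # Align the shapes from the trailing dimension; a missing dimension counts as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"shapes {a} and {b} cannot be broadcast")
        # A dimension of size 1 stretches to match the other operand.
        result.append(y if x == 1 else x)
    return tuple(reversed(result))


def check_inplace(out_shape, other_shape):
    """Raise if an in-place op on a tensor of `out_shape` would need to resize it."""
    target = broadcast_shape(out_shape, other_shape)
    if tuple(out_shape) != target:
        raise RuntimeError(
            f"output with shape {list(out_shape)} doesn't match "
            f"the broadcast shape {list(target)}"
        )
    return target


# The shapes from the error above: the output has size 1 in its last
# dimension, while the other operand has size 32.
try:
    check_inplace((4, 12, 1, 1), (4, 12, 1, 32))
except RuntimeError as err:
    print(err)
```

Read this way, the last dimension of the second shape (32 here, 128 in the first report) is the size the other operand has along that axis, while the tensor being updated in place has size 1 there.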

ArthurZucker commented 11 months ago

Hey! I did not have time to check this. If you can isolate a small reproduction script (without the whole training loop), that would be great. Otherwise, I am investigating.

hexie1995 commented 11 months ago

Hi Arthur @ArthurZucker, the code that I shared initially is a small training loop that uses only a few samples and reproduces the error as soon as it is run (the training set is limited to 16 examples and the evaluation set to 8). It should take about 3 minutes at most, because it has to download the CNN/DailyMail dataset first. Thanks a lot for your help!!

ArthurZucker commented 10 months ago

Ok, low on bandwidth so pinging @Rocketknight1 in case he can have a look!

ArthurZucker commented 9 months ago

Sorry @hexie1995, I did not have time to have a look 😢

Rocketknight1 commented 9 months ago

I figured this one out! Making a PR.

Rocketknight1 commented 8 months ago

@hexie1995 This should now be fixed on main! You can install from main with pip install git+ It will also be included in the next release, at which point you can go back to just pip install transformers.
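After installing, one way to confirm which build is active is to read the installed version string. The sketch below assumes that a source install from main reports a version containing a `.dev` segment (for example something like `4.32.0.dev0`, a hypothetical value), while a release install does not; `is_dev_build` and `describe_install` are illustrative helper names, not part of transformers.

```python
from importlib.metadata import PackageNotFoundError, version


def is_dev_build(version_string):
    """Return True if the version string carries a '.dev' segment."""
    return ".dev" in version_string


def describe_install(package="transformers"):
    """Summarize whether `package` is installed and which kind of build it is."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return f"{package} is not installed"
    kind = "development build" if is_dev_build(installed) else "release"
    return f"{package} {installed} ({kind})"


print(describe_install())
```

If this reports a release version that predates the fix, the environment is probably still using the older package.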

And thanks for the bug report - it turns out there really was an issue deep in the transformers code that was causing this!

hexie1995 commented 8 months ago

Thank you! This is wonderful news. I will install the new one now.