huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

TypeError: forward() got an unexpected keyword argument 'labels' with mt5-small #17102

Closed: paulthemagno closed this issue 2 years ago

paulthemagno commented 2 years ago

System Info

- `transformers` version: 4.18.0
- Platform: Linux-4.14.252-131.483.amzn1.x86_64-x86_64-with-glibc2.9
- Python version: 3.6.13
- Huggingface_hub version: 0.4.0
- PyTorch version (GPU?): 1.10.2+cu102 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no

Who can help?

No response

Information

Tasks

Reproduction

I have a problem training `google/mt5-small`:

import numpy as np
import torch
from datasets import ClassLabel, Features, Value, load_dataset, load_metric
from transformers import AutoTokenizer, MT5EncoderModel, Trainer, TrainingArguments

device = "cuda:0" if torch.cuda.is_available() else "cpu"

class_names = ["cmn","deu","rus","fra","eng","jpn","spa","ita","kor","vie","nld","epo","por","tur","heb","hun","ell","ind","ara","arz","fin","bul","yue","swe","ukr","bel","que","ces","swh","nno","wuu","nob","zsm","est","kat","pol","lat","urd","sqi","isl","fry","afr","ron","fao","san","bre","tat","yid","uig","uzb","srp","qya","dan","pes","slk","eus","cycl","acm","tgl","lvs","kaz","hye","hin","lit","ben","cat","bos","hrv","tha","orv","cha","mon","lzh","scn","gle","mkd","slv","frm","glg","vol","ain","jbo","tok","ina","nds","mal","tlh","roh","ltz","oss","ido","gla","mlt","sco","ast","jav","oci","ile","ota","xal","tel","sjn","nov","khm","tpi","ang","aze","tgk","tuk","chv","hsb","dsb","bod","sme","cym","mri","ksh","kmr","ewe","kab","ber","tpw","udm","lld","pms","lad","grn","mlg","xho","pnb","grc","hat","lao","npi","cor","nah","avk","mar","guj","pan","kir","myv","prg","sux","crs","ckt","bak","zlm","hil","cbk","chr","nav","lkt","enm","arq","lin","abk","pcd","rom","gsw","tam","zul","awa","wln","amh","bar","hbo","mhr","bho","mrj","ckb","osx","pfl","mgm","sna","mah","hau","kan","nog","sin","glv","dng","kal","liv","vro","apc","jdt","fur","che","haw","yor","crh","pdc","ppl","kin","shs","mnw","tet","sah","kum","ngt","nya","pus","hif","mya","moh","wol","tir","ton","lzz","oar","lug","brx","non","mww","hak","nlv","ngu","bua","aym","vec","ibo","tkl","bam","kha","ceb","lou","fuc","smo","gag","lfn","arg","umb","tyv","kjh","oji","cyo","urh","kzj","pam","srd","lmo","swg","mdf","gil","snd","tso","sot","zza","tsn","pau","som","egl","ady","asm","ori","dtp","cho","max","kam","niu","sag","ilo","kaa","fuv","nch","hoc","iba","gbm","sun","war","mvv","pap","ary","kxi","csb","pag","cos","rif","kek","krc","aii","ban","ssw","tvl","mfe","tah","bvy","bcl","hnj","nau","nst","afb","quc","min","tmw","mad","bjn","mai","cjy","got","hsn","gan","tzl","dws","ldn","afh","sgs","krl","vep","rue","tly","mic","ext","izh","sma","jam","cmo","mwl","kpv","koi","bis","ike","run","evn","ryu","mnc","aoz","otk","kas","aln","akl","yua","shy","fkv","gos","fij","thv","zgh","gcf","cay","xmf","tig","div","lij","rap","hrx","cpi","tts","gaa","tmr","iii","ltg","bzt","syc","emx","gom","chg","osp","stq","frr","fro","nys","toi","new","phn","jpa","rel","drt","chn","pli","laa","bal","hdn","hax","mik","ajp","xqa","pal","crk","mni","lut","ayl","ood","sdh","ofs","nus","kiu","diq","qxq","alt","bfz","klj","mus","srn","guc","lim","zea","shi","mnr","bom","sat","szl"]
features = Features({ 'label': ClassLabel(names=class_names), 'text': Value('string')})
num_labels = features['label'].num_classes

# split
data_files = {"train": "train.csv", "test": "test.csv"}
sentences = load_dataset("loretoparisi/tatoeba-sentences",
                             data_files=data_files,
                             delimiter='\t', 
                             column_names=['label', 'text'],
                             download_mode="force_redownload"
                             )

# filter
not_none_sentences = sentences.filter(lambda example: example['label'] is not None)

#tokenizer
model_name = 'google/mt5-small'
tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize_function(examples):
    tokens = tokenizer(examples["text"], padding="max_length", truncation=True, max_length=128)
    tokens['label'] = features["label"].str2int(examples['label'])
    return tokens

tokenized_datasets = not_none_sentences.map(tokenize_function, batched=True)
full_train_dataset = tokenized_datasets["train"]
full_eval_dataset = tokenized_datasets["test"]

# model
model = MT5EncoderModel.from_pretrained(model_name, num_labels=num_labels)
model = model.to(device)

# metrics
metric = load_metric("accuracy")
def compute_metrics(eval_pred):
    print(eval_pred)
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

# train
training_args = TrainingArguments("checkpoints",
                                  per_device_train_batch_size=128, 
                                  num_train_epochs=3,
                                  learning_rate=3e-05)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=full_train_dataset,
    eval_dataset=full_eval_dataset,
    compute_metrics=compute_metrics,
)
trainer.train()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-26-ffe0f4836481> in <module>
      6     compute_metrics=compute_metrics,
      7 )
----> 8 trainer.train()

~/anaconda3/envs/python3/lib/python3.6/site-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1420                         tr_loss_step = self.training_step(model, inputs)
   1421                 else:
-> 1422                     tr_loss_step = self.training_step(model, inputs)
   1423 
   1424                 if (

~/anaconda3/envs/python3/lib/python3.6/site-packages/transformers/trainer.py in training_step(self, model, inputs)
   2009 
   2010         with self.autocast_smart_context_manager():
-> 2011             loss = self.compute_loss(model, inputs)
   2012 
   2013         if self.args.n_gpu > 1:

~/anaconda3/envs/python3/lib/python3.6/site-packages/transformers/trainer.py in compute_loss(self, model, inputs, return_outputs)
   2041         else:
   2042             labels = None
-> 2043         outputs = model(**inputs)
   2044         # Save past state if it exists
   2045         # TODO: this needs to be fixed and made cleaner later.

~/anaconda3/envs/python3/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102             return forward_call(*input, **kwargs)
   1103         # Do not call functions when jit is used
   1104         full_backward_hooks, non_full_backward_hooks = [], []

TypeError: forward() got an unexpected keyword argument 'labels'

Expected behavior

I tried with another model, 'microsoft/xtremedistil-l6-h256-uncased' (loaded with `AutoModelForSequenceClassification`), and it worked.
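That is, keeping the rest of the script the same and only swapping the model definition, roughly:

from transformers import AutoModelForSequenceClassification

# same script as above, only the model changes; this variant trains without errors
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/xtremedistil-l6-h256-uncased", num_labels=num_labels
)
model = model.to(device)
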
LysandreJik commented 2 years ago

Since it seems linked to T5 specifically, pinging @patrickvonplaten

patrickvonplaten commented 2 years ago

cc @patil-suraj could you take a look here?

patil-suraj commented 2 years ago

Looking into it!

patil-suraj commented 2 years ago

Hi @paulthemagno !

This is because `MT5EncoderModel` (or `T5EncoderModel`) is just a base model and does not have any head, so it does not accept the `labels` argument.

https://github.com/huggingface/transformers/blob/30be0da5da83419329b2bde93e4dada0ce7e31ae/src/transformers/models/t5/modeling_t5.py#L1812-L1821

To use this model for sequence classification, you could create a custom module that adds a sequence classification head on top of it, accepts a `labels` argument in `forward`, and computes and returns the loss. It should look very similar to how `BertForSequenceClassification` is implemented.

https://github.com/huggingface/transformers/blob/30be0da5da83419329b2bde93e4dada0ce7e31ae/src/transformers/models/bert/modeling_bert.py#L1508
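For example, a very rough sketch of such a wrapper could look like the one below (this class does not exist in the library; the name, the mean pooling, and the dropout value are just illustrative choices):

from torch import nn
from transformers import MT5EncoderModel

class MT5ForSequenceClassification(nn.Module):
    """Illustrative sketch only: encoder-only MT5 plus a classification head."""

    def __init__(self, model_name, num_labels):
        super().__init__()
        self.encoder = MT5EncoderModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(self.encoder.config.d_model, num_labels)
        self.num_labels = num_labels

    def forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        hidden_states = outputs.last_hidden_state  # (batch, seq_len, d_model)
        # mean-pool the encoder states over the non-padding tokens
        mask = attention_mask.unsqueeze(-1).type_as(hidden_states)
        pooled = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        logits = self.classifier(self.dropout(pooled))
        loss = None
        if labels is not None:
            # this is the part the base model is missing: consume `labels` and return a loss
            loss = nn.CrossEntropyLoss()(logits.view(-1, self.num_labels), labels.view(-1))
        # Trainer reads the loss from the first element of the returned tuple
        return (loss, logits) if loss is not None else (logits,)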

Hope this helps :)

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

paulthemagno commented 2 years ago

Thank you @patil-suraj, I followed your hints and created a `T5ForSequenceClassification` class, taking inspiration from `BertForSequenceClassification`.

I forked the latest version of transformers from main and pushed my updates. You can find it here.

Since BERT uses a pooled output (T5 does not) and BertModel contains only an encoder (while T5Model is encoder-decoder), I'm not sure about my edits. Could you please check them?


Some updates:

Difference between the objects returned by BertModel and T5Model forward

The main update concerns the difference between the objects returned by the forward functions of BertModel and T5Model.

So in the T5ForSequenceClassification forward function I tried to pool the output of the Seq2SeqModelOutput (line), as is done in BertModel:

        pooled_output = self.pooler(outputs[0]) if self.pooler is not None else None
        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)

I didn't add it to T5Model directly because I didn't want to "break" its output, but you probably know how to do that in a cleaner way. For the same reason I defined the pooler in T5ForSequenceClassification (line) and not in T5Model as is done in BertModel:

        self.pooler = T5Pooler(config) #if add_pooling_layer else None

T5Pooler is a copy of BertPooler.
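For reference, the copied pooler is roughly the following (I write config.d_model where BertPooler uses config.hidden_size; the actual code in the fork may differ in details):

from torch import nn

class T5Pooler(nn.Module):
    def __init__(self, config):
        super().__init__()
        # same structure as BertPooler, but sized with T5's d_model
        self.dense = nn.Linear(config.d_model, config.d_model)
        self.activation = nn.Tanh()

    def forward(self, hidden_states):
        # like BertPooler, "pool" by taking the hidden state of the first token
        first_token_tensor = hidden_states[:, 0]
        pooled_output = self.dense(first_token_tensor)
        pooled_output = self.activation(pooled_output)
        return pooled_output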

T5Model is encoder-decoder while BertModel is encoder-only

The other doubt I have is that T5Model's forward needs both input_ids/inputs_embeds and decoder_input_ids/decoder_inputs_embeds, while BertModel's needs only the former. So in T5ForSequenceClassification I also pass the decoder_input_ids/decoder_inputs_embeds params, but with the same values as input_ids/inputs_embeds (line). This could be a mistake.

        outputs = self.t5(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,

            # this is my main doubt: T5Model needs both input_ids/inputs_embeds and
            # decoder_input_ids/decoder_inputs_embeds, while BertModel needs only input_ids/inputs_embeds.
            # I tried passing T5Model the same values for these variables, but I'm not sure about that.
            decoder_input_ids=input_ids,
            decoder_inputs_embeds=inputs_embeds,
        )
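An alternative I considered (just an untested sketch, not what the fork currently does) would be to feed the decoder only its start token per example instead of duplicating the encoder inputs:

        # untested sketch: T5's decoder_start_token_id is the <pad> token (id 0),
        # so the decoder would receive a single start token per example
        decoder_start_token_id = self.config.decoder_start_token_id
        decoder_input_ids = torch.full(
            (input_ids.shape[0], 1),
            decoder_start_token_id,
            dtype=torch.long,
            device=input_ids.device,
        )

        outputs = self.t5(
            input_ids,
            attention_mask=attention_mask,
            decoder_input_ids=decoder_input_ids,
            return_dict=return_dict,
        )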

I tried a training run on a GPU instance (ml.g4dn.xlarge) using a t5-small, but it seems very slow. Obviously, I replaced the model definition from the issue with:

model = T5ForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)

Anyway, the training isn't raising errors.

Thank you!!