huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Bug in Marian model (or tokenizer) in transformers==4.18.0 #16670

Closed: MorenoLaQuatra closed this issue 2 years ago

MorenoLaQuatra commented 2 years ago

Environment info

- transformers version: 4.18.0

Who can help

@patil-suraj

Information

Model I am using (Bert, XLNet ...): Marian

The problem arises when using: my own scripts (see the Colab notebook linked below).

The task I am working on is: my own task/dataset (training Marian with an extended tokenizer).

To reproduce

Steps to reproduce the behavior:

  1. Extend the tokenizer using a target-language one.
  2. Add the new tokens (the model embeddings are resized accordingly).
  3. Run a forward pass with the model in training mode. A minimal sketch of these steps follows this list.
  4. The full script and error are reported in this Colab notebook: https://colab.research.google.com/drive/1utS-L1iO1paiwKKPNqVHW5ARvprfRgG2?usp=sharing
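
A minimal sketch of steps 1–3, assuming a Helsinki-NLP/opus-mt-en-it checkpoint and a couple of illustrative added tokens (both the checkpoint and the token names are assumptions, not taken from the notebook):

```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-it"  # assumed checkpoint
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Steps 1-2: extend the tokenizer with new tokens and resize the embeddings.
tokenizer.add_tokens(["<special_1>", "<special_2>"])  # hypothetical tokens
model.resize_token_embeddings(len(tokenizer))

# Step 3: a forward pass in training mode with labels triggers the loss
# computation that reshapes lm_logits with the (stale) target vocab size.
model.train()
inputs = tokenizer(["hello world"], return_tensors="pt")
with tokenizer.as_target_tokenizer():
    labels = tokenizer(["ciao mondo"], return_tensors="pt").input_ids
outputs = model(**inputs, labels=labels)  # RuntimeError on transformers==4.18.0
```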

Traceback below:

/usr/local/lib/python3.7/dist-packages/transformers/models/marian/modeling_marian.py in forward(self, input_ids, attention_mask, decoder_input_ids, decoder_attention_mask, head_mask, decoder_head_mask, cross_attn_head_mask, encoder_outputs, past_key_values, inputs_embeds, decoder_inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
   1452         if labels is not None:
   1453             loss_fct = CrossEntropyLoss()
-> 1454             masked_lm_loss = loss_fct(lm_logits.view(-1, self.target_vocab_size), labels.view(-1))
   1455 
   1456         if not return_dict:

RuntimeError: shape '[-1, 65001]' is invalid for input of size 8320768
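
For reference, 8320768 = 128 × 65006: the logits' last dimension appears to be 65006 (the original 65001 entries plus five added tokens, inferred from the error rather than stated above), while the reshape still uses 65001. A minimal sketch of the failing reshape, with these inferred numbers:

```python
import torch

# Inferred from the error: 8320768 = 128 * 65006, so the embeddings were
# resized to 65006 entries while the loss still reshapes with 65001.
lm_logits = torch.zeros(8, 16, 65006)  # batch=8, seq_len=16 are illustrative
lm_logits.view(-1, 65001)  # RuntimeError: shape '[-1, 65001]' is invalid
                           # for input of size 8320768
```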

Expected behavior

Standard Marian training output: the loss is computed without error. The same script runs fine with transformers==4.17.0.

patil-suraj commented 2 years ago

Good catch! The fix is here: #16700

MorenoLaQuatra commented 2 years ago

Thank you!