The output length is not limited to 115 - it's simply that T5 generates an EOS token after 115 tokens. So to make the output longer, you could play around with some of the generate arguments (check them here: https://huggingface.co/transformers/main_classes/model.html?highlight=generate#transformers.generation_utils.GenerationMixin.generate).
As a first step, I would try setting min_length to 120 to force the model to output longer sequences.
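A minimal sketch of that suggestion (the checkpoint name and input text are placeholders, not taken from the original report):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

text = "translate English to French: " + "A long input sentence goes here."
inputs = tokenizer(text, return_tensors="pt")

output_ids = model.generate(
    inputs.input_ids,
    min_length=120,   # force at least 120 generated tokens before EOS is allowed
    max_length=512,   # raise the hard cap so generation is not cut off early
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```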
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Yes, it still needs to be addressed for Flan-T5-based models.
Environment info
transformers version: 4.11.3
Who can help
Information
Model I am using: T5-Base, for a translation task (en-fr)
The problem arises when using my own modified scripts,
which produce the following output:
To reproduce
This is a minimal example of the script; copying it is enough to reproduce the issue (a sketch of the setup is shown below). Other sentences trigger the same behavior.
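The script itself was not preserved in this thread; the following is a hypothetical sketch of the setup described above (the t5-base checkpoint, the translation prefix, the example sentence, and max_length=512 are all assumptions), not the reporter's original code:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Placeholder for a long English sentence of the kind described in the report.
sentence = (
    "This is a deliberately long English sentence that keeps adding clauses, "
    "listing one item after another, until it becomes much longer than usual."
)
inputs = tokenizer("translate English to French: " + sentence, return_tensors="pt")
output_ids = model.generate(inputs.input_ids, max_length=512)

print(output_ids.shape)  # the reporter observes at most 115 tokens here
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```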
Expected behavior
The translated sentence is truncated. In the example, the end of the sentence is missing from the translation; the following words are not translated: "[...] lastly, as this sentence is, celebrating the list." This happens with other long sentences as well. We found that the size of the output tensor is at most 115.
Why is the size of the output tensor limited to 115? I know we could use LED or Longformer, but we would like to understand why this happens with long sentences, and what the proper workflow is for long sentences with this model.