TomBerton closed this issue 1 year ago
I beg to differ. Examples are meant to be simple to read; having a really long piece of text just hinders readability, IMO.
min_length and max_length are specified here: https://huggingface.co/docs/transformers/v4.28.1/en/main_classes/text_generation#transformers.GenerationMixin.greedy_search.max_length
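For illustration, a minimal sketch of how those parameters flow through the pipeline (the model id and input text here are just placeholders, not part of the original example):
from transformers import pipeline

# Placeholder checkpoint; any summarization model from the Hub works the same way.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

# min_length / max_length are generation kwargs forwarded to model.generate():
# they bound the number of tokens in the generated summary, not the input text.
result = summarizer("Some reasonably long article text goes here ...", min_length=5, max_length=20)
print(result[0]["summary_text"])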
@sgugger What do you think here? I agree examples shouldn't raise warnings; however, it feels odd to burn the name of a specific model into this example, since users are likely not to understand where that model id comes from.
# Fetch summarization models at https://huggingface.co/models?pipeline_tag=summarization&sort=downloads
summarizer = pipeline(model="philschmid/bart-large-cnn-samsum")
Something like that. That probably affects ALL examples within pipelines.
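For completeness, the suggested example written out end to end would look roughly like this (the exact summary text will depend on the checkpoint):
from transformers import pipeline

# Model picked from the summarization models page linked above.
summarizer = pipeline(model="philschmid/bart-large-cnn-samsum")
summarizer("An apple a day, keeps the doctor away", min_length=5, max_length=20)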
cc @gante The warning somehow needs to be addressed so that users of the pipeline function do not see it.
Hi @TomBerton 👋
The warnings you described were updated in #23128, which should make the pipeline experience more pleasant and self-documenting 🤗
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
Using Google Colab on macOS Ventura 13.2.1, Chrome version 112.0.5615.137 (Official Build) (x86_64).
Using the install command:
!pip install transformers
which downloads transformers and its dependencies.
Who can help?
@Narsil
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
In the documentation for the summarization pipeline here, the example needs updating. Running the current example below:
from transformers import pipeline

# use bart in pytorch
summarizer = pipeline("summarization")
summarizer("An apple a day, keeps the doctor away", min_length=5, max_length=20)
Produces the following output in Google Colab.
Using a pipeline without specifying a model name and revision in production is not recommended.
Your max_length is set to 20, but you input_length is only 11. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=5)
[{'summary_text': ' An apple a day, keeps the doctor away from your doctor away, says Dr.'}]
The documentation doesn't state what min_length= and max_length= actually do, and the output doesn't tell you either. Is max_length the maximum token length of the output or of the input? (A rough check is sketched after the second example's output below.)
Running this code:
from transformers import pipeline

# use t5 in tf
summarizer = pipeline("summarization", model="t5-base", tokenizer="t5-base", framework="tf")
summarizer("An apple a day, keeps the doctor away", min_length=5, max_length=20)
Produces the following output in Google Colab.
Your max_length is set to 20, but you input_length is only 13. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=6)
/usr/local/lib/python3.10/dist-packages/transformers/generation/tf_utils.py:745: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
warnings.warn(
[{'summary_text': 'an apple a day, keeps the doctor away from the doctor .'}]
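For what it's worth, a rough way to check which side max_length constrains (assuming the same t5-base checkpoint as above and that TensorFlow is installed for framework="tf"):
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("t5-base")
summarizer = pipeline("summarization", model="t5-base", tokenizer="t5-base", framework="tf")

text = "An apple a day, keeps the doctor away"
summary = summarizer(text, min_length=5, max_length=20)[0]["summary_text"]

# max_length (and min_length) bound the number of tokens generated for the
# summary; the warning above compares max_length against the tokenized input length.
print("input tokens:  ", len(tokenizer(text)["input_ids"]))
print("summary tokens:", len(tokenizer(summary)["input_ids"]))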
Expected behavior
The documentation should state what min_length= and max_length= actually do.