huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

StoppingCriteria for Repetition #32902

Open Vipitis opened 3 weeks ago

Vipitis commented 3 weeks ago

Feature request

Similar to repetition_penalty in the generation config, but as a stopping criterion.

Motivation

(Small?) models tend to generate endless loops of the same few tokens, or near-repetitions where only something like a single digit increases. (I could not find any similar feature requests.)

I run into this quite a lot when doing evaluation runs (with greedy decoding) for code completion tasks. Here is a screenshot of multiple generations saved to a file; the blocks of repetition are easy to spot. [image]

Having a stopping criterion that detects such behaviour would massively speed up evaluation runs, since generation could stop early instead of running all the way to the max_new_tokens limit. It might be helpful to expose a few parameters, such as the number of repetitions and the n-gram size.
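For illustration, the kind of check such a criterion could run over the tail of the generated token IDs might look like this (a minimal sketch in plain Python; `tail_ngram_repeats` and its parameters are hypothetical, not part of the transformers API):

```python
def tail_ngram_repeats(token_ids, ngram_size=3, min_repeats=4):
    """Return True if the last `ngram_size` tokens repeat back-to-back
    at least `min_repeats` times at the end of the sequence."""
    n = ngram_size
    if len(token_ids) < n * min_repeats:
        return False
    tail = token_ids[-n:]
    # Walk backwards one n-gram at a time and compare against the tail.
    for r in range(1, min_repeats):
        start = len(token_ids) - n * (r + 1)
        if token_ids[start:start + n] != tail:
            return False
    return True

# A degenerate loop like "... 0 1 2 0 1 2 0 1 2 0 1 2" trips the check:
assert tail_ngram_repeats([5, 7, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2])
# A normal, non-repeating sequence does not:
assert not tail_ngram_repeats([5, 7, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
```

Only exact back-to-back repeats of the trailing n-gram are caught here; the "incrementing digit" case from the screenshot would need a fuzzier overlap measure.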

Your contribution

I am happy to contribute a PR myself, but will not find the time to do so in the next ~6-8 weeks. It doesn't look straightforward, and I am not too familiar with the deeper parts of the generation code, so it might take me a while.

zucchini-nlp commented 3 weeks ago

Hey @Vipitis ! A small question: can we just use repetition penalty to prevent this, instead of forcing generation to stop when an n-gram repeats?

cc @gante

Vipitis commented 3 weeks ago

can we use only repetition penalty

Likely yes, in most practical settings. But when running generation for eval benchmarks, some require greedy decoding. Using any of the generation config parameters changes the tokens you decode, and therefore adds variables to the experiment. A stopping criterion is just that: it stops early once the generation has already failed. There is a non-zero chance that the model somehow recovers and still completes a valid function, but I have not observed that yet.

gante commented 3 weeks ago

Hey @Vipitis 👋 Thank you for opening this issue!

It does make sense to save compute cycles when we have high confidence that the output won't improve -- repetition is one of those cases. I'd gladly accept a PR that adds that StoppingCriteria :)

(suggestion: we can add a stop_at_repeated_ngram_size flag, or something similar, to GenerationConfig)
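One rough shape for such a criterion (a sketch; `RepeatedNgramStopper` and its constructor arguments are hypothetical — in transformers it would subclass `StoppingCriteria` and receive `input_ids`/`scores` tensors in `__call__`, while plain Python lists stand in here):

```python
class RepeatedNgramStopper:
    """Hypothetical criterion: signal a stop once the trailing n-gram
    has repeated back-to-back `min_repeats` times."""

    def __init__(self, ngram_size=3, min_repeats=4):
        self.n = ngram_size
        self.min_repeats = min_repeats

    def __call__(self, token_ids):
        n = self.n
        length = len(token_ids)
        if length < n * self.min_repeats:
            return False
        tail = token_ids[-n:]
        # Compare each of the preceding n-grams against the trailing one.
        return all(
            token_ids[length - n * (r + 1):length - n * r] == tail
            for r in range(1, self.min_repeats)
        )
```

A `stop_at_repeated_ngram_size` flag on `GenerationConfig`, as suggested, could then construct such a criterion internally and append it to the `StoppingCriteriaList` used by `generate`.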