Open thomas-schillaci opened 1 month ago
@Narsil @OlivierDehaene Please express interest and accept Thomas' PR 🙏 I fought the stale bot for a while on an earlier version of this issue, but eventually gave up:
"This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days."
Feature request
Right now, TGI's stop logic only matches whole tokens. OpenAI's is more flexible: it can stop on sub-tokens and on sequences that span multiple tokens.
For example, comparing llama-3-8b-instruct on TGI against GPT-4o at temperature=0, I was able to generate "...ipsum dolor sit amet, consectetur adipiscing elit." and "ipsum dolor sit amet, consectetur adipiscing elit." respectively.
Sub-token example, using stop="lo":
- GPT-4o generates "ipsum do"
- TGI generates "...ipsum dolor sit amet, consectetur adipiscing elit."

Multi-token sequence example, using stop="it am":
- GPT-4o generates "ipsum dolor s"
- TGI generates "...ipsum dolor sit amet, consectetur adipiscing elit."
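To illustrate the difference, here is a minimal sketch (not TGI's actual implementation, and the helper name is hypothetical) of string-level stop handling: match the stop sequences against the decoded text and truncate at the earliest hit, so a stop string can land in the middle of a token.

```python
# Hypothetical helper: truncate decoded text at the earliest occurrence
# of any stop sequence, regardless of token boundaries.
def truncate_at_stop(text: str, stop_sequences: list[str]) -> tuple[str, bool]:
    """Return (possibly truncated text, whether a stop sequence was hit)."""
    hits = [i for i in (text.find(s) for s in stop_sequences) if i != -1]
    if hits:
        return text[: min(hits)], True
    return text, False

# The stop string "lo" falls inside the token "dolor", so token-level
# matching misses it, while string-level matching truncates correctly:
out, stopped = truncate_at_stop("ipsum dolor sit amet", ["lo"])
# out == "ipsum do", stopped == True
```

In a streaming setting, a real implementation would also need to hold back the last few characters of each emitted chunk, since a stop sequence can straddle a chunk boundary.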
Motivation
This is a very useful feature and would make the /completions and /chat/completions routes behave more like their OpenAI equivalents.
Your contribution
I already have the code to do this; I'd be happy to open a PR if this feature interests you.