Open thomas-schillaci opened 1 month ago
@Narsil @OlivierDehaene Please express interest and accept Thomas' PR 🙏 I fought the stale bot for a while on an earlier version of this issue, but eventually gave up:
"This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days."
Feature request
Right now, TGI's stop logic only matches whole tokens. OpenAI's is more flexible: it can stop on sub-tokens and on sequences that span multiple tokens.
For example, comparing llama-3-8b-instruct on TGI against GPT-4o at temperature=0, I was able to generate "...ipsum dolor sit amet, consectetur adipiscing elit." and "ipsum dolor sit amet, consectetur adipiscing elit." respectively.
Sub-token example, using stop="lo":
- GPT-4o generates "ipsum do"
- TGI generates "...ipsum dolor sit amet, consectetur adipiscing elit."

Multi-token sequence example, using stop="it am":
- GPT-4o generates "ipsum dolor s"
- TGI generates "...ipsum dolor sit amet, consectetur adipiscing elit."
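To illustrate the difference, here is a minimal sketch (not TGI's actual implementation, and the helper name is hypothetical) of string-level stop handling: match the stop sequences against the decoded text and truncate at the earliest hit, so a stop string can land in the middle of a token.

```python
# Hypothetical helper: truncate decoded text at the earliest occurrence
# of any stop sequence, regardless of token boundaries.
def truncate_at_stop(text: str, stop_sequences: list[str]) -> tuple[str, bool]:
    """Return (possibly truncated text, whether a stop sequence was hit)."""
    hits = [i for i in (text.find(s) for s in stop_sequences) if i != -1]
    if hits:
        return text[: min(hits)], True
    return text, False

# The stop string "lo" falls inside the token "dolor", so token-level
# matching misses it, while string-level matching truncates correctly:
out, stopped = truncate_at_stop("ipsum dolor sit amet", ["lo"])
# out == "ipsum do", stopped == True
```

In a streaming setting, a real implementation would also need to hold back the last few characters of each emitted chunk, since a stop sequence can straddle a chunk boundary.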
Motivation
This is a very useful feature and would make the /completions and /chat/completions routes behave more like their OpenAI equivalents.
Your contribution
I already have the code to do this; I'd be happy to open a PR if this feature interests you.