Closed ibndias closed 4 months ago
same issue
In VLLM the stop sequence token is not printed.
This PR fixes a bug where some eos_tokens were not being included: https://github.com/huggingface/text-generation-inference/pull/1808. It should resolve this issue.
@AnyISalIn and @ibndias, would you please try the latest changes and see if they resolve the bug?
Please feel free to reopen this issue if the problem persists. Thank you! 🙏
Hi @drbh, thanks for the effort, but unfortunately the stop sequence is still shown in the output.
Here are the command and code to reproduce with an open model.
docker run --gpus "device=0" --rm -e HUGGING_FACE_HUB_TOKEN=token --shm-size 1g -p 12314:80 --name tgi-openhermes -v /data2/derry/.cache/huggingface/hub:/data ghcr.io/huggingface/text-generation-inference:2.0.2 --model-id teknium/OpenHermes-2.5-Mistral-7B --cuda-memory-fraction 0.5
from openai import OpenAI

client = OpenAI(
    base_url="http://10.125.121.102:12314/v1",
    api_key="-",
)

completion = client.chat.completions.create(
    model="openhermes",
    temperature=0.5,
    messages=[
        {"role": "system", "content": "You are instruction follower."},
        {"role": "user", "content": "Say number 1 - 10"},
    ],
    stop=["7"],
    max_tokens=100,
)

print(completion.choices[0].message.content)
Result:
1, 2, 3, 4, 5, 6, 7
7 is still shown.
It seems there is no button to reopen this issue; should I open a new one instead? @drbh
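Until this is fixed server-side, here is a minimal sketch of a client-side workaround for the reproduction above. The helper `strip_stop_sequences` is hypothetical (not part of TGI or the OpenAI client); it simply trims a stop sequence that TGI leaves at the end of the generated text:

```python
# Hypothetical client-side workaround (not a TGI fix): trim one trailing
# stop sequence from the generated text, if present.
def strip_stop_sequences(text: str, stop_sequences: list[str]) -> str:
    """Return text with a single trailing stop sequence removed."""
    for stop in stop_sequences:
        if text.endswith(stop):
            return text[: -len(stop)]
    return text

# With the reproduction above, the trailing "7" would be trimmed:
print(strip_stop_sequences("1, 2, 3, 4, 5, 6, 7", ["7"]))  # 1, 2, 3, 4, 5, 6,
```

This only papers over the bug on the client side; the server should still omit the stop sequence itself.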
System Info
Hi, I am working on a model that uses the ChatML format, which produces
<|im_end|>\n
at the end of its response, with
</s>
as the eos_token.
When the finish reason is eos_token, the output is correct: when I did not add a stop sequence, the eos_token finish reason was triggered and the output was correct, with no eos_token appended to the output.
Output:
When the finish reason is stop_sequence, the output is incorrect: when I added an additional stop sequence, generation finished with the correct finish reason, but the stop sequence token was still appended to the output.
Output:
ChatCompletion(id='', choices=[Choice(finish_reason='stop_sequence', index=0, logprobs=None, message=ChatCompletionMessage(content=" Deep learning is a subset of machine learning that involves training artificial neural networks to recognize patterns and make decisions. It's based on algorithms that are designed to mimic the structure and function of the human brain, using layers of interconnected nodes to process vast amounts of data and learn from it.\n\nDeep learning has proven to be extremely effective in a wide range of applications, including image and speech recognition, natural language processing, and predictive analytics. This is due in part to the large amounts of data and computing power that are now available, as well as advances in the design of neural network architectures.\n\nOverall, deep learning represents a powerful tool for solving complex problems and making predictions, and its use is likely to continue to grow in the coming years.<|im_end|>\n", role='assistant', function_call=None, tool_calls=None))], created=1712802448, model='PNU-Infosec/cipher-chiao-32k-v1.5', object='text_completion', system_fingerprint='1.4.5-sha-4ee0a0c', usage=CompletionUsage(completion_tokens=157, prompt_tokens=17, total_tokens=174))
Notice that the
<|im_end|>\n
stop sequence token is still included. Below are my TGI 1.4.5 docker details:
Issue
How do I prevent TGI from including the stop sequence in the output? Can I solve this by modifying the TGI side or the model repository? My goal is to make it compatible with all frontends that support the OpenAI API.
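For frontend compatibility in the meantime, a sketch of a client-side cleanup that only trims when TGI reports a stop-related finish reason, based on the ChatCompletion shape shown above. The helper `clean_completion` and the exact finish-reason strings are assumptions, not an official TGI or OpenAI API:

```python
# Sketch of a hypothetical client-side cleanup, assuming a response object
# shaped like the ChatCompletion shown above (choices[0].finish_reason and
# choices[0].message.content); not part of TGI itself.
def clean_completion(completion, stop_sequences):
    """Trim a trailing stop sequence when the finish reason indicates one."""
    choice = completion.choices[0]
    content = choice.message.content
    # "stop"/"stop_sequence" are the finish reasons observed above; this is
    # an assumption, not a documented contract.
    if choice.finish_reason in ("stop", "stop_sequence"):
        for stop in stop_sequences:
            if content.endswith(stop):
                content = content[: -len(stop)]
                break
    return content
```

For the ChatML example above, calling `clean_completion(completion, ["<|im_end|>\n"])` would drop the trailing `<|im_end|>\n` while leaving eos_token-terminated outputs untouched.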
Best Regards, Derry
Information
Tasks
Reproduction
Steps to Reproduce:
Expected behavior
Generation should not include the stop sequence in the output, just as when the finish reason is
eos_token