Closed: erickrf closed this issue 2 weeks ago
Have you tried sourcing the chat template from the tokenizer? This is what we do for most models on HuggingChat and it works great, see here.
You mean, just providing the tokenizer key in the .env.local file instead of the chat template? I've tried that, and got the same result.
It really looks like Chat UI is sending this string ChatCompletionRequestMessageContentPartText(...) as part of the prompt.
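For reference, the tokenizer-key variant would look roughly like the sketch below in .env.local. This is a hypothetical example, not the actual config from this report: the model name, base URL, and endpoint settings are placeholders.

```
# Hypothetical sketch of a chat-ui .env.local entry; all values are placeholders.
MODELS=`[
  {
    "name": "meta-llama/Meta-Llama-3-8B-Instruct",
    "tokenizer": "meta-llama/Meta-Llama-3-8B-Instruct",
    "endpoints": [
      { "type": "openai", "baseURL": "http://localhost:8000/v1" }
    ]
  }
]`
```

The other variant I tried replaces the "tokenizer" key with an explicit chat template string (chatPromptTemplate), with the same outcome.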
Could you share your vLLM config? I'll try to reproduce this locally.
My vLLM is running on a k8s cluster via KServe, basically this. I don't have complete access to it, so I can't give all the details.
But it turned out that this behavior of returning the assistant header also happens with other clients, such as sending a simple request via curl. I'll try to run vLLM locally and debug further.
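For anyone who wants to reproduce this outside of chat-ui, here is a minimal sketch of such a direct request, assuming a vLLM server exposing the OpenAI-compatible API at http://localhost:8000 (the model id and URL are placeholders, adjust them to your deployment):

```python
# Minimal reproduction against a vLLM OpenAI-compatible server.
# Assumes the server is reachable at http://localhost:8000; adjust as needed.
import requests

BASE_URL = "http://localhost:8000/v1"
MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
answer = resp.json()["choices"][0]["message"]["content"]

# If the template is applied correctly this should be plain text;
# in the buggy case it starts with the assistant header.
print(repr(answer))
print(answer.startswith("<|start_header_id|>assistant<|end_header_id|>"))
```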
Thanks for the update! So if I understood correctly, this is not an issue on the chat-ui side? In that case I'll close this issue, but if it turns out to be chat-ui specific, let me know and I'll reopen it.
Bug description
I have set up a local endpoint serving Llama 3. All the answers I get from it start with <|start_header_id|>assistant<|end_header_id|>.

Steps to reproduce
Set up Llama 3 in a local endpoint. In my .env.local, it is defined as the following:

Context
I have tried variations of the chat template, and also not providing one at all. The <|start_header_id|>assistant<|end_header_id|> is always there.

AFAIK, these tokens should be the last ones in the prompt, so that the model knows it should continue the prompt with the assistant's answer. It seems they are not properly appended to the prompt, but the model still realizes it should add them itself.
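As a point of comparison, this is roughly what a correctly templated Llama 3 prompt looks like when built with the tokenizer's chat template. This is a sketch assuming the Hub tokenizer is available; the model id is a placeholder for whichever checkpoint is served:

```python
# Sketch of the expected prompt when the chat template is applied correctly.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
messages = [{"role": "user", "content": "Hello, who are you?"}]

prompt = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # appends the assistant header at the very end
)
print(prompt)
# ...<|start_header_id|>user<|end_header_id|>\n\nHello, who are you?<|eot_id|>
# <|start_header_id|>assistant<|end_header_id|>\n\n  <- should be the tail of the prompt
```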
Logs
This is a sample request that my local server receives (running vLLM):
Specs
Config
Notes
I'm not sure what the ChatCompletionRequestMessageContentPartText(...) in the prompt is supposed to mean. Is it some internal request object rendered as a string?