Closed: binarycrayon closed this issue 5 months ago.
The model repo contains:

- `special_tokens_map.json`
- `tokenizer.json`
- `tokenizer.model`
- `tokenizer_config.json`
I'm experiencing the same with deepseek-ai/deepseek-coder-33b-instruct and mistralai/Mixtral-8x7B-Instruct-v0.1 (those are the only models I tried, with 1.4.0 and latest).

I checked `tokenizer_config.json` to make sure that `chat_template` is set. Both models have it set.

I noticed that there was recently a fix around picking up `tokenizer_config.json` locally. That didn't affect this error.
Same for me. Steps to reproduce:

```shell
model=TheBloke/Llama-2-7B-Chat-GPTQ
volume=$PWD/data
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.4 --model-id $model --quantize gptq
```

```shell
curl http://localhost:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```

That works.

```shell
curl -N http://localhost:8080/v1/chat/completions -H "content-type: application/json" -d '{
    "model": "TheBloke/Llama-2-7B-Chat-GPTQ",
    "messages": [{"role": "user", "content": "Give me some tips on writing job postings"}]}'
```

Fails with a template error.
Looks like TGI needs the template to squash the chat history: https://github.com/huggingface/text-generation-inference/blob/main/router/src/infer.rs#L94

Does anyone know how to provide the template? Looks like something like this is needed: https://huggingface.co/docs/transformers/chat_templating
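For anyone wondering what that "squashing" looks like, here's a minimal sketch using the transformers chat-templating API linked above (the model id is just an example; any repo whose tokenizer_config.json defines a `chat_template` should behave the same way):

```python
# Minimal sketch: render a chat history into a single prompt string using the
# model's chat_template (this is the "squashing" TGI's router needs to do).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

messages = [
    {"role": "user", "content": "Give me some tips on writing job postings"},
]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```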
So I found the `chat_template` section in Mixtral's config: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1/blob/main/tokenizer_config.json

So if you don't have that section in your model's `tokenizer_config.json`, then I guess the OpenAI endpoint is not going to work.
@9876691 I agree that that error would make sense if `chat_template` was missing, but:

> I checked `tokenizer_config.json` to make sure that `chat_template` is set. Both models have that set.

I don't think that's what's causing the problem, for me at least.
hi @adinin, I believe the issue is related to the type of the `bos_token` and `eos_token` in the `tokenizer_config.json`. Currently TGI expects the tokens to be of type string, but in some cases the config has a more complex type. There is an open PR that should resolve this issue when https://github.com/huggingface/text-generation-inference/pull/1550 is merged.
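For illustration (these are hypothetical excerpts, not taken from any specific repo), the two shapes look roughly like this:

```python
# Hypothetical excerpts of a tokenizer_config.json, shown as Python dicts.
# TGI's chat template handling expects the plain-string form:
simple_form = {
    "bos_token": "<s>",
    "eos_token": "</s>",
}

# ...but some repos serialize the tokens as AddedToken-style objects instead,
# which is the "more complex type" mentioned above:
object_form = {
    "bos_token": {
        "__type": "AddedToken",
        "content": "<s>",
        "lstrip": False,
        "normalized": False,
        "rstrip": False,
        "single_word": False,
    },
}
```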
> hi @adinin, I believe the issue is related to the type of the `bos_token` and `eos_token` in the `tokenizer_config.json`. Currently TGI expects the tokens to be of type string, but in some cases the config has a more complex type. There is an open PR that should resolve this issue when #1550 is merged.
If that's the case, this issue and issue #1534 are duplicates.
Thanks for putting in the fix @drbh. I'm looking forward to the update.
Same issue, I am getting the following error while using the llama-2-chat-hf model:

```
text_generation_router::server: router/src/server.rs:585: Template error: invalid operation: object has no method named strip (in <string>:1)
```
I'm also experiencing this issue. You can pass in your own tokenizer config via the command line argument:

```
--tokenizer-config-path <TOKENIZER_CONFIG_PATH>
    The path to the tokenizer config file. This path is used to load the tokenizer configuration which may include a `chat_template`. If not provided, the default config will be used from the model hub [env: TOKENIZER_CONFIG_PATH=]
```

Then copy the default tokenizer_config.json from the model and replace the chat_template with the below. Make sure you strip your message content yourself when calling it though. This is a workaround until they fix it.

```
"chat_template": "{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% for message in loop_messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if loop.index0 == 0 and system_message != false %}{% set content = '<<SYS>>\n' + system_message + '\n<</SYS>>\n\n' + message['content'] %}{% else %}{% set content = message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ bos_token + '[INST] ' + content + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ ' ' + content + ' ' + eos_token }}{% endif %}{% endfor %}"
```
@gabewillen So, we must manually strip out the input message before feeding it to the OpenAI client?
@vibhorag101 Just remove the whitespace, as that's what was being done in the template and what was causing the failure. So just make sure you do:

```python
message = {
    "role": "user",
    "content": content.strip()
}
```

That will make sure the template isn't affected. Also ensure your first message after the optional system message has a "user" role and that the roles alternate between "user" and "assistant".
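For reference, a minimal sketch of the client side, assuming TGI is serving the OpenAI-compatible route at http://localhost:8080/v1 as in the reproduction above (the `api_key` is a dummy value and `"tgi"` is just a placeholder model name):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="-")

content = "  Give me some tips on writing job postings  "

response = client.chat.completions.create(
    model="tgi",  # placeholder; TGI serves whatever model it was launched with
    # strip the whitespace yourself, since the template's .strip() is what fails
    messages=[{"role": "user", "content": content.strip()}],
    max_tokens=200,
)
print(response.choices[0].message.content)
```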
same issue here with Mistral-7B-awq
same issue here with llama
> same issue here with Mistral-7B-awq
In my case, this problem occurred because I was using the "non-instruction" version of the model. It's important to have the "chat_template" section in the tokenizer_config.json file.
https://huggingface.co/TheBloke/Mistral-7B-v0.1-AWQ/blob/main/tokenizer_config.json
https://huggingface.co/TheBloke/Mistral-7B-Merge-14-v0.1-AWQ/blob/main/tokenizer_config.json
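A quick way to check that for any repo (rough sketch; it just inspects whether tokenizer_config.json ships a `chat_template` at all):

```python
import json
from huggingface_hub import hf_hub_download

for repo in ("TheBloke/Mistral-7B-v0.1-AWQ", "TheBloke/Mistral-7B-Merge-14-v0.1-AWQ"):
    path = hf_hub_download(repo, "tokenizer_config.json")
    with open(path) as f:
        config = json.load(f)
    print(repo, "has chat_template:", "chat_template" in config)
```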
For now, the fix suggested by @gabewillen works well for me. But I think this issue is still not fixed in the project itself.
Hi @vibhorag101 the issue is likely due to the `.strip()` method, which is not supported by TGI at the moment. TGI currently strictly supports the Jinja spec, which uses `| trim` instead of `.strip()`. Many templates on the hub follow this syntax, but some still include `.strip()` and other non-Jinja methods.

We're exploring adding an internal workaround, but currently the fastest solution is to copy the file locally and replace the `.strip()` with `| trim`, as well as opening a PR on the Hugging Face Hub on the models that use non-Jinja syntax.
Experiencing the same problem with Qwen 72B running with --quantize=bitsandbytes-nf4: https://huggingface.co/Qwen/Qwen1.5-72B-Chat/blob/main/tokenizer_config.json
Hi @gabewillen Thanks for the tips above. Mine has the `chat_template`: https://huggingface.co/TheBloke/openchat-3.5-0106-AWQ/blob/main/tokenizer_config.json#L51

But it raises a different error:

```
openai.UnprocessableEntityError: Error code: 422 - {'error': 'Template error: invalid operation: object has no method named title (in <string>:1)', 'error_type': 'template_error'}
```

I'm on version 1.4.3.
hi @binarycrayon have you been able to resolve this issue by ensuring that the `tokenizer_config.json` contains a valid `chat_template`?
Regarding others who are having issues: the template error messages contain information about the specific problem. For example, in the error shared above, the message says `object has no method named title`, which indicates that the `title` method in the chat template is not valid.

As noted above, TGI strictly uses standard Jinja (see spec here). Please make sure that the model you are loading uses standard Jinja. If a model is not following the standard, I encourage opening PRs on the Hub (which will help others running into these issues) like this one. Additionally, you can load the model locally and update the `chat_template` on your machine to resolve the issue.
TLDR: please update the template to use standard Jinja:

- `.title()` -> `|title`
- `.strip()` -> `|trim`
On Qwen 72B there is nothing wrong with the chat_template: https://huggingface.co/PNU-Infosec/Qwen1.5-72B-Chat/blob/main/tokenizer_config.json

Yet I still got "template not found":

```
2024-03-27T09:12:10.590140Z WARN text_generation_router: router/src/main.rs:343: Invalid hostname, defaulting to 0.0.0.0
2024-03-27T09:13:19.203101Z ERROR chat_completions: text_generation_router::server: router/src/server.rs:773: Template error: template not found
2024-03-27T09:13:19.236534Z ERROR chat_completions: text_generation_router::server: router/src/server.rs:773: Template error: template not found
```

I'm using the latest TGI Docker image, 1.4.4.
I was getting this issue as well running a Llama 2 chat variant a couple weeks ago. Pulling the latest TGI server Docker image (as of a couple weeks ago) cleared up my issue, so that may be a low-effort solution for some folks to try.
> then copy the default tokenizer_config.json from the model and replace the chat_template with the below. Make sure you strip your message content yourself when calling it though. This is a workaround until they fix it.
OMG Thanks so much! I couldn't find a way to use a system prompt with Mixtral 8x7B and vLLM. This works like a charm.

I just modified the chat template in the tokenizer_config.json... It works A1. It's crazy that there is almost no mention of this hack on the web.
### System Info

Docker Image: ghcr.io/huggingface/text-generation-inference:sha-1734540
Instance: AWS A10G via Hugging Face Inference Endpoint
### Reproduction

On Hugging Face Inference Endpoint:

- Served a finetuned model: JamAndTeaStudios/dialogue-choice-merged-01-30-sft-mistral-7b-instruct-0.2 (the above model is a PEFT fine-tuned and merged Mistral 7B model, including the tokenizer)
- Task: Text Generation
- TGI Docker Image: ghcr.io/huggingface/text-generation-inference:sha-1734540
- Instance: AWS GPU A10G in the east region
### Expected behavior
Once the inference URL was up and running, I followed https://huggingface.co/blog/tgi-messages-api, configured the OpenAI client with the URL, and then called the chat completion endpoint.

Saw a 402 error:

Log from TGI endpoint:

I expected the endpoint to just work. I wonder what caused the "token file not found" error and what I should do to help with that.