+1
This may result from a missing chat_template in the tokenizer, which is a bug fixed by transformers#32908.
Can you check whether installing the latest transformers
from source fixes this issue?
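If you want to check whether a given model's tokenizer actually ships a chat template (and therefore whether this bug applies to you), a minimal sketch with transformers looks like this; the model id is only an example:
# check_chat_template.py - prints whether the tokenizer defines a chat template
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.3")  # example model id
print("chat_template defined:", tok.chat_template is not None)
If it prints False, vLLM has nothing to apply, and you either need the fixed transformers build (when the template is only being lost by the bug above) or an explicit --chat-template.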
@Isotr0py thanks for your reply. I'm currently running vLLM via Docker; could you push a temporary Docker image that includes this fix?
I think you just need to add this RUN instruction to the Dockerfile:
RUN --mount=type=cache,target=/root/.cache/pip \
    python3 -m pip install git+https://github.com/huggingface/transformers
right before lines 39-44:
# install build and runtime dependencies
COPY requirements-common.txt requirements-common.txt
COPY requirements-adag.txt requirements-adag.txt
COPY requirements-cuda.txt requirements-cuda.txt
RUN --mount=type=cache,target=/root/.cache/pip \
    python3 -m pip install -r requirements-cuda.txt
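If you rebuild the image with that change, a quick way to confirm the container actually picked up the source build (a sketch; replace the placeholder with your image tag):
docker run --rm --entrypoint python3 <your-vllm-image> -c "import transformers; print(transformers.__version__)"
A build from the main branch typically reports a .dev version newer than the pinned release.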
Why does this not work out of the box? How does one specify such a template? Is it really necessary to work around this issue by using the transformers trunk?
EDIT: Here's some info I found: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html?ref=blog.mozilla.ai#chat-template
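To answer the "how does one specify such a template" part concretely: per the linked docs, the OpenAI-compatible server accepts a --chat-template argument pointing at a Jinja file, e.g. (a sketch; template.jinja is whatever template file you supply):
vllm serve mistralai/Mistral-7B-v0.3 --chat-template ./template.jinja
The transformers trunk should only be needed when the model does define a template upstream but it gets dropped by the bug mentioned above.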
@I321065 @simaotwx @lonngxiang I am using Kubernetes to deploy LLM models via vLLM, and I mount a ConfigMap into the vLLM pod to fix the chat-template issue. Let me know if you want the full YAML for how to deploy it. The relevant parts:
- name: ssdl-mistral-7b
  image: vllm/vllm-openai:latest
  command: ["/bin/sh", "-c"]
  args: [
    "vllm serve mistralai/Mistral-7B-v0.3 --chat-template /etc/chat-template-config/chat-template.j2 --trust-remote-code --enable-chunked-prefill --max_num_batched_tokens 1024"
  ]
  env:
    - name: HUGGING_FACE_HUB_TOKEN
      valueFrom:
        secretKeyRef:
          name: hf-token-secret
          key: token
    - name: VLLM_NO_USAGE_STATS
      value: "1"
    - name: DO_NOT_TRACK
      value: "1"
    - name: PYTHONPATH
      value: "/app/deps"
  ports:
    - containerPort: 8000
  resources:
    limits:
      cpu: "10"
      memory: 20G
      nvidia.com/mig-3g.40gb: "1"
    requests:
      cpu: "2"
      memory: 6G
      nvidia.com/mig-3g.40gb: "1"
  volumeMounts:
    - mountPath: /root/.cache/huggingface
      name: cache-volume
    - name: shm
      mountPath: /dev/shm
    - name: config-volume
      mountPath: /etc/config
    - name: deps-volume
      mountPath: /app/deps
    - name: chat-template-volume
      mountPath: /etc/chat-template-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: chat-template-config
  namespace: ssdl-llm
data:
  chat-template.j2: |
    {%- if messages[0]["role"] == "system" %}
    {%- set system_message = messages[0]["content"] %}
    {%- set loop_messages = messages[1:] %}
    {%- else %}
    {%- set loop_messages = messages %}
    {%- endif %}
    {%- if not tools is defined %}
    {%- set tools = none %}
    {%- endif %}
    {%- set user_messages = loop_messages | selectattr("role", "equalto", "user") | list %}
    {%- for message in loop_messages | rejectattr("role", "equalto", "tool") | rejectattr("role", "equalto", "tool_results") | selectattr("tool_calls", "undefined") %}
    {%- if (message["role"] == "user") != (loop.index0 % 2 == 0) %}
    {{- raise_exception("After the optional system message, conversation roles must alternate user/assistant/user/assistant/...") }}
    {%- endif %}
    {%- endfor %}
    {{- bos_token }}
    {%- for message in loop_messages %}
    {%- if message["role"] == "user" %}
    {%- if tools is not none and (message == user_messages[-1]) %}
    {{- "[AVAILABLE_TOOLS] [" }}
    {%- for tool in tools %}
    {%- set tool = tool.function %}
    {{- '{"type": "function", "function": {' }}
    {%- for key, val in tool.items() if key != "return" %}
    {%- if val is string %}
    {{- '"' + key + '": "' + val + '"' }}
    {%- else %}
    {{- '"' + key + '": ' + val|tojson }}
    {%- endif %}
    {%- if not loop.last %}
    {{- ", " }}
    {%- endif %}
    {%- endfor %}
    {{- "}}" }}
    {%- if not loop.last %}
    {{- ", " }}
    {%- else %}
    {{- "]" }}
    {%- endif %}
    {%- endfor %}
    {{- "[/AVAILABLE_TOOLS]" }}
    {%- endif %}
    {%- if loop.last and system_message is defined %}
    {{- "[INST] " + system_message + "\n\n" + message["content"] + "[/INST]" }}
    {%- else %}
    {{- "[INST] " + message["content"] + "[/INST]" }}
    {%- endif %}
    {%- elif message["role"] == "tool_calls" or message.tool_calls is defined %}
    {%- if message.tool_calls is defined %}
    {%- set tool_calls = message.tool_calls %}
    {%- else %}
    {%- set tool_calls = message.content %}
    {%- endif %}
    {{- "[TOOL_CALLS] [" }}
    {%- for tool_call in tool_calls %}
    {%- set out = tool_call.function|tojson %}
    {{- out[:-1] }}
    {%- if not tool_call.id is defined or tool_call.id|length < 9 %}
    {{- raise_exception("Tool call IDs should be alphanumeric strings with length >= 9! (1)" + tool_call.id) }}
    {%- endif %}
    {{- ', "id": "' + tool_call.id[-9:] + '"}' }}
    {%- if not loop.last %}
    {{- ", " }}
    {%- else %}
    {{- "]" + eos_token }}
    {%- endif %}
    {%- endfor %}
    {%- elif message["role"] == "assistant" %}
    {{- " " + message["content"] + eos_token }}
    {%- elif message["role"] == "tool_results" or message["role"] == "tool" %}
    {%- if message.content is defined and message.content.content is defined %}
    {%- set content = message.content.content %}
    {%- else %}
    {%- set content = message.content %}
    {%- endif %}
    {{- '[TOOL_RESULTS] {"content": ' + content|string + ", " }}
    {%- if not message.tool_call_id is defined or message.tool_call_id|length < 9 %}
    {{- raise_exception("Tool call IDs should be alphanumeric strings with length >= 9! (2)" + message.tool_call_id) }}
    {%- endif %}
    {{- '"call_id": "' + message.tool_call_id[-9:] + '"}[/TOOL_RESULTS]' }}
    {%- else %}
    {{- raise_exception("Only user and assistant roles are supported, with the exception of an initial optional system message!") }}
    {%- endif %}
    {%- endfor %}
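Note that the volumeMounts in the container spec above assume a matching volumes entry at the pod-spec level that points chat-template-volume at this ConfigMap. A minimal sketch, with names taken from the snippets above:
volumes:
  - name: chat-template-volume
    configMap:
      name: chat-template-config
The other mounts (cache-volume, shm, config-volume, deps-volume) need their own entries in the same list.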
Hi, same issue for me. I am trying vLLM with facebook/opt-125m through the OpenAI-compatible server; can someone help? ValueError: As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.
A solution: https://blog.csdn.net/yuanlulu/article/details/142929234
Your current environment
BadRequestError: Error code: 400 - {'object': 'error', 'message': 'As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.', 'type': 'BadRequestError', 'param': None, 'code': 400}
How would you like to use vllm
CUDA_VISIBLE_DEVICES=1 vllm serve /ai/qwen1.5-1.8b.gguf --host 0.0.0.0 --port 10868 --max-model-len 4096 --trust-remote-code --tensor-parallel-size 1 --dtype=half --quantization gguf --load-format gguf
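The GGUF tokenizer here likewise ships no chat template, so the same workaround applies. Qwen1.5 uses a ChatML-style prompt, and a minimal template along those lines (a sketch, not the official Qwen template, which also injects a default system message) can be saved to a file such as /ai/qwen-chatml.j2 and passed by adding --chat-template /ai/qwen-chatml.j2 to the command above:
{% for message in messages %}{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}
Alternatively, installing a transformers build containing the chat_template fix mentioned at the top of the thread may remove the need for a manual template.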