vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Usage]: Running a GGUF model needs a chat template; how do I write one? #7978

Closed: lonngxiang closed this issue 1 month ago

lonngxiang commented 2 months ago

Your current environment

BadRequestError: Error code: 400 - {'object': 'error', 'message': 'As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.', 'type': 'BadRequestError', 'param': None, 'code': 400}

How would you like to use vllm

CUDA_VISIBLE_DEVICES=1 vllm serve /ai/qwen1.5-1.8b.gguf --host 0.0.0.0 --port 10868 --max-model-len 4096 --trust-remote-code --tensor-parallel-size 1 --dtype=half --quantization gguf --load-format gguf
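A workaround that surfaces later in this thread is to supply a template explicitly with vLLM's --chat-template flag. A sketch of the same command with the flag added, where ./qwen-chat-template.j2 is a hypothetical local template file:

CUDA_VISIBLE_DEVICES=1 vllm serve /ai/qwen1.5-1.8b.gguf --host 0.0.0.0 --port 10868 --max-model-len 4096 --trust-remote-code --tensor-parallel-size 1 --dtype=half --quantization gguf --load-format gguf --chat-template ./qwen-chat-template.j2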


I321065 commented 2 months ago

+1

Isotr0py commented 2 months ago

This may result from a missing chat_template in the tokenizer, a bug fixed by transformers#32908. Can you check whether installing the latest transformers from source fixes this issue?
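For a non-Docker setup, that is:

pip install git+https://github.com/huggingface/transformers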

I321065 commented 2 months ago

@Isotr0py thanks for your reply. I am currently running vLLM via Docker; could you push a temporary Docker image with this fix?

Isotr0py commented 2 months ago

I think you just need to add these lines to the Dockerfile:

RUN --mount=type=cache,target=/root/.cache/pip \
    python3 -m pip install git+https://github.com/huggingface/transformers

They go before lines 39-44 of the Dockerfile, i.e. before this existing block:

# install build and runtime dependencies
COPY requirements-common.txt requirements-common.txt
COPY requirements-adag.txt requirements-adag.txt
COPY requirements-cuda.txt requirements-cuda.txt
RUN --mount=type=cache,target=/root/.cache/pip \
    python3 -m pip install -r requirements-cuda.txt
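Then rebuild the image as usual (the --mount=type=cache syntax requires BuildKit; the tag is arbitrary):

DOCKER_BUILDKIT=1 docker build -t vllm-openai:transformers-fix .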

simaotwx commented 1 month ago

Why does this not work out of the box? How does one specify such a template? Is it really necessary to work around this issue by using the transformers trunk?

EDIT: Here's some info I found: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html?ref=blog.mozilla.ai#chat-template
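In short, a chat template is a Jinja2 file that vLLM renders over the messages list, plus an add_generation_prompt flag. A minimal ChatML-style sketch (this matches the format Qwen1.5 models expect, but confirm against your model card before using it):

{%- for message in messages %}
{{- '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>\n' }}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- endif %}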

haitwang-cloud commented 1 month ago

@I321065 @simaotwx @lonngxiang I am using Kubernetes to deploy LLM models via vLLM, and I mount a ConfigMap into the vLLM pod to fix the template issue. Let me know if you want the full YAML for the deployment:

      - name: ssdl-mistral-7b
        image: vllm/vllm-openai:latest
        command: ["/bin/sh", "-c"]
        args: [
          "vllm serve mistralai/Mistral-7B-v0.3 --chat-template /etc/chat-template-config/chat-template.j2 --trust-remote-code --enable-chunked-prefill --max_num_batched_tokens 1024"
        ]
        env:
        - name: HUGGING_FACE_HUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-token-secret
              key: token
        - name: VLLM_NO_USAGE_STATS
          value: "1"
        - name: DO_NOT_TRACK
          value: "1"
        - name: PYTHONPATH
          value: "/app/deps"
        ports:
        - containerPort: 8000
        resources:
          limits:
            cpu: "10"
            memory: 20G
            nvidia.com/mig-3g.40gb: "1"
          requests:
            cpu: "2"
            memory: 6G
            nvidia.com/mig-3g.40gb: "1"
        volumeMounts:
        - mountPath: /root/.cache/huggingface
          name: cache-volume
        - name: shm
          mountPath: /dev/shm
        - name: config-volume
          mountPath: /etc/config
        - name: deps-volume
          mountPath: /app/deps
        - name: chat-template-volume
          mountPath: /etc/chat-template-config
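Note that the volumeMounts above reference pod-level volumes omitted from this fragment; the template mount is backed by the ConfigMap below, roughly like this (a sketch, with indentation matching the surrounding pod spec):

      volumes:
      - name: chat-template-volume
        configMap:
          name: chat-template-config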
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: chat-template-config
  namespace: ssdl-llm
data:
  chat-template.j2: |
    {%- if messages[0]["role"] == "system" %}
        {%- set system_message = messages[0]["content"] %}
        {%- set loop_messages = messages[1:] %}
    {%- else %}
        {%- set loop_messages = messages %}
    {%- endif %}
    {%- if not tools is defined %}
        {%- set tools = none %}
    {%- endif %}
    {%- set user_messages = loop_messages | selectattr("role", "equalto", "user") | list %}

    {%- for message in loop_messages | rejectattr("role", "equalto", "tool") | rejectattr("role", "equalto", "tool_results") | selectattr("tool_calls", "undefined") %}
        {%- if (message["role"] == "user") != (loop.index0 % 2 == 0) %}
            {{- raise_exception("After the optional system message, conversation roles must alternate user/assistant/user/assistant/...") }}
        {%- endif %}
    {%- endfor %}

    {{- bos_token }}
    {%- for message in loop_messages %}
        {%- if message["role"] == "user" %}
            {%- if tools is not none and (message == user_messages[-1]) %}
                {{- "[AVAILABLE_TOOLS] [" }}
                {%- for tool in tools %}
                    {%- set tool = tool.function %}
                    {{- '{"type": "function", "function": {' }}
                    {%- for key, val in tool.items() if key != "return" %}
                        {%- if val is string %}
                            {{- '"' + key + '": "' + val + '"' }}
                        {%- else %}
                            {{- '"' + key + '": ' + val|tojson }}
                        {%- endif %}
                        {%- if not loop.last %}
                            {{- ", " }}
                        {%- endif %}
                    {%- endfor %}
                    {{- "}}" }}
                    {%- if not loop.last %}
                        {{- ", " }}
                    {%- else %}
                        {{- "]" }}
                    {%- endif %}
                {%- endfor %}
                {{- "[/AVAILABLE_TOOLS]" }}
            {%- endif %}
            {%- if loop.last and system_message is defined %}
                {{- "[INST] " + system_message + "\n\n" + message["content"] + "[/INST]" }}
            {%- else %}
                {{- "[INST] " + message["content"] + "[/INST]" }}
            {%- endif %}
        {%- elif message["role"] == "tool_calls" or message.tool_calls is defined %}
            {%- if message.tool_calls is defined %}
                {%- set tool_calls = message.tool_calls %}
            {%- else %}
                {%- set tool_calls = message.content %}
            {%- endif %}
            {{- "[TOOL_CALLS] [" }}
            {%- for tool_call in tool_calls %}
                {%- set out = tool_call.function|tojson %}
                {{- out[:-1] }}
                {%- if not tool_call.id is defined or tool_call.id|length < 9 %}
                    {{- raise_exception("Tool call IDs should be alphanumeric strings with length >= 9! (1)" + tool_call.id) }}
                {%- endif %}
                {{- ', "id": "' + tool_call.id[-9:] + '"}' }}
                {%- if not loop.last %}
                    {{- ", " }}
                {%- else %}
                    {{- "]" + eos_token }}
                {%- endif %}
            {%- endfor %}
        {%- elif message["role"] == "assistant" %}
            {{- " " + message["content"] + eos_token }}
        {%- elif message["role"] == "tool_results" or message["role"] == "tool" %}
            {%- if message.content is defined and message.content.content is defined %}
                {%- set content = message.content.content %}
            {%- else %}
                {%- set content = message.content %}
            {%- endif %}
            {{- '[TOOL_RESULTS] {"content": ' + content|string + ", " }}
            {%- if not message.tool_call_id is defined or message.tool_call_id|length < 9 %}
                {{- raise_exception("Tool call IDs should be alphanumeric strings with length >= 9! (2)" + message.tool_call_id) }}
            {%- endif %}
            {{- '"call_id": "' + message.tool_call_id[-9:] + '"}[/TOOL_RESULTS]' }}
        {%- else %}
            {{- raise_exception("Only user and assistant roles are supported, with the exception of an initial optional system message!") }}
        {%- endif %}
    {%- endfor %}

Mahamadoulng commented 2 weeks ago

Hi, same issue for me. I am trying vLLM with facebook/opt-125m using the OpenAI-compatible server; can someone help? ValueError: As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.

byerose commented 1 week ago

> Hi, same issue for me. I am trying vLLM with facebook/opt-125m using the OpenAI-compatible server; can someone help? ValueError: As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.

A solution: https://blog.csdn.net/yuanlulu/article/details/142929234
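For facebook/opt-125m specifically, the same fix as above applies: pass a template explicitly, because the base model's tokenizer does not define one (and since opt-125m is not instruction-tuned, any consistent template is purely formatting). A sketch, where ./opt-template.j2 is a hypothetical file containing e.g. the ChatML-style template shown earlier in this thread:

vllm serve facebook/opt-125m --chat-template ./opt-template.j2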