vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Usage]: Running a GGUF model needs a chat template; how do I write one? #7978

Closed: lonngxiang closed this issue 1 month ago

lonngxiang commented 2 months ago

Your current environment

BadRequestError: Error code: 400 - {'object': 'error', 'message': 'As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.', 'type': 'BadRequestError', 'param': None, 'code': 400}

How would you like to use vllm

CUDA_VISIBLE_DEVICES=1 vllm serve /ai/qwen1.5-1.8b.gguf --host 0.0.0.0 --port 10868 --max-model-len 4096 --trust-remote-code --tensor-parallel-size 1 --dtype=half --quantization gguf --load-format gguf
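A workaround that surfaces later in this thread is to supply a template explicitly with vLLM's --chat-template flag. A sketch of the same command with the flag added, where ./qwen-chat-template.j2 is a hypothetical local template file:

CUDA_VISIBLE_DEVICES=1 vllm serve /ai/qwen1.5-1.8b.gguf --host 0.0.0.0 --port 10868 --max-model-len 4096 --trust-remote-code --tensor-parallel-size 1 --dtype=half --quantization gguf --load-format gguf --chat-template ./qwen-chat-template.j2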


I321065 commented 2 months ago

+1

Isotr0py commented 2 months ago

This may result from a missing chat_template in the tokenizer, a bug fixed by transformers#32908. Can you check whether installing the latest transformers from source fixes this issue?
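For a non-Docker setup, that is:

pip install git+https://github.com/huggingface/transformers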

I321065 commented 2 months ago

@Isotr0py thanks for your reply. I am currently running vLLM via Docker; could you push a temporary Docker image with this fix?

Isotr0py commented 2 months ago

I think you just need to add these lines to the Dockerfile:

RUN --mount=type=cache,target=/root/.cache/pip \
    python3 -m pip install git+https://github.com/huggingface/transformers

They go before lines 39-44 of the Dockerfile, i.e. before this existing block:

# install build and runtime dependencies
COPY requirements-common.txt requirements-common.txt
COPY requirements-adag.txt requirements-adag.txt
COPY requirements-cuda.txt requirements-cuda.txt
RUN --mount=type=cache,target=/root/.cache/pip \
    python3 -m pip install -r requirements-cuda.txt
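Then rebuild the image as usual (the --mount=type=cache syntax requires BuildKit; the tag is arbitrary):

DOCKER_BUILDKIT=1 docker build -t vllm-openai:transformers-fix .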

simaotwx commented 1 month ago

Why does this not work out of the box? How does one specify such a template? Is it really necessary to work around this issue by using the transformers trunk?

EDIT: Here's some info I found: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html?ref=blog.mozilla.ai#chat-template
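In short, a chat template is a Jinja2 file that vLLM renders over the messages list, plus an add_generation_prompt flag. A minimal ChatML-style sketch (this matches the format Qwen1.5 models expect, but confirm against your model card before using it):

{%- for message in messages %}
{{- '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>\n' }}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- endif %}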

haitwang-cloud commented 1 month ago

@I321065 @simaotwx @lonngxiang I am using Kubernetes to deploy LLM models via vLLM, and I mount a ConfigMap into the vLLM pod to fix the template issue. Let me know if you want the full YAML for the deployment:

      - name: ssdl-mistral-7b
        image: vllm/vllm-openai:latest
        command: ["/bin/sh", "-c"]
        args: [
          "vllm serve mistralai/Mistral-7B-v0.3 --chat-template /etc/chat-template-config/chat-template.j2 --trust-remote-code --enable-chunked-prefill --max_num_batched_tokens 1024"
        ]
        env:
        - name: HUGGING_FACE_HUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-token-secret
              key: token
        - name: VLLM_NO_USAGE_STATS
          value: "1"
        - name: DO_NOT_TRACK
          value: "1"
        - name: PYTHONPATH
          value: "/app/deps"
        ports:
        - containerPort: 8000
        resources:
          limits:
            cpu: "10"
            memory: 20G
            nvidia.com/mig-3g.40gb: "1"
          requests:
            cpu: "2"
            memory: 6G
            nvidia.com/mig-3g.40gb: "1"
        volumeMounts:
        - mountPath: /root/.cache/huggingface
          name: cache-volume
        - name: shm
          mountPath: /dev/shm
        - name: config-volume
          mountPath: /etc/config
        - name: deps-volume
          mountPath: /app/deps
        - name: chat-template-volume
          mountPath: /etc/chat-template-config
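Note that the volumeMounts above reference pod-level volumes omitted from this fragment; the template mount is backed by the ConfigMap below, roughly like this (a sketch, with indentation matching the surrounding pod spec):

      volumes:
      - name: chat-template-volume
        configMap:
          name: chat-template-config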
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: chat-template-config
  namespace: ssdl-llm
data:
  chat-template.j2: |
    {%- if messages[0]["role"] == "system" %}
        {%- set system_message = messages[0]["content"] %}
        {%- set loop_messages = messages[1:] %}
    {%- else %}
        {%- set loop_messages = messages %}
    {%- endif %}
    {%- if not tools is defined %}
        {%- set tools = none %}
    {%- endif %}
    {%- set user_messages = loop_messages | selectattr("role", "equalto", "user") | list %}

    {%- for message in loop_messages | rejectattr("role", "equalto", "tool") | rejectattr("role", "equalto", "tool_results") | selectattr("tool_calls", "undefined") %}
        {%- if (message["role"] == "user") != (loop.index0 % 2 == 0) %}
            {{- raise_exception("After the optional system message, conversation roles must alternate user/assistant/user/assistant/...") }}
        {%- endif %}
    {%- endfor %}

    {{- bos_token }}
    {%- for message in loop_messages %}
        {%- if message["role"] == "user" %}
            {%- if tools is not none and (message == user_messages[-1]) %}
                {{- "[AVAILABLE_TOOLS] [" }}
                {%- for tool in tools %}
                    {%- set tool = tool.function %}
                    {{- '{"type": "function", "function": {' }}
                    {%- for key, val in tool.items() if key != "return" %}
                        {%- if val is string %}
                            {{- '"' + key + '": "' + val + '"' }}
                        {%- else %}
                            {{- '"' + key + '": ' + val|tojson }}
                        {%- endif %}
                        {%- if not loop.last %}
                            {{- ", " }}
                        {%- endif %}
                    {%- endfor %}
                    {{- "}}" }}
                    {%- if not loop.last %}
                        {{- ", " }}
                    {%- else %}
                        {{- "]" }}
                    {%- endif %}
                {%- endfor %}
                {{- "[/AVAILABLE_TOOLS]" }}
            {%- endif %}
            {%- if loop.last and system_message is defined %}
                {{- "[INST] " + system_message + "\n\n" + message["content"] + "[/INST]" }}
            {%- else %}
                {{- "[INST] " + message["content"] + "[/INST]" }}
            {%- endif %}
        {%- elif message["role"] == "tool_calls" or message.tool_calls is defined %}
            {%- if message.tool_calls is defined %}
                {%- set tool_calls = message.tool_calls %}
            {%- else %}
                {%- set tool_calls = message.content %}
            {%- endif %}
            {{- "[TOOL_CALLS] [" }}
            {%- for tool_call in tool_calls %}
                {%- set out = tool_call.function|tojson %}
                {{- out[:-1] }}
                {%- if not tool_call.id is defined or tool_call.id|length < 9 %}
                    {{- raise_exception("Tool call IDs should be alphanumeric strings with length >= 9! (1)" + tool_call.id) }}
                {%- endif %}
                {{- ', "id": "' + tool_call.id[-9:] + '"}' }}
                {%- if not loop.last %}
                    {{- ", " }}
                {%- else %}
                    {{- "]" + eos_token }}
                {%- endif %}
            {%- endfor %}
        {%- elif message["role"] == "assistant" %}
            {{- " " + message["content"] + eos_token }}
        {%- elif message["role"] == "tool_results" or message["role"] == "tool" %}
            {%- if message.content is defined and message.content.content is defined %}
                {%- set content = message.content.content %}
            {%- else %}
                {%- set content = message.content %}
            {%- endif %}
            {{- '[TOOL_RESULTS] {"content": ' + content|string + ", " }}
            {%- if not message.tool_call_id is defined or message.tool_call_id|length < 9 %}
                {{- raise_exception("Tool call IDs should be alphanumeric strings with length >= 9! (2)" + message.tool_call_id) }}
            {%- endif %}
            {{- '"call_id": "' + message.tool_call_id[-9:] + '"}[/TOOL_RESULTS]' }}
        {%- else %}
            {{- raise_exception("Only user and assistant roles are supported, with the exception of an initial optional system message!") }}
        {%- endif %}
    {%- endfor %}

Mahamadoulng commented 2 weeks ago

Hi, same issue for me. I am trying vLLM with facebook/opt-125m using the OpenAI-compatible server; can someone help? ValueError: As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.

byerose commented 1 week ago

> Hi, same issue for me. I am trying vLLM with facebook/opt-125m using the OpenAI-compatible server; can someone help? ValueError: As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.

A solution: https://blog.csdn.net/yuanlulu/article/details/142929234
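For facebook/opt-125m specifically, the same fix as above applies: pass a template explicitly, because the base model's tokenizer does not define one (and since opt-125m is not instruction-tuned, any consistent template is purely formatting). A sketch, where ./opt-template.j2 is a hypothetical file containing e.g. the ChatML-style template shown earlier in this thread:

vllm serve facebook/opt-125m --chat-template ./opt-template.j2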