huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Error "EOF while parsing an object..." with tool_calls #2145

Open ishelaputov opened 3 months ago

ishelaputov commented 3 months ago

System Info

Hello! Thank you very much for your product, very helpful!

System Info:

2024-06-30T00:30:49.387947Z  INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.79.0
Commit sha: 192d49af0bfa71e886c27856232031f3935628ff
Docker label: sha-192d49a
nvidia-smi:
Sun Jun 30 00:30:47 2024       
   +-----------------------------------------------------------------------------------------+
   | NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
   |-----------------------------------------+------------------------+----------------------+
   | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
   |                                         |                        |               MIG M. |
   |=========================================+========================+======================|
   |   0  NVIDIA A100-SXM4-80GB          Off |   00000000:8B:00.0 Off |                    0 |
   | N/A   26C    P0             59W /  500W |       3MiB /  81920MiB |      0%      Default |
   |                                         |                        |             Disabled |
   +-----------------------------------------+------------------------+----------------------+
   |   1  NVIDIA A100-SXM4-80GB          Off |   00000000:8C:00.0 Off |                    0 |
   | N/A   29C    P0             62W /  500W |       3MiB /  81920MiB |      0%      Default |
   |                                         |                        |             Disabled |
   +-----------------------------------------+------------------------+----------------------+
   |   2  NVIDIA A100-SXM4-80GB          Off |   00000000:8D:00.0 Off |                    0 |
   | N/A   29C    P0             65W /  500W |       3MiB /  81920MiB |      0%      Default |
   |                                         |                        |             Disabled |
   +-----------------------------------------+------------------------+----------------------+
   |   3  NVIDIA A100-SXM4-80GB          Off |   00000000:8E:00.0 Off |                    0 |
   | N/A   28C    P0             60W /  500W |       3MiB /  81920MiB |      0%      Default |
   |                                         |                        |             Disabled |
   +-----------------------------------------+------------------------+----------------------+

   +-----------------------------------------------------------------------------------------+
   | Processes:                                                                              |
   |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
   |        ID   ID                                                               Usage      |
   |=========================================================================================|
   |  No running processes found                                                             |
   +-----------------------------------------------------------------------------------------+
xpu-smi:
N/A
2024-06-30T00:30:49.387995Z  INFO text_generation_launcher: Args {
    model_id: "/meta-llama/Meta-Llama-3-8B-Instruct",
    revision: None,
    validation_workers: 2,
    sharded: None,
    num_shard: None,
    quantize: None,
    speculate: None,
    dtype: None,
    trust_remote_code: false,
    max_concurrent_requests: 128,
    max_best_of: 2,
    max_stop_sequences: 4,
    max_top_n_tokens: 50,
    max_input_tokens: Some(
        8191,
    ),
    max_input_length: None,
    max_total_tokens: Some(
        8192,
    ),
    waiting_served_ratio: 0.3,
    max_batch_prefill_tokens: Some(
        8242,
    ),
    max_batch_total_tokens: None,
    max_waiting_tokens: 20,
    max_batch_size: None,
    cuda_graphs: None,
    hostname: "48eb07d0d604",
    port: 80,
    shard_uds_path: "/tmp/text-generation-server",
    master_addr: "localhost",
    master_port: 29500,
    huggingface_hub_cache: Some(
        "/data",
    ),
    weights_cache_override: None,
    disable_custom_kernels: false,
    cuda_memory_fraction: 1.0,
    rope_scaling: None,
    rope_factor: None,
    json_output: false,
    otlp_endpoint: None,
    otlp_service_name: "text-generation-inference.router",
    cors_allow_origin: [],
    watermark_gamma: None,
    watermark_delta: None,
    ngrok: false,
    ngrok_authtoken: None,
    ngrok_edge: None,
    tokenizer_config_path: None,
    disable_grammar_support: false,
    env: true,
    max_client_batch_size: 4,
    lora_adapters: None,
}

Model info:

{
    "model_id": "/meta-llama/Meta-Llama-3-8B-Instruct",
    "model_sha": null,
    "model_dtype": "torch.float16",
    "model_device_type": "cuda",
    "model_pipeline_tag": null,
    "max_concurrent_requests": 128,
    "max_best_of": 2,
    "max_stop_sequences": 4,
    "max_input_tokens": 8191,
    "max_total_tokens": 8192,
    "waiting_served_ratio": 0.3,
    "max_batch_total_tokens": 451520,
    "max_waiting_tokens": 20,
    "max_batch_size": null,
    "validation_workers": 2,
    "max_client_batch_size": 4,
    "router": "text-generation-router",
    "version": "2.1.0",
    "sha": "192d49af0bfa71e886c27856232031f3935628ff",
    "docker_label": "sha-192d49a"
}

TGI Version: 2.1.0

Reproduction

When I send the following request, which requires the model to call a tool:

curl --location 'http://10.146.240.74:30000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        {
            "content": "You are an assistant who can write the user'\''s last response to a file.\nDetermine the class name from the user description and use it as the name of the txt file, for example CreateIssues.txt.\nSave the file in the raw_data folder.\nRecord the content unchanged as provided by the user and nothing else.\nReturn only the path to the file, for example /raw_data/CreateIssues.txt. Work autonomously according to your specialty, using the tools available to you. Answer briefly and only in your specialty.",
            "role": "system"
        },
        {
            "role": "user",
            "content": "Analyze the content and write to file"
        },
        {
            "role": "user",
            "name": "controller_analizer",
            "content": "Controller '\''CreateIssuesController'\''\n\nМетоды:\n\nGET /api/jira/issues/createFromExcel\n\nНазначение метода: Метод массового создания задач в Jira из Excel файла.\n\nЗаголовки запроса:\nContent-Type: multipart/form-data\n\nВходные параметры:\nПараметр: file\n- Описание: xlsx файл с задачами, которые надо создать\n- Тип: MultipartFile\n- Обязательность: Да\n- Пример значение: файл.xlsx\n\nПример запроса:\nPOST /api/jira/issues/createFromExcel HTTP/1.1\nHost: example.com\nContent-Type: multipart/form-data; boundary=---------------------------1234567890\n\n-----------------------------1234567890\nContent-Disposition: form-data; name=\"file\"; filename=\"file.xlsx\"\nContent-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet\n\n... файл.xlsx...\n\n-----------------------------1234567890--\n\nВыходные параметры:\nПараметр: response\n- Описание: Список успешно созданных задач и список не созданных задач с описанием ошибок\n- Тип: JiraTaskCreateResponse\n- Обязательность: Да\n- Пример значение: {\"createdTasks\": [...], \"errors\": [...]}\n\nПример ответа:\nHTTP/1.1 201 Created\nContent-Type: application/json\n\n{\n  \"createdTasks\": [...],\n  \"errors\": [...]\n}\n\nКоды ответа:\n201 Created - успешное создание задач\n400 Bad Request - ошибка при создании задач"
        }
    ],
    "model": "/meta-llama/Meta-Llama-3-8B-Instruct",
    "max_tokens": 1024,
    "temperature": 0.01,
    "n": 50,
    "top_p": 0.9,
    "stream": false,
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "write_document",
                "description": "Create and save a text document. Return path of the saved document file.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "content": {
                            "description": "Text content to be written into the document.",
                            "type": "string"
                        },
                        "file_name": {
                            "description": "File path to save the document.",
                            "type": "string"
                        }
                    },
                    "required": [
                        "content",
                        "file_name"
                    ]
                }
            }
        }
    ],
    "tool_choice": "auto"
}'

I get the error:

{
    "error": "EOF while parsing an object at line 917 column 1",
    "error_type": "Input validation error"
}

If you send the same request with "stream": true, the result is in the attached files: output_raw.txt output.txt

In output.txt, all the argument values are collected into one line, and two things are strange: 1) the JSON Schema of my tool and, as I understand it, of the default tool is appended to the text of my tool's content parameter; 2) that JSON Schema is missing its final closing character }
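The error wording resembles what a JSON parser reports when its input ends before an object is closed, which would fit the missing } noted above. A minimal Python sketch of that failure mode, using a hypothetical helper `is_complete_json` and a made-up truncated string (this is not TGI's actual parsing code):

```python
import json


def is_complete_json(text: str) -> bool:
    """Return True if text parses as JSON, False if it is malformed or cut off."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Made-up example object; the keys are illustrative only.
complete = '{"name": "write_document", "arguments": {"file_name": "/raw_data/CreateIssues.txt"}}'
truncated = complete[:-1]  # drop the final closing brace

print(is_complete_json(complete))   # True
print(is_complete_json(truncated))  # False
```

A check like this on the concatenated streamed output can show whether the generated text was cut off before the object was closed.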

Expected behavior

Expected:

{
    "id": "",
    "object": "chat.completion",
    "created": 1719709113,
    "model": "/meta-llama/Meta-Llama-3-8B-Instruct",
    "system_fingerprint": "2.1.0-sha-192d49a",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "tool_calls": [
                    {
                        "id": "0",
                        "type": "function",
                        "function": {
                            "description": null,
                            "name": "write_document",
                            "arguments": {
                                "content": "Controller 'CreateIssuesController'\n\nМетоды:\n\nGET /api/jira/issues/createFromExcel\n\nНазначение метода: Метод массового создания задач в Jira из Excel файла.\n\nЗаголовки запроса:\nContent-Type: multipart/form-data\n\nВходные параметры:\nПараметр: file\n- Описание: xlsx файл с задачами, которые надо создать\n- Тип: MultipartFile\n- Обязательность: Да\n- Пример значение: файл.xlsx\n\nПример запроса:\nPOST /api/jira/issues/createFromExcel HTTP/1.1\nHost: example.com\nContent-Type: multipart/form-data; boundary=---------------------------1234567890\n\n-----------------------------1234567890\nContent-Disposition: form-data; name=\"file\"; filename=\"file.xlsx\"\nContent-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet\n\n... файл.xlsx...\n\n-----------------------------1234567890--\n\nВыходные параметры:\nПараметр: response\n- Описание: Список успешно созданных задач и список не созданных задач с описанием ошибок\n- Тип: JiraTaskCreateResponse\n- Обязательность: Да\n- Пример значение: {\"createdTasks\": [...], \"errors\": [...]}\n\nПример ответа:\nHTTP/1.1 201 Created\nContent-Type: application/json\n\n{\n  \"createdTasks\": [...],\n  \"errors\": [...]\n}\n\nКоды ответа:\n201 Created - успешное создание задач\n400 Bad Request - ошибка при создании задач",
                                "file_name": "/raw_data/CreateIssues.txt"
                            }
                        }
                    }
                ]
            },
            "logprobs": null,
            "finish_reason": "eos_token"
        }
    ],
    "usage": {
        "prompt_tokens": 647,
        "completion_tokens": 565,
        "total_tokens": 1212
    }
}
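In the expected response above, `arguments` is a JSON object, whereas the OpenAI API returns `arguments` as a JSON-encoded string. A client that has to handle both shapes could normalize the value first. This is only a sketch, and `normalize_arguments` is a hypothetical helper, not part of TGI or the OpenAI client:

```python
import json
from typing import Any, Dict, Union


def normalize_arguments(arguments: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return tool-call arguments as a dict, whether given as a dict or a JSON string."""
    if isinstance(arguments, str):
        return json.loads(arguments)
    return arguments


# The same arguments in both shapes (values are illustrative).
as_object = {"content": "text", "file_name": "/raw_data/CreateIssues.txt"}
as_string = json.dumps(as_object)

print(normalize_arguments(as_object)["file_name"])
print(normalize_arguments(as_string)["file_name"])
```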

Thanks!

ishelaputov commented 3 months ago

Hello! The problem is caused by unescaped characters in the text that is passed as input.

Forcing the text to be converted to a JSON string helped, for example:

output_text = json.dumps(result["output_text"], ensure_ascii=False)
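For illustration, a small sketch of what this call does to a multi-line text; `raw_text` is a made-up stand-in for result["output_text"]. The result is a quoted JSON string literal in which line breaks and double quotes are escaped, so the text sent on contains no raw newlines:

```python
import json

# Made-up multi-line text with quotes, standing in for result["output_text"].
raw_text = "Controller 'CreateIssuesController'\n\nname=\"file\"\n{\n  \"errors\": [...]\n}"

# json.dumps turns the text into a JSON string literal:
# newlines become the two characters \n and double quotes become \".
# ensure_ascii=False keeps Cyrillic and other non-ASCII characters as they are.
escaped = json.dumps(raw_text, ensure_ascii=False)
print(escaped)

# The original text can be recovered by parsing the literal again.
print(json.loads(escaped) == raw_text)
```

Note that after this conversion the model sees the escape sequences rather than real line breaks, which matches the \\n sequences in the example request.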

Example request:

curl --location 'http://10.146.240.74:30000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        {
            "content": "You are an assistant who can write the user'\''s last response to a file.\nDetermine the class name from the user description and use it as the name of the txt file, for example CreateIssues.txt.\nSave the file in the raw_data folder.\nRecord the content unchanged as provided by the user and nothing else.\nReturn only the path to the file, for example /raw_data/CreateIssues.txt. Work autonomously according to your specialty, using the tools available to you. Answer briefly and only in your specialty.",
            "role": "system"
        },
        {
            "role": "user",
            "content": "Analyze the content and write to file"
        },
        {
            "role": "user",
            "name": "controller_analizer",
            "content": "Описание контроллера '\''CreateIssuesController'\''\\nМетоды:\\nGET /api/jira/issues/createFromExcel\\nНазначение метода: Метод массового создания задач в Jira из Excel файла.\\nЗаголовки запроса:\\nContent-Type: multipart/form-data\\nВходные параметры:\\nПараметр: file\\n- Описание: xlsx файл с задачами, которые надо создать\\n- Тип: MultipartFile\\n- Обязательность: Да\\n- Пример значение: файл.xlsx\\nПример запроса:\\nPOST /api/jira/issues/createFromExcel HTTP/1.1\\nHost: example.com\\nContent-Type: multipart/form-data; boundary=---------------------------1234567890\\n-----------------------------1234567890\\nContent-Disposition: form-data; name=\"file\"; filename=\"file.xlsx\"\\nContent-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet\\n... файл.xlsx...\\n-----------------------------1234567890--\\nВыходные параметры:\\nПараметр: response\\n- Описание: Список успешно созданных задач и список не созданных задач с описанием ошибок\\n- Тип: JiraTaskCreateResponse\\n- Обязательность: Да\\n- Пример значение: {\\n\\t\"createdTasks\": [...],\\n\\t\"errors\": [...]}\\nПример ответа:\\nHTTP/1.1 201 Created\\nContent-Type: application/json\\n{\\n\\t\"createdTasks\": [...],\\n\\t\"errors\": [...]}\\nКоды ответа:\\n201 Created - успешное создание задач\\n400 Bad Request - ошибка при создании задач"
        }
    ],
    "model": "/meta-llama/Meta-Llama-3-8B-Instruct",
    "max_tokens": 1024,
    "temperature": 0.01,
    "n": 50,
    "top_p": 0.9,
    "stream": false,
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "write_document",
                "description": "Create and save a text document. Return path of the saved document file.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "content": {
                            "description": "Text content to be written into the document.",
                            "type": "string"
                        },
                        "file_name": {
                            "description": "File path to save the document.",
                            "type": "string"
                        }
                    },
                    "required": [
                        "content",
                        "file_name"
                    ]
                }
            }
        }
    ],
    "tool_choice": "auto"
}'

Response:

{
    "id": "",
    "object": "chat.completion",
    "created": 1719749467,
    "model": "/meta-llama/Meta-Llama-3-8B-Instruct",
    "system_fingerprint": "2.1.0-sha-192d49a",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "tool_calls": [
                    {
                        "id": "0",
                        "type": "function",
                        "function": {
                            "description": null,
                            "name": "write_document",
                            "arguments": {
                                "content": "Описание контроллера 'CreateIssuesController'\nМетоды:\nGET /api/jira/issues/createFromExcel\nНазначение метода: Метод массового создания задач в Jira из Excel файла.\nЗаголовки запроса:\nContent-Type: multipart/form-data\nВходные параметры:\nПараметр: file\n- Описание: xlsx файл с задачами, которые надо создать\n- Тип: MultipartFile\n- Обязательность: Да\n- Пример значение: файл.xlsx\nПример запроса:\nPOST /api/jira/issues/createFromExcel HTTP/1.1\nHost: example.com\nContent-Type: multipart/form-data; boundary=---------------------------1234567890\n-----------------------------1234567890\nContent-Disposition: form-data; name=\"file\"; filename=\"file.xlsx\"\nContent-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet\n... файл.xlsx...\n-----------------------------1234567890--\nВыходные параметры:\nПараметр: response\n- Описание: Список успешно созданных задач и список не созданных задач с описанием ошибок\n- Тип: JiraTaskCreateResponse\n- Обязательность: Да\n- Пример значение: {\n\t\"createdTasks\": [...],\n\t\"errors\": [...]}\nПример ответа:\nHTTP/1.1 201 Created\nContent-Type: application/json\n{\n\t\"createdTasks\": [...],\n\t\"errors\": [...]}\nКоды ответа:\n201 Created - успешное создание задач\n400 Bad Request - ошибка при создании задач",
                                "file_name": "/raw_data/CreateIssues.txt"
                            }
                        }
                    }
                ]
            },
            "logprobs": null,
            "finish_reason": "eos_token"
        }
    ],
    "usage": {
        "prompt_tokens": 695,
        "completion_tokens": 380,
        "total_tokens": 1075
    }
}
RonanKMcGovern commented 3 months ago

@ishelaputov are you just running that script on Llama 3 8B Instruct?

I tried to replicate your script. It works when I use gpt-4 via OpenAI, but when I use the latest version of TGI as an endpoint, it fails as soon as I try to use tools (it works fine without tools if I comment out tool_choice="auto").

ishelaputov commented 3 months ago

@RonanKMcGovern, yes, on Llama 3 8B Instruct and on Llama 3 70B Instruct. The behavior is the same as you described: if I remove tool_choice="auto", the request goes through. In LangChain I call it through ChatOpenAI and get the same error.

RonanKMcGovern commented 3 months ago

Ah sorry, I'm confused. Let me see if I can re-iterate what you are saying.

  1. When you leave in tool_choice="auto" then there is this error:

    python llama3_raw.py
    Traceback (most recent call last):
    File "/Users/ronanmcgovern/TR/function-calling-v4/tests/llama3_raw.py", line 196, in <module>
    run_conversation()
    File "/Users/ronanmcgovern/TR/function-calling-v4/tests/llama3_raw.py", line 148, in run_conversation
    response = client.chat.completions.create(
    File "/Users/ronanmcgovern/TR/function-calling-v4/trelisEnv/lib/python3.10/site-packages/openai/_utils/_utils.py", line 277, in wrapper
    return func(*args, **kwargs)
    File "/Users/ronanmcgovern/TR/function-calling-v4/trelisEnv/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 643, in create
    return self._post(
    File "/Users/ronanmcgovern/TR/function-calling-v4/trelisEnv/lib/python3.10/site-packages/openai/_base_client.py", line 1261, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
    File "/Users/ronanmcgovern/TR/function-calling-v4/trelisEnv/lib/python3.10/site-packages/openai/_base_client.py", line 942, in request
    return self._request(
    File "/Users/ronanmcgovern/TR/function-calling-v4/trelisEnv/lib/python3.10/site-packages/openai/_base_client.py", line 1026, in _request
    return self._retry_request(...

    Basically, some error in _base_client (why? I'm not sure)

  2. If I remove tool_choice="auto", then I simply get no tool being called:

    llama3_raw.py
    Response:  ChatCompletion(id='', choices=[Choice(finish_reason='length', index=0, logprobs=None, message=ChatCompletionMessage(content="San Francisco's weather is notoriously changeable, but I'll give you a general idea!\n\nSan Francisco is known for its mild and cool oceanic climate (Köppen climate type: Cfb). Here's what you can expect:\n\n1. **Cool summers:** Daytime temperatures usually range from 60°F (16°C) to 75°F (24°C), while nighttime temperatures can dip to around 50°F (10°C) to 60°F (16°C).\n2. **", role='assistant', function_call=None, tool_calls=None))], created=1720542803, model='Trelis/Meta-Llama-3-8B-Instruct-function-calling', object='chat.completion', service_tier=None, system_fingerprint='2.1.2-dev0-sha-4c976fb', usage=CompletionUsage(completion_tokens=100, prompt_tokens=19, total_tokens=119))
    ChatCompletionMessage(content="San Francisco's weather is notoriously changeable, but I'll give you a general idea!\n\nSan Francisco is known for its mild and cool oceanic climate (Köppen climate type: Cfb). Here's what you can expect:\n\n1. **Cool summers:** Daytime temperatures usually range from 60°F (16°C) to 75°F (24°C), while nighttime temperatures can dip to around 50°F (10°C) to 60°F (16°C).\n2. **", role='assistant', function_call=None, tool_calls=None)

    with the script:

    
    import os
    from dotenv import load_dotenv
    from openai import OpenAI

    # Load environment variables from the .env file in the parent directory
    load_dotenv('../.env')

    # Get the API URL and model name from environment variables
    openai_api_base = os.getenv('API_URL') + 'v1/'
    model_name = os.getenv('MODEL_NAME')
    api_key = os.getenv('API_KEY', default="EMPTY")

    # Initialize the OpenAI client
    client = OpenAI(
        base_url=openai_api_base,
        api_key=api_key,
    )

    def run_conversation():
        # Step 1: send the conversation and available functions to the model
        messages = [{"role": "user", "content": "What's the weather like in San Francisco?"}]
        tools = [
            {
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    "description": "Get the current weather in a given location",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                        },
                        "required": ["location"],
                    },
                },
            }
        ]
        response = client.chat.completions.create(
            model=model_name,
            messages=messages,
            tools=tools,
            # tool_choice="auto",  # auto is default, but we'll be explicit
        )
        print("Response: ", response)
        response_message = response.choices[0].message
        print(response_message)

    run_conversation()



Are we both saying the same thing? Are we in agreement?

On a related topic, have you any idea where I can find the code in this repo that describes a) how tools are injected into the prompt when the llm generates? b) how the response is parsed in order for the server to return the api response? I posted an issue on that [here](https://github.com/huggingface/text-generation-inference/issues/2210).

Many thanks.
ishelaputov commented 3 months ago

@RonanKMcGovern , hello!

To rule out defects in the wrappers (OpenAI or LangChain), I reproduced the error directly from Postman; my request is shown above. There, by the way, you can see how the tools are passed in the request: they sit in the structure after the list of messages.

Yes, you and I are talking about the same thing. The error occurs if you specify tool_choice and the messages contain some characters (or something else) that cause the EOF error. If tool_choice is not specified at all, no tool is chosen and the request loses its purpose, but it runs without the EOF error.

Many thanks.

ishelaputov commented 3 months ago

@RonanKMcGovern, it seems I misunderstood your question. Unfortunately, I don't know how the tools are inserted into the prompt that is sent to the model. Your linked issue is helpful, thank you very much!

ishelaputov commented 2 months ago

Hello! The problem still reproduces on TGI 2.2.0.