huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0
8.83k stars · 1.04k forks

Error With Tool Calling #2461

Open Archmilio opened 1 month ago

Archmilio commented 1 month ago

System Info

I am testing tool calling with TGI, but the error below keeps occurring. Can you check it?

Information

Tasks

Reproduction

from openai import OpenAI
import json

from openai.types.chat import ChatCompletion, ChatCompletionMessageToolCall
from openai.types.chat.chat_completion import ChatCompletionMessage, Choice
from openai.types.completion_usage import CompletionUsage

client = OpenAI(base_url="http://0.0.0.0/v1", api_key="not-used")
MODEL_NAME = "Meta-Llama-3.1-8B-Instruct" 

# Define available function
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                },
                "format": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The temperature unit to use. Infer this from the user's location."
                }
            },
            "required": ["location", "format"]
        }
    }
}

messages = [
    {       
        "role": "system",
        "content": f"[AVAILABLE_TOOLS] {json.dumps(weather_tool)} [/AVAILABLE_TOOLS]"
                    "You're a helpful assistant! Use tools if necessary, and reply in a JSON format",
    },
    {
        "role": "user", 
        "content": "Is it hot in Pittsburgh, PA right now? long answer please"
    }
]

chat_response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    tools=[weather_tool],
    tool_choice="auto",
    stream=False
)

assistant_message = chat_response.choices[0].message
messages.append(assistant_message)
# Expected `str` but got `dict` - serialized value may not be as expected
# Example output:
# ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_abc123', function=Function(arguments='{"location": "Pittsburgh, PA", "format": "fahrenheit"}', name='get_current_weather'), type='function')])
tool_call_result = 88
tool_call_id = assistant_message.tool_calls[0].id
tool_function_name = assistant_message.tool_calls[0].function.name
messages.append({"role": "tool", "content": str(tool_call_result), "tool_call_id": tool_call_id, "name": tool_function_name})

chat_response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    tools=[weather_tool],
    tool_choice="auto",
    stream=False
)

assistant_message = chat_response.choices[0].message

print(chat_response)
# Example output:
# ChatCompletionMessage(content='Based on the current temperature of 88°F (31°C) in Pittsburgh, PA, it is indeed quite hot right now. This temperature is generally considered warm to hot, especially if accompanied by high humidity, which is common in Pittsburgh during summer months.', role='assistant', function_call=None, tool_calls=None)

Expected behavior

UnprocessableEntityError: Failed to deserialize the JSON body into the target type: messages[2].content: data did not match any variant of untagged enum MessageContent at line 1 column 675
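One possible workaround (a sketch, not verified against TGI; `normalize_assistant_message` is a hypothetical helper, not part of TGI or the openai client) is to append a plain dict instead of the pydantic `ChatCompletionMessage` object, re-serializing `arguments` to a JSON string if the server handed it back as a dict:

```python
import json
from types import SimpleNamespace  # stand-in for the real pydantic types

def normalize_assistant_message(msg) -> dict:
    """Hypothetical helper: flatten a ChatCompletionMessage-like object
    into the plain dict shape the chat/completions endpoint accepts."""
    out = {"role": "assistant", "content": msg.content or ""}
    if getattr(msg, "tool_calls", None):
        out["tool_calls"] = [
            {
                "id": str(tc.id),
                "type": tc.type,
                "function": {
                    "name": tc.function.name,
                    # TGI may return arguments as a dict; the request side
                    # expects a JSON string, so re-serialize if needed.
                    "arguments": tc.function.arguments
                    if isinstance(tc.function.arguments, str)
                    else json.dumps(tc.function.arguments),
                },
            }
            for tc in msg.tool_calls
        ]
    return out

# Minimal stand-in mimicking the assistant message shown above
msg = SimpleNamespace(
    content=None,
    tool_calls=[SimpleNamespace(
        id="0", type="function",
        function=SimpleNamespace(
            name="get_current_weather",
            arguments={"location": "Pittsburgh, PA", "format": "fahrenheit"},
        ),
    )],
)
print(normalize_assistant_message(msg))
```

With such a helper, the failing line would become `messages.append(normalize_assistant_message(assistant_message))` instead of `messages.append(assistant_message)`.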

ErikKaum commented 3 weeks ago

Hi @Archmilio 👋

I edited your original issue a bit to be able to get the code formatting. Hopefully that was okay.

Unfortunately I'm not able to reproduce your issue. When I deploy the model and call it with this code:

import json
from openai import OpenAI

from openai.types.chat import ChatCompletion, ChatCompletionMessageToolCall
from openai.types.chat.chat_completion import ChatCompletionMessage, Choice
from openai.types.completion_usage import CompletionUsage

client = OpenAI(
    base_url="MY_ENDPOINT", 
    api_key="" 
)

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                },
                "format": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The temperature unit to use. Infer this from the user's location."
                }
            },
            "required": ["location", "format"]
        }
    }
}

messages = [
    {       
        "role": "system",
        "content": f"[AVAILABLE_TOOLS] {json.dumps(weather_tool)} [/AVAILABLE_TOOLS]"
                    "You're a helpful assistant! Use tools if necessary, and reply in a JSON format",
    },
    {
        "role": "user", 
        "content": "Is it hot in Pittsburgh, PA right now? long answer please"
    }
]

chat_response = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    messages=messages,
    tools=[weather_tool],
    tool_choice="auto",
    stream=False
)

assistant_message = chat_response.choices[0].message
messages.append(assistant_message)

print(assistant_message)

I get the result:

ChatCompletionMessage(content=None, refusal=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='0', function=Function(arguments={'format': 'fahrenheit', 'location': 'Pittsburgh, PA'}, name='get_current_weather', description=None), type='function')])

Or did I misunderstand your question?

Archmilio commented 2 weeks ago

Yes, it works normally up to the part you reproduced. Function calling generally makes two requests to the LLM. As in the code above, request 1 works fine, but a JSON deserialization error occurs when the result of request 1 and the function execution are appended and the request is sent to the LLM again.

**messages.append(assistant_message)**

# ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_abc123', function=Function(arguments='{"location": "Pittsburgh, PA", "format": "fahrenheit"}', name='get_current_weather'), type='function')])
tool_call_result = 88
tool_call_id = assistant_message.tool_calls[0].id
tool_function_name = assistant_message.tool_calls[0].function.name
messages.append({"role": "tool", "content": str(tool_call_result), "tool_call_id": tool_call_id, "name": tool_function_name})

chat_response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    tools=[weather_tool],
    tool_choice="auto",
    stream=False
)

assistant_message = chat_response.choices[0].message

print(chat_response)

Expected output: ChatCompletionMessage(content='Based on the current temperature of 88°F (31°C) in Pittsburgh, PA, it is indeed quite hot right now. This temperature is generally considered warm to hot, especially if accompanied by high humidity, which is common in Pittsburgh during summer months.', role='assistant', function_call=None, tool_calls=None)

Actual output: UnprocessableEntityError: Failed to deserialize the JSON body into the target type: messages[2].content: data did not match any variant of untagged enum MessageContent at line 1 column 675

venkats-nexusflow commented 1 week ago

Hello! I am facing the same issue here. Was anyone able to find a workaround for this by any chance?

kteppris commented 5 days ago

I guess this is related to #2480.

As Meta describes, to pass the tool-call message back we need to use their new role `ipython`:

https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_1

Maybe that is causing the issue.
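For reference, here is a sketch of the raw prompt turn that returns a tool result in that format, with the token names taken from Meta's doc linked above (normally the server's chat template would render this from the `{"role": "tool", ...}` message, so this is only illustrative):

```python
# Sketch of the raw Llama 3.1 prompt turn carrying a tool result back,
# per Meta's model card: the result goes in an "ipython" header turn.
tool_result_turn = (
    "<|start_header_id|>ipython<|end_header_id|>\n\n"
    '{"output": "88"}'
    "<|eot_id|>"
)
print(tool_result_turn)
```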

jalalirs commented 12 hours ago

I am encountering the same issue when using a ReAct agent. The agent calls the language model (LLM) twice, passing the tool descriptions each time. In a basic setup, the expected behavior is that the agent receives a function call in the first response and the final answer in the second response, after the tool's answer is appended to the message list.

However, when TGI detects a function-call descriptor in the request, it enforces its tool-call grammar and expects to produce a function call. This causes a problem when a function call is not actually needed or intended, leading to one of the following issues:

  1. Agent Failure: The agent cannot process the response correctly and fails.
  2. Infinite Loop: TGI continuously returns function call responses, resulting in the agent getting stuck in an endless loop.

When I try to force TGI to bypass the function call and return a normal response (when no tool call is needed), it returns a `notify_error` tool call instead.

Below is an example of the call and the problematic response behavior:

Request

{
    "model": "llama3",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant.",
            "name": null
        },
       {
            "role": "user",
            "content": "For the following plan:\n1. Find the winner of the 2016 Australia Open\n2. Find the hometown of the winner\n\nYou are tasked with executing step 1, Find the winner of the 2016 Australia Open",
            "name": null
        },
        {
            "role": "assistant",
            "content": "",
            "name": null,
            "tool_calls": [
                {
                    "id": 0,
                    "type": "function",
                    "function": {
                        "name": "web_search",
                        "arguments": "{\"query\": \"2016 Australia Open winner\"}"

                    }
                }
            ]
        },
        {
           "role": "tool",
           "content": "The 2016 Australian Open was a tennis tournament that took place at Melbourne Park between 18 and 31 January 2016.[1] It was the 104th edition of the Australian Open, and the first Grand Slam tournament of the year. The tournament consisted of events for professional players in singles, doubles and mixed doubles play. Junior and wheelchair players competed in singles and doubles tournaments. Novak Djokovic successfully defended the men'\''s singles title and thus won a record-equaling sixth Australian Open title. Serena Williams was the defending champion in the women'\''s singles but failed to defend her title, losing to Angelique Kerber in the final; by winning, Kerber became the first German player of any gender to win a Grand Slam title since Steffi Graf won her last such title at the 1999 French Open.[2]",
           "tool_call_id": "0"
        }
    ],
    "temperature": 0.1,
    "stop": [
        "<|start_header_id|>",
        "<|end_header_id|>",
        "<|reserved_special_token|>",
        "<|eot_id|>"

    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "web_search",
                "description": "A search engine optimized for comprehensive, accurate, and trusted results.\nUseful for when you need to answer questions about current events.\nInput should be a search query.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {
                            "description": "search query to look up",
                            "type": "string"
                        }
                    },
                    "required": [
                        "query"
                    ]
                }
            }
        }
    ],
    "tool_choice": "auto",
    "tool_prompt": "Please respond directly to the question unless using a function call provides significant clarity or concision. In cases where a function call is necessary, provide a JSON object specifying the function name and its required arguments, formatted as {name: '\''function_name'\'', parameters: {'\''argument1'\'': '\''value1'\'',...}}. Avoid unnecessary function calls and variable assignments"
}

TGI Response:

{
    "object": "chat.completion",
    "id": "",
    "created": 1727596257,
    "model": "/models/models--meta-llama--Meta-Llama-3.1-70B-instruct/",
    "system_fingerprint": "2.3.1-dev0-sha-169178b",
    "choices": [
      {
        "index": 0,
        "message": {
          "role": "assistant",
          "tool_calls": [
            {
              "id": "0",
              "type": "function",
              "function": {
                "description": null,
                "name": "notify_error",
                "arguments": {
                  "error": "The winner of the 2016 Australia Open is Novak Djokovic for the men"
                }
              }
            }
          ]
        },
        "logprobs": null,
        "finish_reason": "stop"
      }
    ],
    "usage": {
      "prompt_tokens": 443,
      "completion_tokens": 39,
      "total_tokens": 482
    }
  }
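One workaround I would try, under the assumption that the grammar is only enforced when `tools` is present in the request: drop `tools`/`tool_choice` from the follow-up call once the tool result is in the message list and a plain-text answer is wanted. A hypothetical request builder (`build_chat_request` is my own name, not a TGI or openai API) sketching that:

```python
def build_chat_request(model, messages, tools=None, expect_tool_call=True):
    """Hypothetical helper: only attach the tool schema when a tool call
    is actually wanted, so the server won't force tool-call grammar."""
    req = {"model": model, "messages": messages, "stream": False}
    if expect_tool_call and tools:
        req["tools"] = tools
        req["tool_choice"] = "auto"
    return req

# First call: a tool call is expected; second call: plain text is wanted.
first = build_chat_request("llama3", [], tools=[{"type": "function"}])
second = build_chat_request("llama3", [], tools=[{"type": "function"}],
                            expect_tool_call=False)
print("tools" in first, "tools" in second)
```

This trades away the model's ability to issue a second tool call, so it only fits agents that know in advance which step should produce the final answer.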