Open Archmilio opened 1 month ago
Hi @Archmilio 👋
I edited your original issue a bit to be able to get the code formatting. Hopefully that was okay.
Unfortunately I'm not able to reproduce your issue. When I deploy the model and call it with this code:
```python
import json

from openai import OpenAI

client = OpenAI(
    base_url="MY_ENDPOINT",
    api_key=""
)

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                },
                "format": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The temperature unit to use. Infer this from the user's location."
                }
            },
            "required": ["location", "format"]
        }
    }
}

messages = [
    {
        "role": "system",
        "content": f"[AVAILABLE_TOOLS] {json.dumps(weather_tool)} [/AVAILABLE_TOOLS]"
        "You're a helpful assistant! Use tools if necessary, and reply in a JSON format",
    },
    {
        "role": "user",
        "content": "Is it hot in Pittsburgh, PA right now? long answer please"
    }
]

chat_response = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    messages=messages,
    tools=[weather_tool],
    tool_choice="auto",
    stream=False
)

assistant_message = chat_response.choices[0].message
messages.append(assistant_message)
print(assistant_message)
```
I get the result:
```
ChatCompletionMessage(content=None, refusal=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='0', function=Function(arguments={'format': 'fahrenheit', 'location': 'Pittsburgh, PA'}, name='get_current_weather', description=None), type='function')])
```
Or did I misunderstand your question?
Yes, it works normally up to the part you reproduced. Function calling generally makes two requests to the LLM. As in the code above, the first request works fine, but a JSON deserialization error occurs when the result of the first request and the function execution are appended to the message list and sent back to the LLM.

**messages.append(assistant_message)**

```
ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_abc123', function=Function(arguments='{"location": "Pittsburgh, PA", "format": "fahrenheit"}', name='get_current_weather'), type='function')])
```

```python
tool_call_result = 88
tool_call_id = assistant_message.tool_calls[0].id
tool_function_name = assistant_message.tool_calls[0].function.name

messages.append({
    "role": "tool",
    "content": str(tool_call_result),
    "tool_call_id": tool_call_id,
    "name": tool_function_name
})

chat_response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    tools=[weather_tool],
    tool_choice="auto",
    stream=False
)

assistant_message = chat_response.choices[0].message
print(chat_response)
```
Expected output:

```
ChatCompletionMessage(content='Based on the current temperature of 88°F (31°C) in Pittsburgh, PA, it is indeed quite hot right now. This temperature is generally considered warm to hot, especially if accompanied by high humidity, which is common in Pittsburgh during summer months.', role='assistant', function_call=None, tool_calls=None)
```

Actual output:

```
UnprocessableEntityError: Failed to deserialize the JSON body into the target type: messages[2].content: data did not match any variant of untagged enum MessageContent at line 1 column 675
```
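A workaround worth trying (my guess at the cause, not a confirmed fix): appending the pydantic `ChatCompletionMessage` object directly puts `content: null` into the request body, and TGI's untagged `MessageContent` enum apparently has no variant for `null`. Flattening the assistant message into a plain dict with a string `content` before appending may avoid the 422. A minimal sketch, with a stand-in object shaped like the printed message so it runs without a server:

```python
from types import SimpleNamespace  # stand-in for the pydantic message objects


def message_to_dict(msg):
    """Flatten an assistant message into a plain dict, avoiding null content."""
    out = {"role": msg.role, "content": msg.content or ""}
    if getattr(msg, "tool_calls", None):
        out["tool_calls"] = [
            {
                "id": tc.id,
                "type": tc.type,
                "function": {
                    "name": tc.function.name,
                    "arguments": tc.function.arguments,
                },
            }
            for tc in msg.tool_calls
        ]
    return out


# In the repro above this would be:
#   messages.append(message_to_dict(assistant_message))
# Demo with a stand-in shaped like the ChatCompletionMessage printed earlier:
assistant_message = SimpleNamespace(
    role="assistant",
    content=None,
    tool_calls=[SimpleNamespace(
        id="0",
        type="function",
        function=SimpleNamespace(
            name="get_current_weather",
            arguments='{"location": "Pittsburgh, PA", "format": "fahrenheit"}',
        ),
    )],
)
converted = message_to_dict(assistant_message)
```

Whether TGI also requires a non-empty `content` string here is something I haven't verified.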
Hello! I am facing the same issue here. Was anyone able to find a workaround for this by any chance?
I guess this is related to #2480.
As Meta describes, for passing the ToolCall result back we need to use their new `ipython` role:
https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_1
Maybe that is causing the issue.
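For reference, a minimal sketch of what the linked Llama 3.1 prompt-format page describes: at the raw template level, the tool output is fed back to the model in an `ipython` turn. The special-token names below come from that page; the chat template TGI actually applies may differ, so this is illustrative only:

```python
# Raw Llama 3.1 turn for returning a tool result, per Meta's prompt-format
# docs linked above. This is the template-level view, not the OpenAI-style
# messages API that TGI exposes.
tool_output = '{"temperature": 88, "unit": "fahrenheit"}'
ipython_turn = (
    "<|start_header_id|>ipython<|end_header_id|>\n\n"
    + tool_output
    + "<|eot_id|>"
)
```

If TGI's chat template maps `role: "tool"` to something other than this `ipython` turn, that mismatch could explain the deserialization/behavior problems reported here.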
I am encountering the same issue when using the ReAct agent. The agent calls the language model (LLM) twice, passing the tool descriptions each time. In a basic setup, the expected behavior is that the agent receives a function call in the first response and the final answer in the second response, after the tool's answer has been appended to the message list.
However, when TGI detects a function-call descriptor in the request, it enforces grammar-constrained decoding and expects to produce a function call. This causes a problem when a function call is not actually needed or intended: when I try to get TGI to skip the function call and return a normal response (because no tool call is needed), it returns a `notify_error` tool call instead.
Below is an example of the call and the problematic response behavior:
Request
```json
{
  "model": "llama3",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant.",
      "name": null
    },
    {
      "role": "user",
      "content": "For the following plan:\n1. Find the winner of the 2016 Australia Open\n2. Find the hometown of the winner\n\nYou are tasked with executing step 1, Find the winner of the 2016 Australia Open",
      "name": null
    },
    {
      "role": "assistant",
      "content": "",
      "name": null,
      "tool_calls": [
        {
          "id": 0,
          "type": "function",
          "function": {
            "name": "web_search",
            "arguments": "{\"query\": \"2016 Australia Open winner\"}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "The 2016 Australian Open was a tennis tournament that took place at Melbourne Park between 18 and 31 January 2016.[1] It was the 104th edition of the Australian Open, and the first Grand Slam tournament of the year. The tournament consisted of events for professional players in singles, doubles and mixed doubles play. Junior and wheelchair players competed in singles and doubles tournaments. Novak Djokovic successfully defended the men's singles title and thus won a record-equaling sixth Australian Open title. Serena Williams was the defending champion in the women's singles but failed to defend her title, losing to Angelique Kerber in the final; by winning, Kerber became the first German player of any gender to win a Grand Slam title since Steffi Graf won her last such title at the 1999 French Open.[2]",
      "tool_call_id": "0"
    }
  ],
  "temperature": 0.1,
  "stop": [
    "<|start_header_id|>",
    "<|end_header_id|>",
    "<|reserved_special_token|>",
    "<|eot_id|>"
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "web_search",
        "description": "A search engine optimized for comprehensive, accurate, and trusted results.\nUseful for when you need to answer questions about current events.\nInput should be a search query.",
        "parameters": {
          "type": "object",
          "properties": {
            "query": {
              "description": "search query to look up",
              "type": "string"
            }
          },
          "required": [
            "query"
          ]
        }
      }
    }
  ],
  "tool_choice": "auto",
  "tool_prompt": "Please respond directly to the question unless using a function call provides significant clarity or concision. In cases where a function call is necessary, provide a JSON object specifying the function name and its required arguments, formatted as {name: 'function_name', parameters: {'argument1': 'value1',...}}. Avoid unnecessary function calls and variable assignments"
}
```
TGI Response:
```json
{
  "object": "chat.completion",
  "id": "",
  "created": 1727596257,
  "model": "/models/models--meta-llama--Meta-Llama-3.1-70B-instruct/",
  "system_fingerprint": "2.3.1-dev0-sha-169178b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "tool_calls": [
          {
            "id": "0",
            "type": "function",
            "function": {
              "description": null,
              "name": "notify_error",
              "arguments": {
                "error": "The winner of the 2016 Australia Open is Novak Djokovic for the men"
              }
            }
          }
        ]
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 443,
    "completion_tokens": 39,
    "total_tokens": 482
  }
}
```
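One mitigation I would try (an assumption on my side, not a documented TGI behavior): since the grammar enforcement seems to kick in whenever `tools` is present in the request, drop `tools` and `tool_choice` from the second request, where only a plain-text final answer is wanted. A minimal sketch of rebuilding the follow-up payload:

```python
# Sketch: send the follow-up request without the tools block so the server
# is not forced into grammar-constrained tool output. The message history
# (including the role="tool" turn) is kept; only the tool fields are removed.
request = {
    "model": "llama3",
    "messages": [{"role": "user", "content": "..."}],  # full history in practice
    "temperature": 0.1,
    "tools": [{"type": "function", "function": {"name": "web_search"}}],
    "tool_choice": "auto",
}
followup = {k: v for k, v in request.items() if k not in ("tools", "tool_choice")}
```

The trade-off is that the model can no longer issue a second tool call, so this only fits agent loops that know the tool phase is over.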
System Info

I am testing TGI's tool-call feature, but the error continues to occur. Can you check it?

Information

Tasks

Reproduction

Expected behavior

```
UnprocessableEntityError: Failed to deserialize the JSON body into the target type: messages[2].content: data did not match any variant of untagged enum MessageContent at line 1 column 675
```