System Info
tgi version: 2.3.0
model: Meta-Llama-3-8B-Instruct
Information
[X] Docker
[ ] The CLI directly
Tasks
[X] An officially supported command
[ ] My own modifications
Reproduction
0. Tool definition to use for reproduction
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a specified city with specified measure",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, always seoul"
                },
                "format": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The temperature unit to use."
                }
            },
            "required": ["location", "format"]
        }
    }
}
1. Using OpenAI with tool_choice="auto"
from openai import OpenAI

api_key = "[OPENAI_API_KEY]"
client = OpenAI(api_key=api_key)
messages = [
    {
        "role": "system",
        "content": "You're a helpful assistant! Use tools if necessary",
    },
    {
        "role": "user",
        "content": "just respond with a warm greeting",
    },
]
chat_response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=[weather_tool],
    tool_choice="auto",
    stream=False,
)
# Print only the assistant message, which is what the output below shows
print(chat_response.choices[0].message)
ChatCompletionMessage(content='Hello there! 🌞 How are you today?', refusal=None, role='assistant', function_call=None, tool_calls=None)
=> responds with a normal chat message, since the prompt does not need a tool call
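The two outcomes discussed in this report (a plain chat reply versus a tool call) can be told apart on the client by checking whether `tool_calls` is set on the returned message. Below is a minimal sketch of that check. The helper name `summarize_message` is hypothetical, and the two `SimpleNamespace` objects are hand-built stand-ins that mimic the message shape shown above; they are not captured server output. The sketch also assumes that `function.arguments` may arrive either as a JSON string or as an already-decoded object, and accepts both.

```python
import json
from types import SimpleNamespace


def summarize_message(message):
    """Classify an assistant message as a tool call or a plain chat reply.

    Hypothetical helper for illustration; not part of the openai package or tgi.
    """
    tool_calls = getattr(message, "tool_calls", None)
    if tool_calls:
        call = tool_calls[0]
        args = call.function.arguments
        # Assumption: arguments may be a JSON string or an already-decoded dict.
        if isinstance(args, str):
            args = json.loads(args)
        return {"kind": "tool_call", "name": call.function.name, "arguments": args}
    return {"kind": "text", "content": message.content}


# Hand-built stand-ins for the two outcomes (no network access needed).
plain_reply = SimpleNamespace(content="Hello there!", tool_calls=None)
forced_call = SimpleNamespace(
    content=None,
    tool_calls=[
        SimpleNamespace(
            function=SimpleNamespace(
                name="get_current_weather",
                arguments='{"location": "seoul", "format": "celsius"}',
            )
        )
    ],
)

print(summarize_message(plain_reply))
print(summarize_message(forced_call))
```

With a real client, the same helper could be applied to `chat_response.choices[0].message` to see which branch a given server takes for the greeting prompt.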
2. Using tgi with tool_choice="auto" (model = llama)
client = OpenAI(
    base_url="http://127.0.0.1:8080/v1/",
    api_key="dummy_key",
)
messages = [
    {
        "role": "system",
        "content": "You're a helpful assistant! Use tools if necessary",
    },
    {
        "role": "user",
        "content": "just respond with a warm greeting",
    },
]
chat_response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=[weather_tool],
    tool_choice="auto",
    stream=False,
)
print(chat_response.choices[0].message)
=> tries to call a function anyway
3. Using OpenAI with tool_choice="required"
=> tries to call a function anyway
Expected behavior
When consuming tgi with tool definitions provided, I expect the server to be able to respond either with or without a tool call. As of now, the application needs to know in advance whether tool calling is required before calling tgi, which in my opinion is not something LLM applications should aim for.
I am curious whether the above behavior is intended. Someone has already raised this issue (https://github.com/huggingface/text-generation-inference/pull/1587#issuecomment-1979185339), but it wasn't addressed anywhere.
Maybe something can be done with the tool prompt, the ToolType enum, and the chat_completions logic in server.rs?
If this behavior is not intended and needs fixing, I would love to give it a shot!
Thank you :)