huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

tgi server :: tool_choice="auto" behaves like tool_choice="required" from OpenAI spec #2549

Open mottoslo opened 4 weeks ago

mottoslo commented 4 weeks ago

System Info

tgi version: 2.3.0
model: Meta-Llama-3-8B-Instruct

Reproduction

0. tool definition to use for reproduction

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a specified city with specified measure",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, always seoul"
                },
                "format": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The temperature unit to use."
                }
            },
            "required": ["location", "format"]
        }
    }
}

1. Using OpenAI with tool_choice="auto"

from openai import OpenAI

api_key = "[OPENAI_API_KEY]"

client = OpenAI(
    api_key=api_key
)

messages = [
    {
        "role": "system",
        "content": "You're a helpful assistant! Use tools if necessary",
    },
    {
        "role": "user",
        "content": "just respond with a warm greeting"
    }
]

chat_response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=[weather_tool],
    tool_choice="auto",
    stream=False
)
print(chat_response.choices[0].message)
ChatCompletionMessage(content='Hello there! 🌞 How are you today?', refusal=None, role='assistant', function_call=None, tool_calls=None)

=> responds with a normal chat message, since the prompt does not need a tool call

2. Using tgi with tool_choice="auto" (model = llama)

client = OpenAI(
    base_url="http://127.0.0.1:8080/v1/",
    api_key="dummy_key"
)

messages = [
    {
        "role": "system",
        "content": "You're a helpful assistant! Use tools if necessary",
    },
    {
        "role": "user",
        "content": "just respond with a warm greeting"
    }
]

chat_response = client.chat.completions.create(
    model="gpt-4o",  # name carried over from the OpenAI example; the tgi server here runs Meta-Llama-3-8B-Instruct
    messages=messages,
    tools=[weather_tool],
    tool_choice="auto",
    stream=False
)
print(chat_response.choices[0].message)
ChatCompletionMessage(content=None, refusal=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='0', function=Function(arguments={'format': 'celsius', 'location': 'Seoul'}, name='get_current_weather', description=None), type='function')])

=> tries to call a function anyway

3. Using OpenAI with tool_choice="required"

api_key="[OPENAI_API_KEY]"
client = OpenAI(
    api_key=api_key
)

messages = [
    {
        "role": "system",
        "content": "You're a helpful assistant! Use tools if necessary",
    },
    {
        "role": "user",
        "content": "just respond with a warm greeting"
    }
]

chat_response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=[weather_tool],
    tool_choice="required",
    stream=False
)
print(chat_response.choices[0].message)
ChatCompletionMessage(content=None, refusal=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_0ZyaXEb9hIIQbJybYNlPjRVe', function=Function(arguments='{"location": "seoul", "format": "celsius"}', name='get_current_weather'), type='function')])

=> calls a function even though the prompt does not need one, which is the same result tgi gives for tool_choice="auto"

Expected behavior

When consuming tgi, I expect the server to be able to respond either with or without a tool call when it is given tool definitions. As of now, the application needs to know whether a tool call is required before it calls tgi, which in my opinion is not something LLM applications should aim for.
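To make the expectation concrete, here is a minimal sketch of client-side code that accepts either kind of reply. The helper name `summarize_message` and the stand-in objects are assumptions made for illustration; they only mirror the shapes printed in the reproduction above (including `arguments` arriving as a dict from tgi and as a JSON string from OpenAI) and are not real API responses.

```python
import json
from types import SimpleNamespace


def summarize_message(message):
    """Classify an assistant message as a plain reply or a tool call.

    Returns ("text", content) or ("tool_call", name, arguments_dict).
    """
    tool_calls = getattr(message, "tool_calls", None)
    if not tool_calls:
        return ("text", message.content)

    call = tool_calls[0]
    arguments = call.function.arguments
    # OpenAI prints arguments as a JSON string; the tgi output above shows a dict.
    if isinstance(arguments, str):
        arguments = json.loads(arguments)
    return ("tool_call", call.function.name, arguments)


def fake_call(name, arguments):
    """Build a stand-in shaped like a message carrying one tool call."""
    function = SimpleNamespace(name=name, arguments=arguments)
    return SimpleNamespace(content=None, tool_calls=[SimpleNamespace(function=function)])


# Stand-ins shaped like the three printed outputs in the reproduction.
plain = SimpleNamespace(content="Hello there!", tool_calls=None)
tgi_call = fake_call("get_current_weather", {"format": "celsius", "location": "Seoul"})
openai_call = fake_call("get_current_weather", '{"location": "seoul", "format": "celsius"}')

for message in (plain, tgi_call, openai_call):
    print(summarize_message(message))
```

With tool_choice="auto" behaving as in the OpenAI spec, an application could pass every reply through one such function instead of deciding up front whether a tool call is needed.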

I am curious whether the above behavior is intended. Someone has already raised this issue (https://github.com/huggingface/text-generation-inference/pull/1587#issuecomment-1979185339), but it wasn't addressed anywhere.

Maybe something can be done with the tool prompt, the ToolType enum, and the chat_completions logic in server.rs? If this behavior is not intended and needs fixing, I would love to give it a shot! Thank you :)
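One conceivable direction, sketched in Python only to illustrate the idea: offer the model an extra sentinel tool that means "answer in plain text", then convert a call to that sentinel back into a normal chat message. Everything here is hypothetical, including the names `NO_TOOL_NAME`, `with_no_tool_option` and `to_chat_message`; it does not describe how server.rs works today and is not a proposed patch.

```python
import json

NO_TOOL_NAME = "no_tool"  # hypothetical sentinel name, not an existing tgi identifier


def with_no_tool_option(tools):
    """Return the tool list plus a sentinel tool that carries a plain-text reply."""
    sentinel = {
        "type": "function",
        "function": {
            "name": NO_TOOL_NAME,
            "description": "Use this when no other tool is needed; put the reply in 'content'.",
            "parameters": {
                "type": "object",
                "properties": {"content": {"type": "string"}},
                "required": ["content"],
            },
        },
    }
    return list(tools) + [sentinel]


def to_chat_message(raw_call):
    """Map a generated call ({"name": ..., "arguments": {...}}) to an assistant message."""
    if raw_call["name"] == NO_TOOL_NAME:
        # The model declined to use a real tool: surface its text as normal content.
        return {
            "role": "assistant",
            "content": raw_call["arguments"]["content"],
            "tool_calls": None,
        }
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "0",
                "type": "function",
                "function": {
                    "name": raw_call["name"],
                    "arguments": json.dumps(raw_call["arguments"]),
                },
            }
        ],
    }


greeting = to_chat_message({"name": NO_TOOL_NAME, "arguments": {"content": "Hello!"}})
weather = to_chat_message(
    {"name": "get_current_weather", "arguments": {"location": "Seoul", "format": "celsius"}}
)
print(greeting["content"], weather["tool_calls"][0]["function"]["name"])
```

Under this assumption, tool_choice="auto" would add the sentinel and tool_choice="required" would leave it out, so the existing forced-call behavior would stay available.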

mottoslo commented 5 hours ago

Gentle ping @drbh: is this issue being handled internally? Any feedback would be great!