Open · mottoslo opened this issue 2 months ago
Gentle ping @drbh: is this issue being handled internally? Any feedback would be great!
I am running into this issue as well. I am not knowledgeable enough in Rust to deal with it myself, but I would very much appreciate it if you took this on @mottoslo!
I think handling this issue may involve (breaking) changes to the feature and needs to be discussed beforehand, so I don't know where to start. However, some pull requests that I think are related have since been opened (https://github.com/huggingface/text-generation-inference/pull/2645, https://github.com/huggingface/text-generation-inference/pull/2614, ...), so I assume there is an internal consensus on how things should be done?
Either way, it would be a huge improvement. As it stands, we can't easily build agents on top of models deployed with TGI because of this, at least not through the Messages API. I tried manually applying the chat template and using the generate endpoint, and the model does appear to be able to choose not to use a tool. The downside is that handling the chat template manually makes this much harder to integrate into existing frameworks. Being able to use TGI as a drop-in replacement for OpenAI models would be fantastic.
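For reference, the workaround looks roughly like this (a sketch, not my exact code: the server URL and tool schema are placeholders, and it assumes a checkpoint whose chat template accepts a `tools` argument, e.g. a Llama 3.1 Instruct model):

```python
import requests
from transformers import AutoTokenizer

# Render the prompt ourselves, then call TGI's raw /generate endpoint
# instead of the OpenAI-compatible /v1/chat/completions route.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",  # placeholder tool
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello, how are you?"}],
    tools=tools,
    add_generation_prompt=True,
    tokenize=False,
)

resp = requests.post(
    "http://localhost:8080/generate",
    json={"inputs": prompt, "parameters": {"max_new_tokens": 256}},
)
print(resp.json()["generated_text"])  # model may answer in text or emit a tool call
```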
System Info
tgi version: 2.3.0
model: Meta-Llama-3-8B-Instruct
Information
Tasks
Reproduction
0. tool definition to use for reproduction
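Any simple function schema reproduces this; the following is a placeholder definition (`get_current_weather` is not my actual tool):

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    }
                },
                "required": ["location"],
            },
        },
    }
]
```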
1. Using OpenAI with tool_choice="auto"
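Roughly (model name and prompt are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any tool-capable OpenAI model
    messages=[{"role": "user", "content": "Hello, how are you?"}],  # no tool needed
    tools=tools,          # definition from step 0
    tool_choice="auto",
)
print(response.choices[0].message.content)     # plain text answer
print(response.choices[0].message.tool_calls)  # None
```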
=> responds with a normal chat message, since the prompt does not call for a tool_call
2. Using tgi with tool_choice="auto" (model = llama)
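Same request pointed at TGI's OpenAI-compatible endpoint (the base_url is a placeholder for my deployment):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="-")

response = client.chat.completions.create(
    model="tgi",  # TGI serves whatever model it was launched with
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    tools=tools,          # definition from step 0
    tool_choice="auto",
)
print(response.choices[0].message.tool_calls)  # a tool call, even though none is needed
```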
=> tries to call a function anyway
3. Using OpenAI with tool_choice="required"
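For comparison, forcing the call on OpenAI's side:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",    # placeholder model, as in step 1
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    tools=tools,            # definition from step 0
    tool_choice="required", # explicitly force a tool call
)
print(response.choices[0].message.tool_calls)  # always contains a function call
```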
=> tries to call a function anyway
Expected behavior
When consuming TGI, I expect the server to be able to respond both with and without a tool_call when provided with tool definitions. As it stands, the application needs to know ahead of time whether a tool call is required before calling TGI, which in my opinion is not something LLM applications should have to do.
I am curious whether the behavior above is intended. I found that someone has raised this before (https://github.com/huggingface/text-generation-inference/pull/1587#issuecomment-1979185339), but it was never addressed.
Maybe something can be done with the tool prompt, the ToolType enum, and the chat_completions logic in server.rs? If this behavior is not intended and needs fixing, I would love to give it a shot! Thank you :)