huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Tools Not Passed in Prompt Leading to Incorrect Function Calls in TGI #2375

Closed srossi93 closed 3 months ago

srossi93 commented 3 months ago

System Info

2024-08-08T07:56:58.379296Z  INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.79.0
Commit sha: db7e043ded45e14ed24188d5a963911c96049618
Docker label: sha-db7e043
nvidia-smi:
Thu Aug  8 07:56:58 2024       
   +---------------------------------------------------------------------------------------+
   | NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
   |-----------------------------------------+----------------------+----------------------+
   | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
   |                                         |                      |               MIG M. |
   |=========================================+======================+======================|
   |   0  NVIDIA A100-SXM4-40GB          On  | 00000000:A0:1D.0 Off |                    0 |
   | N/A   30C    P0              72W / 400W |  38379MiB / 40960MiB |      0%      Default |
   |                                         |                      |             Disabled |
   +-----------------------------------------+----------------------+----------------------+

   +---------------------------------------------------------------------------------------+
   | Processes:                                                                            |
   |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
   |        ID   ID                                                             Usage      |
   |=======================================================================================|
   |    0   N/A  N/A   1916177      C   /opt/conda/bin/python3.10                 38370MiB |
   +---------------------------------------------------------------------------------------+
xpu-smi:
N/A

Reproduction

I'm facing a problem with tools and TGI. Unless I'm mistaken, the tools are never passed in the input to the model, which completely degrades the accuracy and quality of the responses. The tools are used to set the grammar, but this is not enough: they also have to be put in the prompt, in a particular format (see, for example, the chat template of mistral-7b here).

Replicate the problem

Start a server with Mistral-7B-Instruct-v0.3 (with debug logging enabled):

export LOG_LEVEL="debug"
text-generation-launcher --model-id mistralai/Mistral-7B-Instruct-v0.3 --num-shard 1 --port 42253 --hostname 0.0.0.0 --dtype float16 

And try to run

import json
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:42253")

tools = [
    {
        "type": "function",
        "function": {
            "name": "retrieve_payment_status",
            "description": "Get payment status of a transaction",
            "parameters": {
                "type": "object",
                "properties": {
                    "transaction_id": {
                        "type": "string",
                        "description": "The transaction id.",
                    }
                },
                "required": ["transaction_id"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "retrieve_payment_date",
            "description": "Get payment date of a transaction",
            "parameters": {
                "type": "object",
                "properties": {
                    "transaction_id": {
                        "type": "string",
                        "description": "The transaction id.",
                    }
                },
                "required": ["transaction_id"],
            },
        },
    }
]

chat = client.chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What's the status of my transaction T1001?",
        },
    ],
    tools=tools,
    seed=42,
    max_tokens=100,
)

print(chat.choices[0].message.tool_calls)

The output will be similar to:

> [ChatCompletionOutputToolCall(function=ChatCompletionOutputFunctionDefinition(arguments={'transaction_id': 'T1001'}, name='getTransactionStatus', description=None), id='0', type='function')]

which has the wrong function name. This is an example from Mistral's documentation, so I tend not to blame the model for this (I also double-checked without TGI and everything works as expected).
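A quick way to detect this failure mode programmatically (a sketch; `declared_tool_names` and `check_tool_calls` are hypothetical helpers, not part of TGI or `huggingface_hub`):

```python
from types import SimpleNamespace

def declared_tool_names(tools):
    """Collect the function names declared in an OpenAI-style tools list."""
    return {t["function"]["name"] for t in tools if t.get("type") == "function"}

def check_tool_calls(tool_calls, tools):
    """Return the names of returned tool calls that were never declared.

    A non-empty result means the model invented a function name, which is
    exactly the symptom when the tools are missing from the prompt.
    """
    valid = declared_tool_names(tools)
    return [c.function.name for c in tool_calls if c.function.name not in valid]

# Mimic the failing response above with a stand-in object:
tools = [{"type": "function", "function": {"name": "retrieve_payment_status"}}]
bad_call = SimpleNamespace(function=SimpleNamespace(name="getTransactionStatus"))
print(check_tool_calls([bad_call], tools))  # -> ['getTransactionStatus']
```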

Looking at the server log, you can see the formatted input:

2024-08-08T09:27:34.914443Z DEBUG chat_completions: text_generation_router::server: router/src/server.rs:192: Input: <s>[INST] What's the status of my transaction T1001?[/INST]

It looks like the tools are not passed in the prompt, which is the problem.

Workaround

It's possible to add a system prompt with the tools, mimicking the chat template:

{
    "role": "system",
    "content": f"[AVAILABLE_TOOLS] {json.dumps(tools)} [/AVAILABLE_TOOLS]"
               "You're a helpful assistant! Use tools if necessary, and reply in a JSON format",
}

which formats to

2024-08-08T10:12:41.322683Z DEBUG chat_completions: text_generation_router::server: router/src/server.rs:192: Input: <s>[INST] [AVAILABLE_TOOLS] [{"type": "function", "function": {"name": "retrieve_payment_status", "description": "Get payment status of a transaction", "parameters": {"type": "object", "properties": {"transaction_id": {"type": "string", "description": "The transaction id."}}, "required": ["transaction_id"]}}}, {"type": "function", "function": {"name": "retrieve_payment_date", "description": "Get payment date of a transaction", "parameters": {"type": "object", "properties": {"transaction_id": {"type": "string", "description": "The transaction id."}}, "required": ["transaction_id"]}}}] [/AVAILABLE_TOOLS]You're a helpful assistant! Use tools if necessary, and reply in a JSON format

What's the status of my transaction T1001?[/INST]

This does not exactly follow Mistral's chat template, but it's an improvement (and now the tool call is correct).
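Wrapped as a helper, the workaround can be sketched like this (`tools_system_message` is a hypothetical name; it only mimics the [AVAILABLE_TOOLS] block of Mistral's chat template):

```python
import json

def tools_system_message(
    tools,
    instructions="You're a helpful assistant! Use tools if necessary, and reply in a JSON format",
):
    """Build a system message embedding the tool schemas, mimicking the
    [AVAILABLE_TOOLS] block of Mistral's chat template."""
    return {
        "role": "system",
        "content": f"[AVAILABLE_TOOLS] {json.dumps(tools)} [/AVAILABLE_TOOLS]{instructions}",
    }
```

Prepending `tools_system_message(tools)` to the `messages` list passed to `client.chat_completion(...)` produces a prompt like the one in the log above.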

But this should not be the default behavior; I expect TGI to respect the chat templates, including the tools format.

I also checked whether changing the tools_prompt could be a solution, but this argument also has no effect on the final prompt (or on the final output).

The alternatives are:

  1. Is there a problem with the Mistral tokenizer? IMO unlikely, because it works fine without TGI.
  2. Is there a problem with the TGI chat template? Unless I'm mistaken, this looks like it: the tools are never passed in when formatting the final prompt.

I've seen a couple of issues that might be related to this (e.g. #2310, #2240)

BTW, some of the examples in your documentation are also broken: the format is correct, but the names/arguments of the tool calls are not.

Expected behavior

See above

ErikKaum commented 3 months ago

Hi @srossi93 👋

Thanks for bringing this up. I think this is a valid concern, and even though the workaround is OK, implementing the chat template might be the real solution here.

I'll loop in @drbh to this conversation as well.

In the meantime, could you point out which examples in the docs are incomplete? Even better if you have time to make a PR with corrections 🙌

srossi93 commented 3 months ago

Hi @ErikKaum,

Thanks for the reply. I'd wait to check this before updating the examples (I might be able to do it). BTW, can you point me to where in the server (or in the router) the chat template is used?

ErikKaum commented 3 months ago

Sounds good 👍

srossi93 commented 3 months ago

@ErikKaum @drbh

Ok, I think I actually found the problem: the chunk message with the available tools was never actually appended to the last conversation turn. The change is minimal; I opened a PR to fix it (#2395).

With the fix, this is the new prompt:

2024-08-10T15:50:13.473242Z DEBUG chat_completions: text_generation_router::server: router/src/server.rs:296: Input: <s>[INST] What's the status of my transaction T1001?
---

{"$functions":{"notify_error":{"properties":{"_name":{"const":"notify_error","type":"string"},"error":{"description":"The error or issue to notify","type":"string"}},"required":["error","_name"],"type":"object"},"retrieve_payment_date":{"description":"Get payment date of a transaction","properties":{"_name":{"const":"retrieve_payment_date","type":"string"},"transaction_id":{"description":"The transaction id.","type":"string"}},"required":["transaction_id","_name"],"type":"object"},"retrieve_payment_status":{"description":"Get payment status of a transaction","properties":{"_name":{"const":"retrieve_payment_status","type":"string"},"transaction_id":{"description":"The transaction id.","type":"string"}},"required":["transaction_id","_name"],"type":"object"}},"properties":{"function":{"anyOf":[{"$ref":"#/$functions/retrieve_payment_status"},{"$ref":"#/$functions/retrieve_payment_date"},{"$ref":"#/$functions/notify_error"}]}}}[/INST]
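For reference, a reply constrained by this schema can be split back into a (name, arguments) pair along these lines (a sketch of the shape the grammar enforces, not actual TGI router code; `parse_constrained_call` is a hypothetical helper):

```python
import json

def parse_constrained_call(raw: str):
    """Split a grammar-constrained completion into (name, arguments).

    Assumes the shape the schema above enforces: a top-level "function"
    object whose "_name" field discriminates which tool was chosen.
    """
    payload = json.loads(raw)["function"]
    name = payload.pop("_name")
    return name, payload

raw = '{"function": {"_name": "retrieve_payment_status", "transaction_id": "T1001"}}'
print(parse_constrained_call(raw))  # -> ('retrieve_payment_status', {'transaction_id': 'T1001'})
```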

Now, I still think this is a temporary solution, in the sense that it still doesn't follow the prompt template with tools. But at least the functionality is implemented correctly now.

Please see if someone can review and merge the PR.

ErikKaum commented 3 months ago

Thanks a lot for finding the problem & opening a PR 🙌

I think we'll add some tests and make sure CI is green; other than that, this should be good to go 👍

srossi93 commented 3 months ago

Closing, as the fix is merged.