huggingface / huggingface_hub

The official Python client for the Hugging Face Hub.
https://huggingface.co/docs/huggingface_hub
Apache License 2.0

Incompatibility with Empty Enum List in Tool Parameters #2415

Closed: RamonKaspar closed this issue 1 month ago

RamonKaspar commented 1 month ago

Describe the bug

The huggingface_hub inference client does not support an empty enum list in tool parameters, while the OpenAI client does. This causes compatibility issues when switching between the two clients.

Reproduction

When running the following code with the OpenAI client (the commented-out lines), everything works fine: the client handles the empty enum list. However, using the HF InferenceClient results in an exception.

from huggingface_hub import InferenceClient
from openai import OpenAI
import os

from dotenv import load_dotenv
load_dotenv()

client = InferenceClient(
    api_key=os.getenv("HF_API_KEY")
)
# client = OpenAI(
#     api_key=os.getenv("OPENAI_API_KEY")
# )

messages = [
    {
        "role": "system",
        "content": "Don't make assumptions about what values to plug into functions. Ask for clarification if a user request is ambiguous.",
    },
    {
        "role": "user",
        "content": "What's the weather like in San Francisco, CA (in Fahrenheit)?",
    },
]
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "format": {
                        "type": "string",
                        "enum": [],
                        "description": "The temperature unit to use. Infer this from the users location.",
                    },
                },
                "required": ["location", "format"],
            },
        },
    },
]

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    # model="gpt-4o-mini",
    messages=messages,
    tools=tools,
    tool_choice="auto",
    max_tokens=500,
)
print(response.choices[0].message.tool_calls[0].function)

Logs

Traceback (most recent call last):
  File "C:\Users\ramon\anaconda3\envs\ml-env\lib\site-packages\huggingface_hub\utils\_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "C:\Users\ramon\anaconda3\envs\ml-env\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 422 Client Error: Unprocessable Entity for url: https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct/v1/chat/completions

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\path\to\main.py", line 50, in <module>
    response = client.chat.completions.create(
  File "C:\Users\ramon\anaconda3\envs\ml-env\lib\site-packages\huggingface_hub\inference\_client.py", line 838, in chat_completion
    data = self.post(
  File "C:\Users\ramon\anaconda3\envs\ml-env\lib\site-packages\huggingface_hub\inference\_client.py", line 308, in post
    hf_raise_for_status(response)
  File "C:\Users\ramon\anaconda3\envs\ml-env\lib\site-packages\huggingface_hub\utils\_errors.py", line 371, in hf_raise_for_status
    raise HfHubHTTPError(str(e), response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct/v1/chat/completions (Request ID: x71rGgioKHjnIyXp--Zfs)

expected value at line 1 column 53

System info

- huggingface_hub version: 0.24.0
- Platform: Windows-10-10.0.22631-SP0
- Python version: 3.9.18
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: C:\Users\ramon\.cache\huggingface\token
- Has saved token ?: True
- Configured git credential helpers: manager
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.2.2
- Jinja2: 3.1.3
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: 10.2.0
- hf_transfer: N/A
- gradio: N/A
- tensorboard: N/A
- numpy: 1.26.4
- pydantic: 2.7.1
- aiohttp: 3.9.3
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: C:\Users\ramon\.cache\huggingface\hub
- HF_ASSETS_CACHE: C:\Users\ramon\.cache\huggingface\assets
- HF_TOKEN_PATH: C:\Users\ramon\.cache\huggingface\token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
Wauplin commented 1 month ago

Hi @RamonKaspar, thanks for reporting this issue. This doesn't seem to be a problem with the InferenceClient but rather with the server not handling this payload. cc @drbh @OlivierDehaene could you take a look to see why TGI considers this payload an unprocessable entity? Returning the reason why the payload is unprocessable would help (similar to what pydantic does).
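For illustration (a minimal sketch with a made-up PropertySchema model, not TGI's actual validation code), a pydantic-style error pinpoints the offending field and value:

from pydantic import BaseModel, ValidationError

# Hypothetical fragment of a tool-parameter schema, only to show the error style.
class PropertySchema(BaseModel):
    type: str
    description: str
    enum: list

try:
    PropertySchema(type="string", description="unit", enum=None)
except ValidationError as err:
    print(err)
    # 1 validation error for PropertySchema
    # enum
    #   Input should be a valid list [type=list_type, input_value=None, input_type=NoneType]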

Btw, I've tried to run the same code with:

client = OpenAI(
    base_url="https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct/v1",
    api_key=os.getenv("HF_TOKEN")
)

and the issue is the same. Since both clients produce the same error against the same endpoint, the problem is server-side, not client-side.

Wauplin commented 1 month ago

@RamonKaspar what is the output of the OpenAI client against an OpenAI endpoint (model="gpt-4o-mini") when sending a tool with an empty enum? I've investigated the error with @drbh and it doesn't seem to be a bug to us: the user provides a tool that is impossible to satisfy given the enum is empty, so the server raises an exception since it cannot produce any valid value.

However, we do agree the error message expected value at line 1 column 53 is not explicit enough and should be improved. This is not something to fix in huggingface_hub itself but server-side in TGI.

RamonKaspar commented 1 month ago

Hi @Wauplin, using the OpenAI client with the gpt-4o-mini model and an empty enum for the format parameter, the call succeeds and the model fills in "Fahrenheit" as the format: Function(arguments='{"location":"San Francisco, CA","format":"Fahrenheit"}', name='get_current_weather')
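For reference, that output comes from the reproduction above with the commented-out lines swapped in (same messages and tools):

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=tools,
    tool_choice="auto",
    max_tokens=500,
)
print(response.choices[0].message.tool_calls[0].function)
# Function(arguments='{"location":"San Francisco, CA","format":"Fahrenheit"}', name='get_current_weather')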

I understand that an empty enum technically represents a constraint that isn't satisfiable. However, my initial report was motivated by the inconsistency between the Hugging Face and OpenAI clients, especially since the HF documentation suggests that they should function interchangeably.

Wauplin commented 1 month ago

Thanks for the example @RamonKaspar. From what I understand, this is not an inconsistency between the OpenAI and Hugging Face clients but an inconsistency between the OpenAI and TGI (text-generation-inference) servers in how they handle empty enum constraints.

What the documentation means by OpenAI/InferenceClient interchangeability is that, against the same server, both clients can be used with the same result. If you use the OpenAI client with https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct/v1 (i.e. the HF server), you get the same issue.

Therefore, what I would suggest is to close this issue and open a new one in TGI. That said, I'm not sure there's anything to change server-side, as this looks more like a bug on the OpenAI side, which silently ignores the unsatisfiable constraint instead of raising a warning or error.
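In the meantime, a possible client-side workaround (a sketch only: strip_empty_enums is a hypothetical helper, it assumes an empty enum carries no constraint, and it reuses the client, messages, and tools from the reproduction above) is to drop empty enum lists before sending:

import copy

def strip_empty_enums(tools):
    """Return a copy of the tool list with empty 'enum' lists removed."""
    cleaned = copy.deepcopy(tools)
    for tool in cleaned:
        properties = tool.get("function", {}).get("parameters", {}).get("properties", {})
        for prop in properties.values():
            if prop.get("enum") == []:
                del prop["enum"]  # an empty enum adds no information, so drop the key
    return cleaned

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=messages,
    tools=strip_empty_enums(tools),
    tool_choice="auto",
    max_tokens=500,
)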