huggingface / huggingface_hub

The official Python client for the Huggingface Hub.
https://huggingface.co/docs/huggingface_hub
Apache License 2.0

Some InferenceClient tasks missing `parameters` argument, inconsistent with task specifications #2557


hanouticelina commented 1 week ago

Some task methods in huggingface_hub.InferenceClient do not expose a parameters argument for passing additional inference parameters. The affected tasks are: audio-classification, automatic-speech-recognition, fill-mask, image-classification, image-segmentation, object-detection, question-answering, table-question-answering, text-classification, token-classification, and translation. This inconsistency means the implementation is not fully aligned with the task specs here or with the documentation here.

Taking text-classification as an example:

With Python requests:

```python
import os

import requests

API_URL = "https://api-inference.huggingface.co/models/distilbert/distilbert-base-uncased-finetuned-sst-2-english"
headers = {"Authorization": f"Bearer {os.environ.get('HF_TOKEN')}"}


def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()


output = query(
    {
        "inputs": "I like you. I love you",
        "parameters": {"top_k": 1},
    }
)

print(output)
# [{'label': 'POSITIVE', 'score': 0.9998738765716553}]
```

With InferenceClient:

```python
from huggingface_hub import InferenceClient

client = InferenceClient()

output = client.text_classification(
    model="distilbert-base-uncased-finetuned-sst-2-english",
    text="I like you. I love you",
    parameters={"top_k": 1},
)

print(output)
```

which raises:

```
output = client.text_classification(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: InferenceClient.text_classification() got an unexpected keyword argument 'parameters'
```
Wauplin commented 1 week ago

Yes, that is an identified problem. For context, when we first implemented the InferenceClient, we didn't have that much harmonization yet (the specs didn't exist). To avoid future maintenance burden, we decided not to add them for a lot of tasks -always better to do less and avoid future breaking changes-. That's why the tasks you've listed above don't have parameters.

That being said, if we start adding parameters, we most likely would add a parameters parameter to the method signatures but individual parameters instead (e.g. client.text_classification(..., top_k=1)).

EDIT: another part to take into consideration is that those tasks are used far less than the text-generation or image-generation ones, hence the reduced scope.

hanouticelina commented 1 week ago

Agreed, though I too would prefer adding a parameters argument rather than individual parameters, at least for those "less popular" tasks. Even though usage of those tasks is low, I think it's still important to have the client aligned with the Inference API documentation.

Wauplin commented 1 week ago

Sorry sorry, my bad :see_no_evil:

I meant we most likely wouldn't add a parameters parameter to the method signatures but individual parameters instead.

The problem of a parameters parameter is that it doesn't play well with annotations/autocomplete. Even if we start using typing.TypedDict as OAI does, it's still not as natively supported as kwargs.
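To make the annotations/autocomplete trade-off concrete, here is a hypothetical sketch contrasting the two signature styles. The function names and the TypedDict below are illustrative only, not the actual huggingface_hub API:

```python
from typing import Optional, TypedDict


class TextClassificationParameters(TypedDict, total=False):
    # Hypothetical TypedDict mirroring the task spec; not the real generated class.
    top_k: int
    function_to_apply: str


# Option A: a single `parameters` dict. The keys live inside a dict literal, so
# a typo such as {"topk": 1} is typically only caught at runtime, if at all.
def classify_with_dict(text: str, parameters: Optional[TextClassificationParameters] = None) -> dict:
    payload: dict = {"inputs": text}
    if parameters:
        payload["parameters"] = dict(parameters)
    return payload


# Option B: individual keyword-only arguments. Each parameter is a named, typed
# entry in the signature, so IDEs autocomplete it and type checkers verify it.
def classify_with_kwargs(text: str, *, top_k: Optional[int] = None, function_to_apply: Optional[str] = None) -> dict:
    parameters = {
        key: value
        for key, value in {"top_k": top_k, "function_to_apply": function_to_apply}.items()
        if value is not None
    }
    payload: dict = {"inputs": text}
    if parameters:
        payload["parameters"] = parameters
    return payload
```

Both build the same request payload; the difference is purely how discoverable and checkable the parameters are at the call site.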

Wauplin commented 1 week ago

I think it's still important to have the client aligned with the Inference API documentation.

Makes sense now that we have a proper documentation :+1:

Wauplin commented 1 week ago

Since we are talking about ~10 tasks that follow exactly the same pattern (i.e. inputs + parameters), I think we should think of a semi-automated way to handle them as well. This is how I would picture it (taking text_classification as an example):

  • take the TextClassificationParameters class that is auto-generated from the specs
  • take the InferenceClient.text_classification method that is, for now, entirely manually written
  • check whether any parameter is missing from the text_classification method. If a parameter is missing:

    • add it to the method signature with the same type annotation as in the dataclass (for instance function_to_apply: Optional["ClassificationOutputTransform"] = None)
    • add the attribute docstring (for instance """When specified, limits the output to the top K most probable classes.""") to the method docstring, inside the args section
    • make sure the imports are correct (typically, from ... import ClassificationOutputTransform)

With such a script, we would be sure all those "less used" methods stay well maintained according to the specs. In the past, I've created a few scripts to auto-generate things (see the ./utils folder), always based on static analysis, i.e. operating only on the module text. For a script like this one, I think we'll need to work at the ast level.

It's usually good for such a script to have a "check" mode (fails if something has to be changed) and a "format" mode (modifies the file in place). Once we have it, we can integrate it into the makefile / CI. Let me know what you think :)
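As a rough illustration of what working at the ast level could look like, here is a sketch that detects parameters present in a spec dataclass but missing from a method signature. Both source strings are toy stand-ins, not the real huggingface_hub code:

```python
import ast
import textwrap

# Toy sources standing in for the auto-generated parameters dataclass and the
# hand-written InferenceClient method discussed above.
SPEC_SOURCE = textwrap.dedent("""
    @dataclass
    class TextClassificationParameters:
        top_k: Optional[int] = None
        function_to_apply: Optional["ClassificationOutputTransform"] = None
""")

CLIENT_SOURCE = textwrap.dedent("""
    class InferenceClient:
        def text_classification(self, text: str, *, model: Optional[str] = None, top_k: Optional[int] = None):
            ...
""")


def dataclass_fields(source: str, class_name: str) -> set:
    """Collect annotated attribute names from a (data)class definition."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return {
                stmt.target.id
                for stmt in node.body
                if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name)
            }
    raise ValueError(f"class {class_name} not found")


def method_params(source: str, method_name: str) -> set:
    """Collect parameter names (positional + keyword-only) from a method."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name == method_name:
            return {a.arg for a in node.args.args + node.args.kwonlyargs} - {"self"}
    raise ValueError(f"method {method_name} not found")


# Parameters present in the spec but absent from the client method.
missing = dataclass_fields(SPEC_SOURCE, "TextClassificationParameters") - method_params(CLIENT_SOURCE, "text_classification")
print(sorted(missing))
```

Actually rewriting the signature, docstring, and imports in place would take more machinery (e.g. ast end positions or libcst), but the detection side really is this small.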

hanouticelina commented 1 week ago

> Sorry sorry, my bad 🙈
>
> I meant we most likely wouldn't add a parameters parameter to the method signatures but individual parameters instead.
>
> The problem of a parameters parameter is that it doesn't play well with annotations/autocomplete. Even if we start using typing.TypedDict as OAI does, it's still not as natively supported as kwargs.

Yes, I was actually thinking of adding parameters to avoid modifying the client code if the parameters change on the API side in the future. But it makes sense to add individual parameters instead; I don't think there will be many changes, so let's do that.

> Since we are talking about ~10 tasks that follow exactly the same pattern (i.e. inputs + parameters), I think we should think of a semi-automated way to handle them as well. This is how I would picture it (taking text_classification as an example):
>
>   • take the TextClassificationParameters class that is auto-generated from the specs
>   • take the InferenceClient.text_classification method that is, for now, entirely manually written
>   • check whether any parameter is missing from the text_classification method. If a parameter is missing:
>
>     • add it to the method signature with the same type annotation as in the dataclass (for instance function_to_apply: Optional["ClassificationOutputTransform"] = None)
>     • add the attribute docstring (for instance """When specified, limits the output to the top K most probable classes.""") to the method docstring, inside the args section
>     • make sure the imports are correct (typically, from ... import ClassificationOutputTransform)
>
> With such a script, we would be sure all those "less used" methods stay well maintained according to the specs. In the past, I've created a few scripts to auto-generate things (see the ./utils folder), always based on static analysis, i.e. operating only on the module text. For a script like this one, I think we'll need to work at the ast level.
>
> It's usually good for such a script to have a "check" mode (fails if something has to be changed) and a "format" mode (modifies the file in place). Once we have it, we can integrate it into the makefile / CI. Let me know what you think :)

Yes, I like the idea, it will be cool to have this in the makefile/CI. I think the script should:

  • not handle parameter renaming or deprecation. For example, if function_to_apply is renamed to fn_to_apply, we'd still need to manually keep the old name and deprecate it. The idea is really to keep the script simple, without a lot of maintenance to do.

I will open a separate issue for this later 👍
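For reference, the check/format pattern itself can be sketched like this. sync_file is a hypothetical placeholder for the real spec-syncing logic (here it just strips trailing whitespace so the sketch is runnable):

```python
import argparse
import pathlib
import sys


def sync_file(text: str) -> str:
    """Placeholder for the real sync logic: return the text the file should
    contain. As a stand-in, this only strips trailing whitespace."""
    return "\n".join(line.rstrip() for line in text.splitlines()) + "\n"


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Keep task methods in sync with the specs.")
    parser.add_argument("path", type=pathlib.Path)
    parser.add_argument("--check", action="store_true", help="fail if the file would change, without writing")
    args = parser.parse_args(argv)

    original = args.path.read_text()
    updated = sync_file(original)
    if updated == original:
        return 0
    if args.check:
        # "check" mode: report and exit non-zero so CI fails.
        print(f"{args.path} is out of date; run without --check to fix.", file=sys.stderr)
        return 1
    # "format" mode: modify the file in place.
    args.path.write_text(updated)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Running it with --check in CI and without it locally mirrors how formatters like black are usually wired into a makefile.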

Wauplin commented 1 week ago

> Not handle parameter renaming or deprecation. For example, if function_to_apply is renamed to fn_to_apply, we'd still need to manually keep the old name and deprecate it. The idea is really to keep the script simple, without a lot of maintenance to do.

Agree :+1: And if it becomes too annoying to maintain in the future, we can extend the script later.

> I will open a separate issue for this later 👍

As you wish; it's also fine to start a PR directly, or to rename this issue.