huggingface / huggingface_hub

The official Python client for the Hugging Face Hub.
https://huggingface.co/docs/huggingface_hub
Apache License 2.0

Fix `InferenceClient` for HF Nvidia NIM API #2482

Closed: Wauplin closed this pull request 1 month ago

Wauplin commented 1 month ago

Fix https://github.com/huggingface/huggingface_hub/issues/2480.

Two things in this PR (for chat_completion):

With this, it is now possible to call the HF Nvidia NIM API through `InferenceClient`:

```python
from huggingface_hub import InferenceClient

# instead of `client = OpenAI(...)`
client = InferenceClient(
    base_url="https://huggingface.co/api/integrations/dgx/v1",
    api_key="MY_FINEGRAINED_TOKEN",
)

output = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    stream=True,
    max_tokens=1024,
)

for chunk in output:
    print(chunk.choices[0].delta.content, end="")
```

Here it goes:

```
1, 2, 3, 4, 5, 6, 7, 8, 9, 10!
```
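The same request can also be made with `InferenceClient`'s native `chat_completion` method rather than the OpenAI-compatible `.chat.completions.create()` surface. A minimal sketch, assuming `huggingface_hub` is installed and an `HF_TOKEN` environment variable holds a valid fine-grained token (the network call is skipped when no token is set):

```python
import os

# Same conversation as above; built unconditionally so the structure is visible.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Count to 10"},
]

# Only attempt a real request when a token is available (hypothetical env var name).
if os.environ.get("HF_TOKEN"):
    from huggingface_hub import InferenceClient

    client = InferenceClient(
        base_url="https://huggingface.co/api/integrations/dgx/v1",
        api_key=os.environ["HF_TOKEN"],
    )
    # The native method takes the same messages/model/stream/max_tokens arguments.
    for chunk in client.chat_completion(
        model="meta-llama/Meta-Llama-3-8B-Instruct",
        messages=messages,
        stream=True,
        max_tokens=1024,
    ):
        print(chunk.choices[0].delta.content, end="")
```

Both call paths go through the same request logic, which is why the `base_url` fix in this PR applies to either style.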
HuggingFaceDocBuilderDev commented 1 month ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Wauplin commented 1 month ago

Let's get this merged! Thanks for the reviews :)