huggingface / huggingface_hub

The official Python client for the Hugging Face Hub.
https://huggingface.co/docs/huggingface_hub
Apache License 2.0

Fix `InferenceClient` for HF Nvidia NIM API #2482

Closed: Wauplin closed this pull request 1 month ago

Wauplin commented 1 month ago

Fix https://github.com/huggingface/huggingface_hub/issues/2480.

Two things in this PR (for chat_completion):

With this, it is now possible to call the HF Nvidia NIM API through `InferenceClient`:

```python
from huggingface_hub import InferenceClient

# instead of `client = OpenAI(...)`
client = InferenceClient(
    base_url="https://huggingface.co/api/integrations/dgx/v1",
    api_key="MY_FINEGRAINED_TOKEN",
)

output = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    stream=True,
    max_tokens=1024,
)

for chunk in output:
    print(chunk.choices[0].delta.content, end="")
```

Here it goes:

```
1, 2, 3, 4, 5, 6, 7, 8, 9, 10!
```
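The same request can also be made with `InferenceClient`'s native `chat_completion` method rather than the OpenAI-compatible `.chat.completions.create()` surface. A minimal sketch, assuming `huggingface_hub` is installed and an `HF_TOKEN` environment variable holds a valid fine-grained token (the network call is skipped when no token is set):

```python
import os

# Same conversation as above; built unconditionally so the structure is visible.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Count to 10"},
]

# Only attempt a real request when a token is available (hypothetical env var name).
if os.environ.get("HF_TOKEN"):
    from huggingface_hub import InferenceClient

    client = InferenceClient(
        base_url="https://huggingface.co/api/integrations/dgx/v1",
        api_key=os.environ["HF_TOKEN"],
    )
    # The native method takes the same messages/model/stream/max_tokens arguments.
    for chunk in client.chat_completion(
        model="meta-llama/Meta-Llama-3-8B-Instruct",
        messages=messages,
        stream=True,
        max_tokens=1024,
    ):
        print(chunk.choices[0].delta.content, end="")
```

Both call paths go through the same request logic, which is why the `base_url` fix in this PR applies to either style.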
HuggingFaceDocBuilderDev commented 1 month ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Wauplin commented 1 month ago

Let's get this merged! Thanks for the reviews :)