Use `huggingface_hub` InferenceClient for Inference API example

huggingface / huggingface-llama-recipes

531 stars 59 forks source link

Use `huggingface_hub` InferenceClient for Inference API example #57

Closed hanouticelina closed 1 month ago

hanouticelina commented 1 month ago

Hello 👋 This small PR replaces OpenAI client with huggingface_hub.InferenceClient in the API inference example.

Main changes :

Replaced from openai import OpenAI by from huggingface_hub import InferenceClient and client = OpenAI(...) by client = InferenceClient(...).
Added authentication with huggingface_hub.
Fixed the model id : "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8" -> "meta-llama/Llama-3.1-405B-Instruct-FP8"

cc @ariG23498 @osanseviero and @Wauplin

hanouticelina commented 1 month ago

Thanks for updating this! I left a comment re: authentication since it's not need anymore (InferenceClient does it automatically).

Also, would it be useful to link to https://huggingface.co/docs/huggingface_hub/guides/inference#openai-compatibility or not really? (depends on who are the expected users I think)

yes, maybe it's better to add a link to the Inference documentation(https://huggingface.co/docs/huggingface_hub/guides/inference) and not specifically to the openai compatibility section

Wauplin commented 1 month ago

Conceptually approved the PR but let's wait for others to review as I never contributed to this notebook. Pinging @pcuenca who added the example in the first place https://github.com/huggingface/huggingface-llama-recipes/pull/5.

ariG23498 commented 1 month ago

Hi @hanouticelina

I was not able to run the notebook. Here is the reproduction.

When I hit the api endpoint I get a list of all the models. I do not find meta-llama/Llama-3.1-405B-Instruct-FP8 in the list (maybe this is because I am not using a Pro account?)
When I use the meta-llama/Llama-3.2-1B-Instruct instead and run the notebook, it times out.
I have made sure I was logged in using the login api, and the token has "write" permissions as well.

Am I missing something?

Wauplin commented 1 month ago

I think meta-llama/Llama-3.1-405B-Instruct-FP8 has been removed from the Inference API because too heavy / costly to run + too slow for the users. Let's use meta-llama/Llama-3.1-70B-Instruct for the example.

About the timeout issue, can you provide more info if it happens again?

hanouticelina commented 1 month ago

I ran the notebook with meta-llama/Llama-3.1-70B-Instruct since it was suggesting that meta-llama/Llama-3.1-405B-Instruct-FP8 is accessible for free only for PRO users. Since it's no longer available, I will replace it with meta-llama/Llama-3.1-70B-Instruct.

I just tried meta-llama/Llama-3.2-1B-Instruct and meta-llama/Llama-3.2-3B-Instruct, I also got 504 time out:

HfHubHTTPError: 504 Server Error: Gateway Timeout for url: https://api-inference.huggingface.co/models/meta-llama/Llama-3.2-1B-Instruct/v1/chat/completions (Request ID: i5Ty822Xhz3eOD_emhVSW)

Inference API is down for both Llama-3.2 models, I posted a message in an internal channel.

ariG23498 commented 1 month ago

Thanks for the contribution! 🤗