Closed — hanouticelina closed this 1 month ago
Thanks for updating this! I left a comment re: authentication, since it's not needed anymore (`InferenceClient` handles it automatically). Also, would it be useful to link to https://huggingface.co/docs/huggingface_hub/guides/inference#openai-compatibility, or not really? (It depends on who the expected users are, I think.)
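For context, a minimal sketch of what "authentication is automatic" means in practice (the model name and helper below are illustrative, not from the PR):

```python
# InferenceClient picks up the token cached by `huggingface-cli login` /
# `huggingface_hub.login()` (or the HF_TOKEN environment variable) on its own,
# so the example needs no explicit api_key plumbing.
from huggingface_hub import InferenceClient

client = InferenceClient(model="meta-llama/Llama-3.1-70B-Instruct")

def ask(prompt: str) -> str:
    # Same role/content message schema as the OpenAI chat API.
    response = client.chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=64,
    )
    return response.choices[0].message.content
```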
Yes, maybe it's better to add a link to the Inference documentation (https://huggingface.co/docs/huggingface_hub/guides/inference) rather than specifically to the OpenAI compatibility section.
Conceptually approved the PR, but let's wait for others to review, as I never contributed to this notebook. Pinging @pcuenca, who added the example in the first place: https://github.com/huggingface/huggingface-llama-recipes/pull/5.
Hi @hanouticelina,
I was not able to run the notebook. Here is the reproduction:
- I could not find `meta-llama/Llama-3.1-405B-Instruct-FP8` in the list (maybe this is because I am not using a Pro account?).
- When I use `meta-llama/Llama-3.2-1B-Instruct` instead and run the notebook, it times out.
- I am logged in via the `login` API, and the token has "write" permissions as well.

Am I missing something?
I think `meta-llama/Llama-3.1-405B-Instruct-FP8` has been removed from the Inference API because it is too heavy/costly to run and too slow for users. Let's use `meta-llama/Llama-3.1-70B-Instruct` for the example.
About the timeout issue, can you provide more info if it happens again?
I ran the notebook with `meta-llama/Llama-3.1-70B-Instruct`. It was suggesting that `meta-llama/Llama-3.1-405B-Instruct-FP8` is accessible for free only to PRO users; since that model is no longer available, I will replace it with `meta-llama/Llama-3.1-70B-Instruct`.
I just tried `meta-llama/Llama-3.2-1B-Instruct` and `meta-llama/Llama-3.2-3B-Instruct`, and I also got a 504 timeout:

```
HfHubHTTPError: 504 Server Error: Gateway Timeout for url: https://api-inference.huggingface.co/models/meta-llama/Llama-3.2-1B-Instruct/v1/chat/completions (Request ID: i5Ty822Xhz3eOD_emhVSW)
```
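For reference, a hedged sketch (not part of the PR) of a small retry wrapper around `InferenceClient.chat_completion`, useful when the Inference API returns transient 504s like the one above; the function name and backoff values are illustrative:

```python
# Retry the chat completion a few times before surfacing the HTTP error.
import time

from huggingface_hub import InferenceClient
from huggingface_hub.utils import HfHubHTTPError

client = InferenceClient(model="meta-llama/Llama-3.2-1B-Instruct")

def chat_with_retries(messages, attempts=3, backoff=5.0):
    for attempt in range(attempts):
        try:
            return client.chat_completion(messages=messages, max_tokens=64)
        except HfHubHTTPError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            time.sleep(backoff * (attempt + 1))  # linear backoff between tries
```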
The Inference API is down for both Llama-3.2 models; I posted a message in an internal channel.
Thanks for the contribution! 🤗
Hello 👋 This small PR replaces the OpenAI client with `huggingface_hub.InferenceClient` in the API inference example.
Main changes:
- Replacing `from openai import OpenAI` with `from huggingface_hub import InferenceClient`, and `client = OpenAI(...)` with `client = InferenceClient(...)`, so the example depends only on `huggingface_hub`.
- Renaming `"meta-llama/Meta-Llama-3.1-405B-Instruct-FP8"` to `"meta-llama/Llama-3.1-405B-Instruct-FP8"`.
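A hedged before/after sketch of the swap described above (the OpenAI lines are kept as comments for comparison; the base URL shown there and the helper function are illustrative):

```python
# Before (old example, for comparison only):
#   from openai import OpenAI
#   client = OpenAI(base_url="https://api-inference.huggingface.co/v1/", api_key="hf_...")
#
# After: the same chat-completion call via huggingface_hub.
from huggingface_hub import InferenceClient

client = InferenceClient(model="meta-llama/Llama-3.1-405B-Instruct-FP8")

def chat(prompt: str):
    # The message schema matches the OpenAI client, so call sites barely change.
    return client.chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=128,
    )
```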
cc @ariG23498 @osanseviero and @Wauplin