huggingface / huggingface-llama-recipes

531 stars 59 forks source link

Use `huggingface_hub` InferenceClient for Inference API example #57

Closed hanouticelina closed 1 month ago

hanouticelina commented 1 month ago

Hello 👋 This small PR replaces OpenAI client with huggingface_hub.InferenceClient in the API inference example.

Main changes :

cc @ariG23498 @osanseviero and @Wauplin

hanouticelina commented 1 month ago

Thanks for updating this! I left a comment re: authentication since it's not need anymore (InferenceClient does it automatically).

Also, would it be useful to link to https://huggingface.co/docs/huggingface_hub/guides/inference#openai-compatibility or not really? (depends on who are the expected users I think)

yes, maybe it's better to add a link to the Inference documentation(https://huggingface.co/docs/huggingface_hub/guides/inference) and not specifically to the openai compatibility section

Wauplin commented 1 month ago

Conceptually approved the PR but let's wait for others to review as I never contributed to this notebook. Pinging @pcuenca who added the example in the first place https://github.com/huggingface/huggingface-llama-recipes/pull/5.

ariG23498 commented 1 month ago

Hi @hanouticelina

I was not able to run the notebook. Here is the reproduction.

Am I missing something?

Wauplin commented 1 month ago

I think meta-llama/Llama-3.1-405B-Instruct-FP8 has been removed from the Inference API because too heavy / costly to run + too slow for the users. Let's use meta-llama/Llama-3.1-70B-Instruct for the example.

About the timeout issue, can you provide more info if it happens again?

hanouticelina commented 1 month ago

I ran the notebook with meta-llama/Llama-3.1-70B-Instruct since it was suggesting that meta-llama/Llama-3.1-405B-Instruct-FP8 is accessible for free only for PRO users. Since it's no longer available, I will replace it with meta-llama/Llama-3.1-70B-Instruct.

I just tried meta-llama/Llama-3.2-1B-Instruct and meta-llama/Llama-3.2-3B-Instruct, I also got 504 time out:

HfHubHTTPError: 504 Server Error: Gateway Timeout for url: https://api-inference.huggingface.co/models/meta-llama/Llama-3.2-1B-Instruct/v1/chat/completions (Request ID: i5Ty822Xhz3eOD_emhVSW)

Inference API is down for both Llama-3.2 models, I posted a message in an internal channel.

ariG23498 commented 1 month ago

Thanks for the contribution! 🤗