philschmid opened this pull request 8 months ago (Open)
cc @waleedkadous
@philschmid The README mentions `HUGGINGFACE_API_KEY`, but I couldn't get your fork to benchmark Llama 3 against a text-generation-inference server without specifying `HUGGINGFACE_API_TOKEN`. Is there a difference between `HUGGINGFACE_API_TOKEN` and `HUGGINGFACE_API_KEY`? Should all references be one or the other?
Related variable names that appear:
- `HUGGINFACE_API_TOKEN`
- `HUGGINFACE_API_KEY`
- `HF_TOKEN`, which supersedes the deprecated `HUGGING_FACE_HUB_TOKEN`
If `HUGGINGFACE_API_TOKEN` is not set, you get this error when trying to benchmark `meta-llama/Meta-Llama-3-70B-Instruct`. It can't pull the tokenizer without the token, because the Llama 3 tokenizer sits behind an agreement acknowledgment page:
OSError: You are trying to access a gated repo.
Make sure to have access to it at https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct.
401 Client Error. (Request ID: Root=1-668c4b2e-082a7cbe6986c4514589204c;528c624d-4cfa-42f0-bd0f-d3f2e1431fbf)
Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct/resolve/main/config.json.
Access to model meta-llama/Meta-Llama-3-70B-Instruct is restricted. You must be authenticated to access it.
0%| | 0/2 [00:06<?, ?it/s]
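
A minimal sketch of the workaround, assuming the client reads the token from `HUGGINGFACE_API_TOKEN` (the variable name and token value here are illustrative; you also need to accept the Llama 3 license on the Hub first):

```bash
# Assumption: the fork picks up HUGGINGFACE_API_TOKEN for authenticated Hub
# requests, so the gated Meta-Llama-3 tokenizer/config can be downloaded.
export HUGGINGFACE_API_TOKEN=hf_xxx   # replace with your own read token
```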
What does this PR do?
This PR adds a dedicated Hugging Face client, which allows llmperf users to benchmark Hugging Face models served with TGI via the Inference API, Inference Endpoints, or locally/any URL. Below is a simple example.
Run TGI (see the sketch below):
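
A sketch of starting a local TGI server with Docker, following the standard text-generation-inference Docker usage; the image tag, port, and volume path are placeholders rather than values pinned by this PR:

```bash
model=meta-llama/Meta-Llama-3-70B-Instruct
volume=$PWD/data   # cache downloaded weights between runs

# HF_TOKEN is required because the model repo is gated.
# Note: a 70B model typically needs several GPUs (e.g. via --num-shard).
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $volume:/data \
  -e HF_TOKEN=$HF_TOKEN \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id $model
```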
Run the benchmark (see the sketch below):
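
A sketch of a benchmark invocation. The flags follow the upstream `token_benchmark_ray.py` script; selecting the new client with `--llm-api "huggingface"` and pointing it at the server via `HUGGINGFACE_API_BASE`/`HUGGINGFACE_API_TOKEN` are assumptions about this fork and may differ from the actual implementation:

```bash
# Assumed environment variables for the new Hugging Face client.
export HUGGINGFACE_API_BASE="http://localhost:8080"   # TGI server from the previous step
export HUGGINGFACE_API_TOKEN=hf_xxx                   # needed to load the gated tokenizer

python token_benchmark_ray.py \
  --model "meta-llama/Meta-Llama-3-70B-Instruct" \
  --llm-api "huggingface" \
  --mean-input-tokens 550 \
  --stddev-input-tokens 150 \
  --mean-output-tokens 150 \
  --stddev-output-tokens 10 \
  --max-num-completed-requests 2 \
  --num-concurrent-requests 1 \
  --results-dir "result_outputs"
```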