hyperonym / basaran

Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.

401 error on llama2 model while access granted #289

Open tomtomtomtom44 opened 6 months ago

tomtomtomtom44 commented 6 months ago

Hello,

I am trying to get meta-llama/Llama-2-7b-chat-hf working, but I get this error:

docker run -p 80:80 -e MODEL=meta-llama/Llama-2-7b-chat-hf hyperonym/basaran:0.21.1
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/huggingface_hub/utils/_errors.py", line 261, in hf_raise_for_status
    response.raise_for_status()
  File "/usr/local/lib/python3.8/dist-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py", line 428, in cached_file
    resolved_file = hf_hub_download(
  File "/usr/local/lib/python3.8/dist-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/huggingface_hub/file_download.py", line 1195, in hf_hub_download
    metadata = get_hf_file_metadata(
  File "/usr/local/lib/python3.8/dist-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/huggingface_hub/file_download.py", line 1541, in get_hf_file_metadata
    hf_raise_for_status(r)
  File "/usr/local/lib/python3.8/dist-packages/huggingface_hub/utils/_errors.py", line 277, in hf_raise_for_status
    raise GatedRepoError(message, response) from e
huggingface_hub.utils._errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-657dfa24-0ebb40e66e153b8e7ed92d16;6e8f330d-5d1f-42ab-ba69-1ec0a453f218)

Cannot access gated repo for url https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json. Repo model meta-llama/Llama-2-7b-chat-hf is gated. You must be authenticated to access it.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/app/basaran/__main__.py", line 41, in <module>
    stream_model = load_model(
  File "/app/basaran/model.py", line 332, in load_model
    tokenizer = AutoTokenizer.from_pretrained(name_or_path, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/auto/tokenization_auto.py", line 677, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/auto/tokenization_auto.py", line 510, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py", line 443, in cached_file
    raise EnvironmentError(
OSError: You are trying to access a gated repo. Make sure to request access at https://huggingface.co/meta-llama/Llama-2-7b-chat-hf and pass a token having permission to this repo either by logging in with huggingface-cli login or by passing token=<your_token>.

However, I am authenticated to the Hugging Face Hub:

huggingface-cli.exe whoami
tomtomtom44

My access request for the model repo has also been granted. The page https://huggingface.co/meta-llama/Llama-2-7b-chat-hf shows: "Gated model You have been granted access to this model".
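
One possible explanation, not confirmed in this thread, is that the login done with huggingface-cli.exe on the Windows host is not visible to the Python process inside the Docker container, since the container has its own environment and home directory. The sketch below is a small diagnostic that could be run inside the container to see whether any token source is present there. The function name find_hub_token_sources is hypothetical, and the environment variable names and token file paths are assumptions based on common huggingface_hub defaults; they may differ by library version or if HF_HOME is set.

```python
import os
from pathlib import Path

# Assumption: environment variables that huggingface_hub versions have
# consulted for a token (older releases: HUGGING_FACE_HUB_TOKEN, newer: HF_TOKEN).
TOKEN_ENV_VARS = ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN")


def find_hub_token_sources(env=None, home=None):
    """Return the places where a Hub token appears to be available.

    This only checks for presence; it never reads or prints the token itself.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # Environment variables that are set to a non-empty value.
    sources = [name for name in TOKEN_ENV_VARS if env.get(name)]

    # Assumption: default token file locations written by `huggingface-cli login`.
    candidates = (
        home / ".cache" / "huggingface" / "token",
        home / ".huggingface" / "token",
    )
    sources.extend(str(path) for path in candidates if path.is_file())
    return sources


if __name__ == "__main__":
    found = find_hub_token_sources()
    if found:
        print("Token sources visible to this process:", found)
    else:
        print("No Hub token visible to this process.")
```

If this reports no token when run inside the container, the 401 would be consistent with the container making unauthenticated requests even though the host account has access. In that case, supplying the token to the container, for example through an environment variable with docker run -e, may be worth trying; whether this Basaran image picks it up is an assumption to verify.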