SamuelBG13 opened this issue 2 weeks ago
Have you tried setting the HF_HUB_OFFLINE environment variable?
Yes, I had tried that too. Then I get:
huggingface_hub.errors.OfflineModeIsEnabled: Cannot reach https://huggingface.co/api/models/{MODEL_REPO}: offline mode is enabled. To disable it, please unset the `HF_HUB_OFFLINE` environment variable.
Can you share the stack trace so we can see which line of code throws the error?
Hello! Apologies, I thought this was perhaps the expected behavior, hence I didn't file a proper bug report.
Here is the traceback when I unset my HF_TOKEN:
INFO api_server.py:177] Started engine process with PID (...)
Traceback (most recent call last):
  File "/packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "/packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/mistralai/Pixtral-12B-2409/resolve/main/config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/packages/(...)/bin/vllm", line 8, in <module>
    sys.exit(main())
  File "/packages/vllm/scripts.py", line 37, in serve
    uvloop.run(run_server(args))
  File "/packages/uvloop/__init__.py", line 82, in run
    return loop.run_until_complete(wrapper())
  File "/packages/uvloop/__init__.py", line 61, in wrapper
    return await main
  File "/packages/vllm/entrypoints/openai/api_server.py", line 538, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/packages/vllm/entrypoints/openai/api_server.py", line 105, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/packages/vllm/entrypoints/openai/api_server.py", line 182, in build_async_engine_client_from_engine_args
    engine_config = engine_args.create_engine_config()
  File "/packages/vllm/engine/arg_utils.py", line 874, in create_engine_config
    model_config = self.create_model_config()
  File "/packages/vllm/engine/arg_utils.py", line 811, in create_model_config
    return ModelConfig(
  File "/packages/vllm/config.py", line 183, in __init__
    self.hf_config = get_config(self.model, trust_remote_code, revision,
  File "/packages/vllm/transformers_utils/config.py", line 121, in get_config
    if is_gguf or file_or_path_exists(model,
  File "/packages/vllm/transformers_utils/config.py", line 96, in file_or_path_exists
    return file_exists(model, config_name, revision=revision, token=token)
  File "/packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/packages/huggingface_hub/hf_api.py", line 2641, in file_exists
    get_hf_file_metadata(url, token=token)
  File "/packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/packages/huggingface_hub/file_download.py", line 1645, in get_hf_file_metadata
    r = _request_wrapper(
  File "/packages/huggingface_hub/file_download.py", line 372, in _request_wrapper
    response = _request_wrapper(
  File "/packages/huggingface_hub/file_download.py", line 396, in _request_wrapper
    hf_raise_for_status(response)
  File "/packages/huggingface_hub/utils/_errors.py", line 321, in hf_raise_for_status
    raise GatedRepoError(message, response) from e
huggingface_hub.utils._errors.GatedRepoError: 401 Client Error.
And as mentioned previously, mistralai/Pixtral-12B-2409 is already downloaded 😄 Judging by the traceback, file_or_path_exists in vllm/transformers_utils/config.py calls huggingface_hub's file_exists, which queries the Hub rather than the local cache.
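For reference, the cache contents can be checked without any network call (a sketch assuming the default cache location; pass --dir if the weights went to a custom --download-dir):

# List repos present in the local Hugging Face cache; makes no network requests
huggingface-cli scan-cache
# Or scan a custom cache/download directory:
huggingface-cli scan-cache --dir /some_local_directory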
Using:
huggingface-hub==0.23.3
vllm==0.6.2
transformers==4.45.2
@SamuelBG13 Can you help by adding the output of python collect_env.py and of env > env.txt?
After adding export HF_HUB_OFFLINE=1 and manually taking the machine offline, it works in my env.
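Roughly, what I did (a sketch; model name and directory taken from this thread):

# Tell huggingface_hub to skip all network calls
export HF_HUB_OFFLINE=1
# With networking disabled, serve from the previously downloaded weights
vllm serve mistralai/Pixtral-12B-2409 --download-dir /some_local_directory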
Your current environment
I use vllm==0.6.2, installed via pip :-)
How would you like to use vllm
Hello!
First of all, thanks for your great service to the community! I appreciate the work you put into this package.
I am currently running models with the vLLM server. I am particularly interested in a gated model I have access to, so I followed the Hugging Face Hub instructions for setting a token, downloaded the weights, and ran the model successfully. I used:
vllm serve {model_name} --someotherargs --download-dir /some_local_directory
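Concretely, something like this (token value elided; the extra flags omitted):

# Token with access to the gated repo, set up per the Hugging Face Hub docs
export HF_TOKEN=...
vllm serve mistralai/Pixtral-12B-2409 --download-dir /some_local_directory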
Until then, all good. However, if I want to serve the model without a HF Hub connection (e.g. with no internet, or in a fresh session with no HF_TOKEN set), I cannot serve it, even though the model is downloaded locally (see the traceback above).
Of course, setting HF_TOKEN again lets me serve the model (it does not download the weights again), but this is a bit of a bummer: I would like to use the server in local applications regardless of the internet connection. Imagine you have an important event and the connection is bad, or precisely that day the HF servers go down. Am I misunderstanding the usage, or is this a bug?
Things I tried: setting HF_HOME and HF_HUB_CACHE to the local directory containing the model does not work either.
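A sketch of that attempt, assuming the weights live under /some_local_directory:

# Point the Hugging Face cache locations at the directory holding the weights
export HF_HOME=/some_local_directory
export HF_HUB_CACHE=/some_local_directory
# Still raises the GatedRepoError above when HF_TOKEN is not set
vllm serve mistralai/Pixtral-12B-2409 --download-dir /some_local_directory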