vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Usage]: Can vllm host a model offline, without an internet connection? #3719

Open juud79 opened 6 months ago

juud79 commented 6 months ago

Your current environment

python3 -m vllm.entrypoints.api_server --model TheBloke/CodeLlama-7B-Python-AWQ --quantization awq

How would you like to use vllm

I want to host it in an offline environment. I run:

python3 -m vllm.entrypoints.api_server --model TheBloke/CodeLlama-7B-Python-AWQ --quantization awq

but I get an error.

juud79 commented 6 months ago

huggingface_hub.utils._http.OfflineModeIsEnabled: Cannot reach https://huggingface.co/api/models/bigcode/starcoder: offline mode is enabled. To disable it, please unset the HF_HUB_OFFLINE environment variable.

I have already set HF_HUB_OFFLINE=1, yet vllm still tries to reach huggingface.co.
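
For reference, the setup I am attempting looks roughly like this (the huggingface-cli step and the exported variables are an illustrative sketch of my environment, not an exact transcript):

# while online: pre-download the model into the local HF cache
huggingface-cli download TheBloke/CodeLlama-7B-Python-AWQ

# on the offline host: force offline mode and start the server
export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
python3 -m vllm.entrypoints.api_server --model TheBloke/CodeLlama-7B-Python-AWQ --quantization awq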

njhill commented 6 months ago

@juud79 we're working on https://github.com/vllm-project/vllm/pull/3125 to address this. You can work around this by passing the explicit path to the model in your local HF cache as the model name.
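
For example, something along these lines should work (the snapshots/<snapshot_hash> directory is a placeholder; it follows the standard HF hub cache layout under ~/.cache/huggingface/hub):

python3 -m vllm.entrypoints.api_server \
    --model ~/.cache/huggingface/hub/models--TheBloke--CodeLlama-7B-Python-AWQ/snapshots/<snapshot_hash> \
    --quantization awq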

fgebhart commented 1 month ago

Any updates on this? I am running into the same issue and see that the mentioned PR was closed without being merged.