Closed chintanshrinath closed 1 year ago
You can add your token as an env var using -e HUGGING_FACE_HUB_TOKEN=YOURTOKEN
docker run -d --gpus all --shm-size 1g -p 8080:80 -e HUGGING_FACE_HUB_TOKEN=YOURTOKEN -v $PWD/data:/data ghcr.io/huggingface/text-generation-inference:0.8 --model-id meta-llama/Llama-2-13b-hf
Hi @FionnD, it is working now.
Thank you for your help.
Hi @FionnD
Out of curiosity, is there a way to point to the offline-downloaded Llama-2 models that come with PyTorch weights like consolidated.07.pth? (Or do I have to convert them to an HF-compatible format locally first?)
You have to convert them first. Or use this: https://huggingface.co/meta-llama/Llama-2-7b-hf
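For reference, the conversion mentioned here can be done with the script that ships with the transformers library (flag names as of the transformers versions current at the time of this thread; paths below are placeholders):

```shell
# Convert Meta's original PyTorch checkpoints (consolidated.*.pth)
# into the Hugging Face format that TGI and from_pretrained() expect.
python -m transformers.models.llama.convert_llama_weights_to_hf \
    --input_dir /path/to/meta-llama-download \
    --model_size 13B \
    --output_dir /path/to/llama-2-13b-hf
```

The output directory will contain config.json, the tokenizer files, and sharded HF weight files, which is what --model-id can then point at.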
Thanks @Narsil
I got the original models from Meta yesterday, but my email on HF was different when I requested access, so I am waiting.
In the meantime, I used the script in transformers to convert the PT weights to HF weights. That worked! Can --model-id point to a local path, like from_pretrained() can? (While I am waiting for HF approval.)
It should!
It seems it wants to download it with a valid repo_id and repo_type. Maybe there is another flag that makes it look for a local path rather than a URL:
2023-07-19T17:16:56.876340Z INFO text_generation_launcher: Sharding model on 4 processes
2023-07-19T17:16:56.876422Z INFO text_generation_launcher: Starting download process.
2023-07-19T17:16:59.078252Z ERROR text_generation_launcher: Download encountered an error: Traceback (most recent call last):
File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 259, in hf_raise_for_status
response.raise_for_status()
File "/opt/conda/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/models/llama-2-13b-chat_hf
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 96, in download_weights
utils.weight_files(model_id, revision, extension)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 92, in weight_files
filenames = weight_hub_files(model_id, revision, extension)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 25, in weight_hub_files
info = api.model_info(model_id, revision=revision)
File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/hf_api.py", line 1604, in model_info
hf_raise_for_status(r)
File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 291, in hf_raise_for_status
raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-64b81a8a-6510cd22052a52ef22157d36)
Repository Not Found for url: https://huggingface.co/api/models/llama-2-13b-chat_hf.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.
Error: DownloadError
llama-2-13b-chat_hf -> llama-2-13b-chat-hf
Underscore to dash
Thanks @Narsil for the quick reply. Unfortunately, even though I put the model's directory in the same root, it still wants to download it:
2023-07-19T18:04:39.430272Z ERROR text_generation_launcher: Download encountered an error: Traceback (most recent call last):
File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 259, in hf_raise_for_status
response.raise_for_status()
File "/opt/conda/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/models/llama-2-13b-chat-hf
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 96, in download_weights
utils.weight_files(model_id, revision, extension)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 92, in weight_files
filenames = weight_hub_files(model_id, revision, extension)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 25, in weight_hub_files
info = api.model_info(model_id, revision=revision)
File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/hf_api.py", line 1604, in model_info
hf_raise_for_status(r)
File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 291, in hf_raise_for_status
raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-64b825b7-4db1c4294d3905e45c04b149)
Repository Not Found for url: https://huggingface.co/api/models/llama-2-13b-chat-hf.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.
Here is my command just in case:
sudo docker run --name "my-Llama-2-13b-chat" --gpus all --shm-size 2g -p 6066:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:0.8 --model-id llama-2-13b-chat-hf --num-shard 4 --max-total-tokens 8192 --max-input-length 4096
The llama-2-13b-chat-hf directory is in the same path from which this command is being executed.
Change the --model-id to the name of the local directory you're mounting.
Currently it's failing to see the directory, and therefore checks whether it exists on the Hub (since it's possible for it to be an actual Hub id).
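A sketch of the fixed command, assuming the converted weights live in $PWD/llama-2-13b-chat-hf on the host (the directory name and port are taken from the command above; adjust to your setup):

```shell
# Mount the host directory into the container, then point --model-id at
# the path as seen *inside* the container, so TGI finds it locally
# instead of querying the Hub API.
sudo docker run --name "my-Llama-2-13b-chat" --gpus all --shm-size 2g -p 6066:80 \
    -v $PWD/llama-2-13b-chat-hf:/data/llama-2-13b-chat-hf \
    ghcr.io/huggingface/text-generation-inference:0.8 \
    --model-id /data/llama-2-13b-chat-hf \
    --num-shard 4 --max-total-tokens 8192 --max-input-length 4096
```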
@maziyarpanahi I am also getting the same error while loading a local folder. Were you able to fix it?
I actually got access to the HF weights, but as @Narsil mentioned, the local path must exactly match the model-id. Same as with the transformers library: all your files should be under meta-llama/Llama-2-7b-chat-hf so the local-path check thinks the model is already there and loads it.
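To illustrate why the directory name must match the model id exactly, here is a hypothetical, simplified sketch of the resolution logic (the real implementation lives in text_generation_server/utils/hub.py and is more involved):

```python
import os

def resolve_model(model_id: str, cache_dir: str = "/data") -> str:
    """Simplified sketch: prefer a local directory whose relative path
    matches model_id exactly; otherwise fall back to treating model_id
    as a Hub repo id and querying the Hub API."""
    local_path = os.path.join(cache_dir, model_id)
    if os.path.isdir(local_path):
        return f"local:{local_path}"
    # Not found locally -> the launcher asks the Hub API, which is where
    # the 401 / RepositoryNotFoundError in this thread comes from when
    # the id does not exist on the Hub.
    return f"hub:{model_id}"
```

So a folder named llama-2-13b-chat-hf mounted anywhere else, or under a different relative path, is invisible to the check, and the id falls through to the Hub lookup.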
Hi, I am trying to run text-generation-inference with the following commands, but I am getting an error:
model='meta-llama/Llama-2-7b-chat-hf'
num_shard=2
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:0.9 --model-id $model --num-shard $num_shard
I have access to the Llama model on Hugging Face; my request was approved.
And one more question: can we quantize the model with the following flags: --quantize bitsandbytes --env? Please, can anyone help me.
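A sketch of how those launcher flags are usually combined, assuming your TGI version supports --quantize bitsandbytes (launcher flags go after the image name; the token env var is needed because the meta-llama repos are gated, which may also be the cause of the 401 above):

```shell
docker run --gpus all --shm-size 1g -p 8080:80 \
    -e HUGGING_FACE_HUB_TOKEN=YOURTOKEN \
    -v $PWD/data:/data \
    ghcr.io/huggingface/text-generation-inference:0.9 \
    --model-id meta-llama/Llama-2-7b-chat-hf \
    --num-shard 2 \
    --quantize bitsandbytes
```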