huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Regarding issue with meta-llama/Llama-2-7b-chat-hf to run with text generation inference #644

Closed chintanshrinath closed 1 year ago

chintanshrinath commented 1 year ago

Hi, I am trying to run text-generation-inference with the following commands, but I am getting an error:

model='meta-llama/Llama-2-7b-chat-hf'
num_shard=2
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:0.9 --model-id $model --num-shard $num_shard

[screenshot of the error]

I have access to the Llama model on Hugging Face, and my request was approved.

And one more question: can we quantize the model with the following flags: --quantize bitsandbytes --env? Please, can anyone help me?
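For reference, I mean appending the flag to the same command, something like this (assuming the launcher accepts --quantize bitsandbytes; the rest of the command is unchanged):

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:0.9 --model-id $model --num-shard $num_shard --quantize bitsandbytes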

FionnD commented 1 year ago

You can add your token as an env var using -e HUGGING_FACE_HUB_TOKEN=YOURTOKEN

docker run -d --gpus all --shm-size 1g -p 8080:80 -e HUGGING_FACE_HUB_TOKEN=YOURTOKEN -v $PWD/data:/data ghcr.io/huggingface/text-generation-inference:0.8 --model-id meta-llama/Llama-2-13b-hf
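If the 401 persists, it can also help to verify the token itself is valid before launching the container, e.g. against the Hub's whoami endpoint (a quick sketch; YOURTOKEN is a placeholder):

curl -s -H "Authorization: Bearer YOURTOKEN" https://huggingface.co/api/whoami-v2
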
chintanshrinath commented 1 year ago

Hi @FionnD It is working now.

Thank you for your help

maziyarpanahi commented 1 year ago

Hi @FionnD

Out of curiosity, is there a way to point to offline-downloaded Llama-2 models that come with PyTorch weights like consolidated.07.pth? (Or do I have to convert them to an HF-compatible format locally first?)

Narsil commented 1 year ago

You have to convert them first. Or use this: https://huggingface.co/meta-llama/Llama-2-7b-hf
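
For reference, the conversion script shipped with transformers can be invoked roughly like this (a sketch; the paths and the --model_size value are placeholders):

python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights \
    --model_size 7B \
    --output_dir /path/to/output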

maziyarpanahi commented 1 year ago

Thanks @Narsil. I got the original models from Meta yesterday, but the email on my HF account was different from the one I used to request access, so I am waiting. In the meantime, I used the script in transformers to convert the PT weights to HF weights, and that worked! Can --model-id point to a local path the way from_pretrained() can? (while I am waiting for HF approval)

Narsil commented 1 year ago

It should!

maziyarpanahi commented 1 year ago

It seems it still tries to download the model using a valid repo_id and repo_type. Maybe there is another flag that makes it look for a local path rather than a URL:

2023-07-19T17:16:56.876340Z  INFO text_generation_launcher: Sharding model on 4 processes
2023-07-19T17:16:56.876422Z  INFO text_generation_launcher: Starting download process.
2023-07-19T17:16:59.078252Z ERROR text_generation_launcher: Download encountered an error: Traceback (most recent call last):

  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 259, in hf_raise_for_status
    response.raise_for_status()

  File "/opt/conda/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)

requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/models/llama-2-13b-chat_hf

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 96, in download_weights
    utils.weight_files(model_id, revision, extension)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 92, in weight_files
    filenames = weight_hub_files(model_id, revision, extension)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 25, in weight_hub_files
    info = api.model_info(model_id, revision=revision)

  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)

  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/hf_api.py", line 1604, in model_info
    hf_raise_for_status(r)

  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 291, in hf_raise_for_status
    raise RepositoryNotFoundError(message, response) from e

huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-64b81a8a-6510cd22052a52ef22157d36)

Repository Not Found for url: https://huggingface.co/api/models/llama-2-13b-chat_hf.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

Error: DownloadError
Narsil commented 1 year ago

llama-2-13b-chat_hf -> llama-2-13b-chat-hf

Underscore to dash

maziyarpanahi commented 1 year ago

llama-2-13b-chat_hf -> llama-2-13b-chat-hf

Underscore to dash

Thanks @Narsil for the quick reply. Unfortunately, even though I put the model's directory in the same root, it still tries to download it:

2023-07-19T18:04:39.430272Z ERROR text_generation_launcher: Download encountered an error: Traceback (most recent call last):

  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 259, in hf_raise_for_status
    response.raise_for_status()

  File "/opt/conda/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)

requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/models/llama-2-13b-chat-hf

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 96, in download_weights
    utils.weight_files(model_id, revision, extension)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 92, in weight_files
    filenames = weight_hub_files(model_id, revision, extension)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 25, in weight_hub_files
    info = api.model_info(model_id, revision=revision)

  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)

  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/hf_api.py", line 1604, in model_info
    hf_raise_for_status(r)

  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 291, in hf_raise_for_status
    raise RepositoryNotFoundError(message, response) from e

huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-64b825b7-4db1c4294d3905e45c04b149)

Repository Not Found for url: https://huggingface.co/api/models/llama-2-13b-chat-hf.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

Here is my command just in case:

sudo docker run --name "my-Llama-2-13b-chat" --gpus all --shm-size 2g -p 6066:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:0.8 --model-id llama-2-13b-chat-hf --num-shard 4 --max-total-tokens 8192 --max-input-length 4096

The llama-2-13b-chat-hf directory is in the same path from which this command is being executed.

Narsil commented 1 year ago

Change the model-id to the path of the local directory you're mounting inside the container. Currently it's failing to see the directory, and therefore checks whether it exists on the Hub (since it's possible for it to be an actual Hub id).
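
In other words, something like this (a sketch; the host path and the mount point inside the container are assumptions):

docker run --gpus all --shm-size 2g -p 6066:80 -v $PWD/llama-2-13b-chat-hf:/data/llama-2-13b-chat-hf ghcr.io/huggingface/text-generation-inference:0.8 --model-id /data/llama-2-13b-chat-hf --num-shard 4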

roshan-gopalakrishnan commented 1 year ago

@maziyarpanahi I am also getting the same error while loading a local folder. Were you able to fix it?

maziyarpanahi commented 1 year ago

@maziyarpanahi I am also getting the same error while loading a local folder. Were you able to fix it?

I actually got access to the HF weights in the meantime, but as @Narsil mentioned, the local path must match the model-id exactly. As with the transformers library, all of the model files need to live under a local meta-llama/Llama-2-7b-chat-hf path for it to consider the model already present and load it from there.
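
For illustration, the expected local layout would look roughly like this (a sketch; the exact weight files depend on how the conversion was done):

meta-llama/Llama-2-7b-chat-hf/
├── config.json
├── generation_config.json
├── tokenizer.model
├── tokenizer_config.json
└── model weight shards (*.safetensors or pytorch_model-*.bin, plus their index file)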