meta-llama / llama

Inference code for Llama models

can't run llama-2-7b-hf even though I'm using use_auth_token #374

Open brando90 opened 1 year ago

brando90 commented 1 year ago

Error:

-- Get HuggingFace LLaMA index LLM
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:07<00:00,  3.56s/it]
Traceback (most recent call last):
  File "/lfs/ampere1/0/brando9/miniconda/envs/maf/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 261, in hf_raise_for_status
    response.raise_for_status()
  File "/lfs/ampere1/0/brando9/miniconda/envs/maf/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/meta-llama/Llama-2-7b-hf/resolve/main/tokenizer_config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/lfs/ampere1/0/brando9/miniconda/envs/maf/lib/python3.10/site-packages/transformers/utils/hub.py", line 417, in cached_file
    resolved_file = hf_hub_download(
  File "/lfs/ampere1/0/brando9/miniconda/envs/maf/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/lfs/ampere1/0/brando9/miniconda/envs/maf/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1195, in hf_hub_download
    metadata = get_hf_file_metadata(
  File "/lfs/ampere1/0/brando9/miniconda/envs/maf/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/lfs/ampere1/0/brando9/miniconda/envs/maf/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1541, in get_hf_file_metadata
    hf_raise_for_status(r)
  File "/lfs/ampere1/0/brando9/miniconda/envs/maf/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 293, in hf_raise_for_status
    raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-64b70d44-24ec86d03e68830022d37425;109c9108-722e-401e-b2de-552f182609a6)

Repository Not Found for url: https://huggingface.co/meta-llama/Llama-2-7b-hf/resolve/main/tokenizer_config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/lfs/ampere1/0/brando9/massive-autoformalization-maf/maf-src/data_utils/informal_textbook_2_informal_data_frame.py", line 261, in <module>
    textbook_txt_2_maf_informal_data_frame()
  File "/lfs/ampere1/0/brando9/massive-autoformalization-maf/maf-src/data_utils/informal_textbook_2_informal_data_frame.py", line 162, in textbook_txt_2_maf_informal_data_frame
    llm = HuggingFaceLLM(
  File "/lfs/ampere1/0/brando9/miniconda/envs/maf/lib/python3.10/site-packages/llama_index/llms/huggingface.py", line 64, in __init__
    self.tokenizer = tokenizer or AutoTokenizer.from_pretrained(
  File "/lfs/ampere1/0/brando9/miniconda/envs/maf/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 643, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/lfs/ampere1/0/brando9/miniconda/envs/maf/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 487, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "/lfs/ampere1/0/brando9/miniconda/envs/maf/lib/python3.10/site-packages/transformers/utils/hub.py", line 433, in cached_file
    raise EnvironmentError(
OSError: meta-llama/Llama-2-7b-hf is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.
Daryl149 commented 1 year ago

Try my version otherwise, just converted it, public repo: https://huggingface.co/daryl149/llama-2-7b-chat-hf
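
For reference, that public mirror is not gated, so loading it should not require a token. A minimal sketch (the repo id is taken from the link above):

# Minimal sketch: this public mirror is not gated, so no auth token is needed.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("daryl149/llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained("daryl149/llama-2-7b-chat-hf")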

brando90 commented 1 year ago

Seems that I just need to wait for the official HF permission, not only Meta's?

jasonsheinkopf commented 1 year ago

I have been granted access (the model page shows "Gated model: You have been granted access to this model"), but I get the same error.

I created a new 'read' access token to use.

Do I need to use a specific access token, or can I just create one?

l294265421 commented 1 year ago

Same problem here.

JaktensTid commented 1 year ago

I have the same problem when I just try to clone the repo from Hugging Face using git clone.

GitMeAI commented 1 year ago

Having similar issues.

OSError: llama-2-7b.ggmlv3.q2_K.bin is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models' If this is a private repository, make sure to pass a token having permission to this repo with use_auth_token or log in with huggingface-cli login and pass use_auth_token=True.

romilgoel commented 1 year ago

I was also getting the same issue. It worked for me. Here are the steps that I followed:

  1. Get approval from Meta
  2. Get approval from HF
  3. Create a read token from here : https://huggingface.co/settings/tokens
  4. pip install transformers
  5. execute huggingface-cli login and provide read token
  6. Execute your code. It should work fine.
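
A minimal sketch of steps 5-6 in Python, assuming your account has already been granted access and you have a read token ("hf_xxx" below is a placeholder):

# Sketch only: log in programmatically (equivalent to `huggingface-cli login`),
# then load the gated tokenizer. Requires that your account was granted access.
from huggingface_hub import login
from transformers import AutoTokenizer

login(token="hf_xxx")  # placeholder for your actual read token

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
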
drorata commented 1 year ago

I'm trying to follow this tutorial and I fail at the

tokenizer = AutoTokenizer.from_pretrained(model)

step. I ran huggingface-cli login in the shell and then tried to run the code from the tutorial (either as a script or interactively in a notebook). In both cases I get the error:

OSError: meta-llama/Llama-2-7b-chat-hf is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.

I guess I'm missing step (1) in @romilgoel's answer. Can you give some hints on how to do it?

N.B.

FWIW, I opened https://huggingface.co/meta-llama/Llama-2-7b-chat-hf and there was a button to click. I'm now waiting :)

(screenshot of the access-request button on the model page)

N.B. 2

Yep, that's probably what I was missing. I ran into another problem (ValueError: Could not load model meta-llama/Llama-2-7b-chat-hf with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'>).), but that's probably a different story.

inesdmu commented 1 year ago

Hi, I am having a similar problem:

%pip install transformers
%pip install accelerate
!pip install huggingface-hub==0.14.1

!huggingface-cli login --token "my_token"

from transformers import AutoTokenizer
import transformers
import torch

model = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model)

However, I am getting the following error: 401 Client Error: Unauthorized for url: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json

I got both the Meta and the HF access granted, and this token belongs to the account that was granted access.

Any idea where this could come from?

jianyinglangaws commented 1 year ago

I got a similar error too. I got approval from Meta and Hugging Face and provided the token through huggingface-cli login.

OSError: meta-llama/Llama-2-7b-hf is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.
bnicholl commented 1 year ago

I am also getting the same error with approval from Meta and Hugging Face.

Jasonli1997 commented 1 year ago

Yeah, I'm also getting the same error with approval from Meta, using the access token from Hugging Face, and setting use_auth_token=True.

logancyang commented 1 year ago

Also getting the same 401 error with approval from both Meta and HF. Went through these steps, still no luck:

I was also getting the same issue. It worked for me. Here are the steps that I followed:

  1. Get approval from Meta
  2. Get approval from HF
  3. Create a read token from here : https://huggingface.co/settings/tokens
  4. pip install transformers
  5. execute huggingface-cli login and provide read token
  6. Execute your code. It should work fine.
dylanxia2017 commented 1 year ago

I was also getting the same issue. It worked for me. Here are the steps that I followed:

  1. Get approval from Meta
  2. Get approval from HF
  3. Create a read token from here : https://huggingface.co/settings/tokens
  4. pip install transformers
  5. execute huggingface-cli login and provide read token
  6. Execute your code. It should work fine.

This doesn't work in my case.

Jasonli1997 commented 1 year ago

Yeah, I'm also getting the same error with approval from Meta, using the access token from Hugging Face, and setting use_auth_token=True.

I was able to get everything running after downloading the Hugging Face repo with git-lfs.
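
If you prefer to stay in Python rather than using git-lfs, huggingface_hub's snapshot_download can pull the whole repo into a local folder first. A sketch, assuming you have been granted access; the token value is a placeholder, and on older huggingface_hub versions the argument is use_auth_token instead of token:

# Sketch of the "download the repo locally, then load from disk" workaround.
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer

local_dir = snapshot_download(
    repo_id="meta-llama/Llama-2-7b-hf",
    token="hf_xxx",  # placeholder for your read token
)

# Loading from the local folder avoids any further Hub authentication.
tokenizer = AutoTokenizer.from_pretrained(local_dir)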

SimasJan commented 1 year ago

Try using a different provider.

example:

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("togethercomputer/LLaMA-2-7B-32K")
model = AutoModelForCausalLM.from_pretrained("togethercomputer/LLaMA-2-7B-32K")
alinemati-uwm commented 1 year ago

Try using a different provider.

example:

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("togethercomputer/LLaMA-2-7B-32K")
model = AutoModelForCausalLM.from_pretrained("togethercomputer/LLaMA-2-7B-32K")

Checked; not working.

MaratZakirov commented 1 year ago

I was also getting the same issue. It worked for me. Here are the steps that I followed:

  1. Get approval from Meta
  2. Get approval from HF
  3. Create a read token from here : https://huggingface.co/settings/tokens
  4. pip install transformers
  5. execute huggingface-cli login and provide read token
  6. Execute your code. It should work fine.

I can download all the files, but the code still fails.

drorata commented 1 year ago

@MaratZakirov Check out this thread

dakshbhatnagar commented 1 year ago

I was also getting the same issue. It worked for me. Here are the steps that I followed:

  1. Get approval from Meta
  2. Get approval from HF
  3. Create a read token from here : https://huggingface.co/settings/tokens
  4. pip install transformers
  5. execute huggingface-cli login and provide read token
  6. Execute your code. It should work fine.

@romilgoel how do I get approval from HF? I guess Meta also hasn't shared access with me yet. I get the error below:

meta-llama/Llama-2-7b-chat-hf is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'. If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.

chaudharynitin commented 1 year ago

I am also getting the error below and am unable to fix it:

Repository Not Found for url: https://huggingface.co/api/models/llama-2-7b-chat.ggmlv3.q4_0.bin/revision/main. Please make sure you specified the correct repo_id and repo_type. If you are trying to access a private or gated repo, make sure you are authenticated.

jocelin commented 1 year ago

For this error:

ValueError: Could not load model meta-llama/Llama-2-7b-chat-hf with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'>).

I was able to resolve it using the script below:

import torch
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Pass the already-loaded model object (not just the repo id) to the pipeline.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
)

However, running with AutoModelForCausalLM directly will lead to another issue around xformers ("xformers is not installed correctly"): https://github.com/huggingface/transformers/issues/24903

To fix the xformers issue, applying the changes in https://github.com/huggingface/transformers/pull/24960 works. I changed the Pipfile to load transformers from git (since the PR is not yet released) and got llama2 working after that:

transformers = { git = "https://github.com/huggingface/transformers.git@main" }
margish100 commented 1 year ago

Error (10:59:35):

404 Client Error. (Request ID: Root=1-64d3243f-4b364ff52499ba15050bc73c)

Repository Not Found for url: https://huggingface.co/api/models/llama-2-7b-chat.ggmlv3.q8_0.bin/revision/main. Please make sure you specified the correct repo_id and repo_type. If you are trying to access a private or gated repo, make sure you are authenticated.

I have set the auth token but still get the same error. How do I get access from HF and Meta?

MustafaAlahmid commented 1 year ago

This worked for me:

Change the model name in adapter_config.json to "NousResearch/Llama-2-7b-hf" to use the non-gated Llama 2 models.
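
A minimal sketch of that edit, assuming a PEFT-style adapter_config.json whose base-model key is "base_model_name_or_path" (check your own file for the exact key name):

# Hypothetical sketch: repoint an adapter config at the non-gated mirror.
import json

with open("adapter_config.json") as f:
    config = json.load(f)

config["base_model_name_or_path"] = "NousResearch/Llama-2-7b-hf"

with open("adapter_config.json", "w") as f:
    json.dump(config, f, indent=2)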

puneethegde commented 1 year ago

Run this !huggingface-cli login

Log in using your token, then run this:

!pip install huggingface_hub

Solved for me!

jiafuzha commented 1 year ago

Set use_auth_token to your actual token. It worked for me.

realliyifei commented 1 year ago

Try my version otherwise, just converted it, public repo: huggingface.co/daryl149/llama-2-7b-chat-hf

I am trying your model as the workaround.

@Daryl149 Is your daryl149/llama-2-7b-hf exactly the same as meta-llama/Llama-2-7b? (Where 'hf' stands for Hugging Face?)

gkcng commented 1 year ago

Same as jiafuzha: both logging in via huggingface-cli login and setting use_auth_token work for me, after getting approval from both Meta and HF and creating an HF token.

import transformers
from transformers import AutoTokenizer

# HF_TOKEN is your read token; fp_type is whatever torch dtype you want to load in.
pretrained_name_or_path = 'meta-llama/Llama-2-7b-hf'

model = transformers.AutoModelForCausalLM.from_pretrained(
    pretrained_name_or_path,
    trust_remote_code=True,
    torch_dtype=fp_type,
    device_map=None,
    # token=HF_TOKEN,
    use_auth_token=HF_TOKEN,
)

tokenizer = AutoTokenizer.from_pretrained(
    pretrained_name_or_path,
    trust_remote_code=True,
    padding_side="left",
    # token=HF_TOKEN,
    use_auth_token=HF_TOKEN,
)

NB: The only annoying thing was getting warning messages saying use_auth_token is deprecated and to use token instead, but when I did that, both calls errored out.

tcapelle commented 1 year ago

Lol I was using: meta-llama/Llama-2-7B-hf instead of meta-llama/Llama-2-7b-hf...

karan842 commented 1 year ago

I think we have to request access from Meta to use this model. (screenshot)

karan842 commented 1 year ago

Set use_auth_token to your actual token. It worked for me.

I tried this but I'm still getting an error!!

curtiskeisler commented 1 year ago

@karan842 try this . . .

from getpass import getpass

hftoken = getpass('Enter Huggingface token: ')

Run the above in a code block. You'll be prompted for your Hugging Face token; enter it there. That stores it in the hftoken variable without hard-coding it in your script or saving it anywhere.

Then run this to load the model:

from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModel
import transformers
import torch

cache_dir = "./model_cache"
model_name = "meta-llama/Llama-2-7b-chat-hf"
model = model_name

tokenizer = AutoTokenizer.from_pretrained(
    model,
    cache_dir=cache_dir,
    trust_remote_code=True,
    token=hftoken,
)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    token=hftoken,
)

You should be ok after this . . . best of luck!

miko8422 commented 10 months ago

Try my version otherwise, just converted it, public repo: https://huggingface.co/daryl149/llama-2-7b-chat-hf

Is your checkpoint different from the others? I tried to download your checkpoint but got this:

python -m llama.llama_quant daryl149/llama-2-7b-chat-hf c4 --wbits 8 --save pyllama-7B8b.pt

pytorch_model-00001-of-00002.bin: 100%|████| 9.98G/9.98G [13:20<00:00, 12.5MB/s]
pytorch_model-00002-of-00002.bin: 100%|████████████████████████████████████████████████████| 3.50G/3.50G [04:46<00:00, 12.2MB/s]
Downloading shards: 100%|████████████████████████| 2/2 [18:07<00:00, 543.68s/it]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████| 2/2 [00:12<00:00,  6.47s/it]
generation_config.json: 100%|██████████████████████████████████████████████████████████████████| 137/137 [00:00<00:00, 40.0kB/s]
tokenizer_config.json: 100%|███████████████████████████████████████████████████████████████████| 727/727 [00:00<00:00, 1.54MB/s]
tokenizer.model: 100%|████████████████████████████████████████████████████████████████████████| 500k/500k [00:01<00:00, 410kB/s]
special_tokens_map.json: 100%|██████████████████████████████████████████████████████████████████| 411/411 [00:00<00:00, 851kB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████████████████████| 1.84M/1.84M [00:01<00:00, 1.46MB/s]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'LlamaTokenizer'. 
The class this function is called from is 'LLaMATokenizer'.
Traceback (most recent call last):
  File "/home/neroism/anaconda3/envs/LLama/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/neroism/anaconda3/envs/LLama/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/neroism/anaconda3/envs/LLama/lib/python3.9/site-packages/llama/llama_quant.py", line 477, in <module>
    run()
  File "/home/neroism/anaconda3/envs/LLama/lib/python3.9/site-packages/llama/llama_quant.py", line 436, in run
    tokenizer = LLaMATokenizer.from_pretrained(
  File "/home/neroism/anaconda3/envs/LLama/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2028, in from_pretrained
    return cls._from_pretrained(
  File "/home/neroism/anaconda3/envs/LLama/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2260, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/neroism/anaconda3/envs/LLama/lib/python3.9/site-packages/llama/hf/tokenization_llama.py", line 64, in __init__
    super().__init__(
  File "/home/neroism/anaconda3/envs/LLama/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 367, in __init__
    self._add_tokens(
  File "/home/neroism/anaconda3/envs/LLama/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 467, in _add_tokens
    current_vocab = self.get_vocab().copy()
  File "/home/neroism/anaconda3/envs/LLama/lib/python3.9/site-packages/llama/hf/tokenization_llama.py", line 90, in get_vocab
    vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
  File "/home/neroism/anaconda3/envs/LLama/lib/python3.9/site-packages/llama/hf/tokenization_llama.py", line 78, in vocab_size
    return self.sp_model.get_piece_size()
AttributeError: 'LLaMATokenizer' object has no attribute 'sp_model'

I have the exact same error as @brando90.

And this is the error I got. It seems like the tokenizer of your model is different from the official one? Where can I find this 'LLaMATokenizer'? The log says: "The tokenizer class you load from this checkpoint is 'LlamaTokenizer'. The class this function is called from is 'LLaMATokenizer'." I wonder if I could solve this problem by simply changing the name?

Stosan commented 7 months ago

Pass your hf_token to use_auth_token:

tokenizer = AutoTokenizer.from_pretrained(base_model,  use_auth_token=your_hf_token, trust_remote_code=True)
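
Recent transformers releases deprecate use_auth_token in favor of token (the deprecation warning is mentioned earlier in this thread), so on newer versions the equivalent call would be the sketch below; your_hf_token is a placeholder:

# Sketch for newer transformers versions, where `token` replaces `use_auth_token`.
tokenizer = AutoTokenizer.from_pretrained(base_model, token=your_hf_token, trust_remote_code=True)
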
wsxhjnb1 commented 5 months ago

After being approved by HF and Meta, I went to my HF token settings page, selected the repo I wanted to access, and checked these two options: "Interact with discussions / Open pull requests on selected repos" and "Write access to contents/settings of selected repos".

Alpha-T30 commented 4 months ago

(screenshot of the token-creation dialog) While creating a new token, don't use the default type; use the read type. I have lost hours solving this small issue.

dejiavu commented 2 months ago

(screenshot of the token-creation dialog) While creating a new token, don't use the default type; use the read type. I have lost hours solving this small issue.

This fixed my problem

MillionaireChen commented 4 days ago
(screenshot of the gated-model access request form)

Go there and submit your information (name, birthday, ...), then wait about 10 minutes for authorization. An email will be sent to you. After that, create a token and log in with it at the terminal.