Anushagudipati opened this issue 2 months ago (status: Open)
Has this problem been solved? I ran into it as well.
@liu904-61 @Anushagudipati can you please upgrade to the latest transformers, 4.40.1? That release should have the latest fixes.
hey @HamidShojanazeri, I am having the same issue after upgrading transformers to 4.40.1.
I also ran into this problem. Has it been solved?
You need to change the function you call: my .py script originally used LlamaTokenizer.from_pretrained(), and switching it to AutoTokenizer.from_pretrained() fixed the issue.
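In case it helps others, here is a minimal sketch of why that swap works. The live `from_pretrained` call is shown commented out (it needs `transformers` installed plus network access); the `tokenizer_config` dict below is illustrative, not fetched from the Hub.

```python
# The fix: let AutoTokenizer pick the class recorded in the checkpoint's
# tokenizer_config.json instead of hard-coding LlamaTokenizer.
#
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained(model_id)  # instead of LlamaTokenizer.from_pretrained(model_id)

# Why it works: Llama 3 checkpoints record a fast tokenizer class in
# tokenizer_config.json, and AutoTokenizer dispatches on that field
# (illustrative dict, mirroring what the checkpoint ships):
tokenizer_config = {"tokenizer_class": "PreTrainedTokenizerFast"}
resolved = tokenizer_config.get("tokenizer_class")
print(resolved)  # the class AutoTokenizer would instantiate
```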
Unable to load the tokenizer using AutoTokenizer.from_pretrained() either.

Errors:

```
tokenizer = AutoTokenizer.from_pretrained(model_id)
  File "/home/ubuntu/venv/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 862, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/ubuntu/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2089, in from_pretrained
    return cls._from_pretrained(
  File "/home/ubuntu/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2311, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/ubuntu/venv/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 120, in __init__
    raise ValueError(
ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
```

```
config.json: 100%|██████████| 654/654 [00:00<00:00, 6.03MB/s]
special_tokens_map.json: 100%|██████████| 73.0/73.0 [00:00<00:00, 797kB/s]
tokenizer_config.json: 100%|██████████| 51.0k/51.0k [00:00<00:00, 55.3MB/s]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'PreTrainedTokenizerFast'.
The class this function is called from is 'LlamaTokenizer'.
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Traceback (most recent call last):
  File "/home/ubuntu/llama3-8b-base.py", line 28, in <module>
    tokenizer = AutoTokenizer.from_pretrained(checkpoint_path)
  File "/home/ubuntu/venv/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 843, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/ubuntu/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2048, in from_pretrained
    return cls._from_pretrained(
  File "/home/ubuntu/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2082, in _from_pretrained
    slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(
  File "/home/ubuntu/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2287, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/ubuntu/venv/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 182, in __init__
    self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
  File "/home/ubuntu/venv/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 209, in get_spm_processor
    tokenizer.Load(self.vocab_file)
  File "/home/ubuntu/venv/lib/python3.10/site-packages/sentencepiece/__init__.py", line 961, in Load
    return self.LoadFromFile(model_file)
  File "/home/ubuntu/venv/lib/python3.10/site-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string
```
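For what it's worth, the `TypeError: not a string` at the bottom of that traceback is consistent with the checkpoint shipping only a fast-tokenizer file and no sentencepiece model, so the slow LlamaTokenizer fallback ends up passing `None` to sentencepiece. The sketch below illustrates that failure mode with illustrative file lists (not read from the Hub):

```python
# Why the slow LlamaTokenizer path fails on a Llama 3 checkpoint: the repo
# ships tokenizer.json (fast format) but no tokenizer.model (the sentencepiece
# file the slow tokenizer needs), so vocab_file ends up None and
# sentencepiece's LoadFromFile(None) rejects the non-string argument.
llama2_files = {"tokenizer.model", "tokenizer_config.json"}  # has sentencepiece model
llama3_files = {"tokenizer.json", "tokenizer_config.json"}   # fast-only checkpoint

def slow_tokenizer_vocab_file(repo_files):
    # The slow Llama tokenizer looks for a sentencepiece model file.
    return "tokenizer.model" if "tokenizer.model" in repo_files else None

print(slow_tokenizer_vocab_file(llama2_files))  # tokenizer.model
print(slow_tokenizer_vocab_file(llama3_files))  # None -> "TypeError: not a string"
```

That is also why AutoTokenizer (with an up-to-date transformers) is the safer call here: it loads the fast tokenizer from tokenizer.json directly and never touches sentencepiece.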