ymcui / Chinese-LLaMA-Alpaca

Chinese LLaMA & Alpaca large language models + local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki
Apache License 2.0

AttributeError: 'LlamaTokenizerFast' object has no attribute 'sp_model' #866

Closed. Kowsher closed this issue 7 months ago.

Kowsher commented 8 months ago

Check before submitting issues

Type of Issue

Model conversion and merging

Base Model

LLaMA-7B

Operating System

Linux

Describe your issue in detail

# Please copy-and-paste your command here.

When I try to merge the tokenizers, I get this error: AttributeError: 'LlamaTokenizerFast' object has no attribute 'sp_model'

llama_spm = sp_pb2_model.ModelProto()
llama_spm.ParseFromString(llama_tokenizer.sp_model.serialized_model_proto())
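
For context, a minimal sketch that reproduces the error, assuming the tokenizer was loaded through AutoTokenizer (which returns the fast class by default); the path is a placeholder:

```python
import sentencepiece.sentencepiece_model_pb2 as sp_pb2_model
from transformers import AutoTokenizer

# AutoTokenizer returns LlamaTokenizerFast by default when a fast tokenizer is available.
llama_tokenizer = AutoTokenizer.from_pretrained("path/to/llama-7b")  # placeholder path

llama_spm = sp_pb2_model.ModelProto()
# Raises: AttributeError: 'LlamaTokenizerFast' object has no attribute 'sp_model'
llama_spm.ParseFromString(llama_tokenizer.sp_model.serialized_model_proto())
```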

Dependencies (must be provided for code-related issues)

# Please copy-and-paste your dependencies here.

Execution logs or screenshots

# Please copy-and-paste your logs here.
ymcui commented 8 months ago

Did you use the recommended version of transformers? See requirements.txt
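
A quick check of the installed version before comparing it against the pin in requirements.txt:

```python
# Print the installed transformers version and compare it with the pin in requirements.txt.
import transformers
print(transformers.__version__)
```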

Kowsher commented 8 months ago

> Did you use the recommended version of transformers? See requirements.txt

Yes, I followed it. Merging works with the regular (slow) tokenizer, but I'm facing issues with the fast tokenizer.

ymcui commented 8 months ago

Unfortunately, our script has only been tested with the non-fast (slow) tokenizer for merging.
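
One possible workaround (a sketch only, not something the project has validated) is to force the slow, sentencepiece-backed tokenizer so that sp_model exists:

```python
from transformers import AutoTokenizer, LlamaTokenizer

# Either load the slow class directly, or pass use_fast=False to AutoTokenizer.
# Both return a sentencepiece-backed tokenizer that exposes .sp_model.
llama_tokenizer = LlamaTokenizer.from_pretrained("path/to/llama-7b")  # placeholder path
# llama_tokenizer = AutoTokenizer.from_pretrained("path/to/llama-7b", use_fast=False)

print(type(llama_tokenizer).__name__)        # expected: LlamaTokenizer
print(hasattr(llama_tokenizer, "sp_model"))  # expected: True
```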

github-actions[bot] commented 8 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

github-actions[bot] commented 7 months ago

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.

CaoHanyun commented 7 months ago

That's true. I also ran into the same problem, and I think it's because LlamaTokenizerFast is not the same as LlamaTokenizer. The former is just a BPE tokenizer (and is easy to load from a tokenizer.json trained with the huggingface/tokenizers library), while the latter is backed by a sentencepiece model and therefore has sp_model. As of now, LlamaTokenizerFast has no attribute named 'sp_model' (see https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/tokenization_llama_fast.py).
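
Since the fast tokenizer never exposes sp_model, another hedged option is to skip the transformers wrapper entirely and read the original sentencepiece tokenizer.model that usually ships with LLaMA checkpoints (the file name and path are assumptions about the checkpoint layout):

```python
import sentencepiece as spm
import sentencepiece.sentencepiece_model_pb2 as sp_pb2_model

# Load the raw sentencepiece model file directly.
sp = spm.SentencePieceProcessor()
sp.Load("path/to/llama-7b/tokenizer.model")  # placeholder path

# serialized_model_proto() yields the same bytes that LlamaTokenizer.sp_model would provide,
# so the ModelProto used by the merging code can be filled without the slow tokenizer class.
llama_spm = sp_pb2_model.ModelProto()
llama_spm.ParseFromString(sp.serialized_model_proto())
print(len(llama_spm.pieces))  # vocabulary size of the base tokenizer
```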

StephennFernandes commented 3 months ago

I'm facing a similar issue. I tried the recommended version, but the issue still persists. What's the fix for this?