The protobuf version is determined by the sentencepiece library used by the original LLaMA, so unfortunately we have no control over it.
Can't you just update the tokenizer to the "fast" one, which is what transformers wants for some reason? The problem occurs when transformers tries to convert the tokenizer to the fast version, which takes a long time. But after that, if you save it and load it again, the tokenizer seems to work without protobuf or sentencepiece.
Here is a notebook that demonstrates it:
So whatever save_pretrained is doing, the saved tokenizer still works.
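For illustration, a minimal sketch of that workaround (the local directory name is my own choice, and the 3B model ID is the one mentioned below):

```python
from transformers import AutoTokenizer

# First load converts the slow sentencepiece tokenizer to the fast one;
# this step is slow and still needs protobuf/sentencepiece installed.
tok = AutoTokenizer.from_pretrained("openlm-research/open_llama_3b", use_fast=True)

# save_pretrained writes out tokenizer.json for the fast tokenizer.
tok.save_pretrained("./open_llama_3b_fast")

# Later loads read tokenizer.json directly: fast, and no protobuf
# or sentencepiece needed.
tok = AutoTokenizer.from_pretrained("./open_llama_3b_fast")
```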
Unfortunately we can't. There are other inference libraries, such as llama.cpp, that do not use the transformers fast tokenizer. We want OpenLLaMA to be a drop-in replacement for LLaMA in all libraries, not just transformers.
(Transformers v4.30.2)
The openllama tokenizer can't be used out of the box unless protobuf 3 is installed or environment variables are changed. And since many packages now require protobuf 4, protobuf is prone to being upgraded, after which this happens:
(It can be fixed by following the tip in the error message above, after which
AutoTokenizer.from_pretrained('.')
doesn't take 2 minutes to load like
AutoTokenizer.from_pretrained("openlm-research/open_llama_3b")
does. It takes longer to load the tokenizer than the model.)
And it's not only for 3B. The most recent model on HF as of now,
openlm-research/open_llama_7b_v2
also has the same issue.
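For anyone hitting this: two workarounds are commonly used for this protobuf clash (my own suggestions, not something documented by the OpenLLaMA repo). One is to pin protobuf to a 3.x release, e.g. `pip install "protobuf<4"`, though that can conflict with packages that require v4. The other is to force the pure-Python protobuf implementation, which is what the error message itself suggests, sketched below:

```python
import os

# Must be set before protobuf is imported anywhere in the process.
# Selects the pure-Python protobuf implementation: slower, but avoids
# downgrading protobuf for the whole environment.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

from transformers import AutoTokenizer  # imported after the env var is set

tok = AutoTokenizer.from_pretrained("openlm-research/open_llama_7b_v2")
```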