mit-han-lab / qserve

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Apache License 2.0

Couldn't instantiate the backend tokenizer #8

Open Rudin6 opened 4 months ago

Rudin6 commented 4 months ago

I followed the installation instructions, but an error occurred when I ran the offline benchmarking script, even though I have installed sentencepiece. How can I solve it? Thanks!

INFO 05-16 17:10:41 llm_engine.py:90] Initializing an LLM engine with config: model='./qserve_checkpoints/Llama-3-8B-QServe', tokenizer='./qserve_checkpoints/Llama-3-8B-QServe', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=int8, device_config=cuda, ifb_config=False, seed=0)
Traceback (most recent call last):
  File "/workspace/llm/qserve/qserve_benchmark.py", line 128, in <module>
    main(args)
  File "/workspace/llm/qserve/qserve_benchmark.py", line 95, in main
    engine = initialize_engine(args)
  File "/workspace/llm/qserve/qserve_benchmark.py", line 73, in initialize_engine
    return LLMEngine.from_engine_args(engine_args)
  File "/workspace/llm/qserve/qserve/engine/llm_engine.py", line 241, in from_engine_args
    engine = cls(
  File "/workspace/llm/qserve/qserve/engine/llm_engine.py", line 131, in __init__
    self._init_tokenizer()
  File "/workspace/llm/qserve/qserve/engine/llm_engine.py", line 194, in _init_tokenizer
    self.tokenizer = get_tokenizer(self.model_config.tokenizer, **init_kwargs)
  File "/workspace/llm/qserve/qserve/utils/tokenizer.py", line 50, in get_tokenizer
    raise e
  File "/workspace/llm/qserve/qserve/utils/tokenizer.py", line 27, in get_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
  File "/root/anaconda3/envs/QServe/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 862, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/root/anaconda3/envs/QServe/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2089, in from_pretrained
    return cls._from_pretrained(
  File "/root/anaconda3/envs/QServe/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2311, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/anaconda3/envs/QServe/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 120, in __init__
    raise ValueError(
ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a tokenizers library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
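(Editor's note: the traceback shows the failure happens inside transformers' AutoTokenizer, not in QServe's own code. A minimal sketch, not from the thread, that isolates the tokenizer load from the engine; the checkpoint path is the one from the log above.)

```python
# Minimal isolation sketch: load the tokenizer directly with transformers.
# If this raises the same ValueError, the problem is in the transformers /
# tokenizers / sentencepiece installation, not in QServe's engine code.
from transformers import AutoTokenizer

try:
    tok = AutoTokenizer.from_pretrained("./qserve_checkpoints/Llama-3-8B-QServe")
    print("tokenizer loaded:", type(tok).__name__)
except ValueError as e:
    # Same error as in the traceback: the fast tokenizer cannot be built,
    # typically because sentencepiece/tokenizers are missing or mismatched.
    print(e)
```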

ys-2020 commented 3 months ago

Hi @Rudin6, thanks for your interest in QServe! Could you provide more information about this error? For example, which versions of transformers and tokenizers are you using? We are using tokenizers 0.15.1.
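(Editor's note: a quick way to gather the versions ys-2020 is asking about; a minimal sketch assuming the three packages are pip-installed in the active environment.)

```python
# Print the versions relevant to this issue, for comparison against the
# tokenizers 0.15.1 that the QServe maintainers report using.
import transformers
import tokenizers
import sentencepiece

print("transformers: ", transformers.__version__)
print("tokenizers:   ", tokenizers.__version__)
print("sentencepiece:", sentencepiece.__version__)
```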