intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

Qwen 14B serving failed with BigDL LLM fastchat #9707

Closed: qzheng527 closed this issue 8 months ago

qzheng527 commented 8 months ago

Trying to use BigDL LLM serving for the Qwen-7B model, following the doc.

  1. Rename the model folder from Qwen-7B-Chat to bigdl-7b.
  2. Install the Python packages in a conda environment (Python 3.9).
    # Install BigDL LLM
    $script_dir/python-occlum/bin/pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cpu
    $script_dir/python-occlum/bin/pip install --pre --upgrade bigdl-llm[all] bigdl-llm[serving]
    $script_dir/python-occlum/bin/pip install einops transformers_stream_generator
  3. Run the commands below, but they failed (see the verification sketch after this comment).
    ./python-occlum/bin/python -m fastchat.serve.controller --host 0.0.0.0
    ./python-occlum/bin/python -m bigdl.llm.serving.model_worker --model-path /work/models/chatglm2-6b --device cpu --host 0.0.0.0

(screenshot of the model worker error: the loader cannot find tokenization_qwen.py)

Trying chatglm2 and vicuna 7b; both work fine.
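
For context, once the controller and worker are up, the stack is usually verified through FastChat's OpenAI-compatible API server. The sketch below assumes that server was also started (python -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000); the port and model name are illustrative.

    # Sketch: query the served model via FastChat's OpenAI-compatible API.
    import requests

    resp = requests.post(
        "http://localhost:8000/v1/chat/completions",
        json={
            "model": "bigdl-7b",  # the renamed model folder from step 1
            "messages": [{"role": "user", "content": "Hello"}],
        },
        timeout=120,
    )
    print(resp.json())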

qiyuangong commented 8 months ago

Hi @qzheng527

The folder name will not affect loading, but the loader will raise an exception if some files cannot be found. According to the error message, the loader cannot find tokenization_qwen.py or related files in the folder. Please check the files in your Qwen folder.
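
A quick way to check is a sketch like the one below; the local path is hypothetical, and the file list mirrors the links in the next paragraph.

    # Sketch: verify the Qwen code/config files are present in the model folder.
    from pathlib import Path

    model_dir = Path("/work/models/bigdl-7b")  # hypothetical renamed folder
    expected = [
        "tokenization_qwen.py",
        "modeling_qwen.py",
        "tokenizer_config.json",
        "generation_config.json",
        "config.json",
    ]
    missing = [f for f in expected if not (model_dir / f).is_file()]
    print("missing files:", missing or "none")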

BTW, this error may be caused by Qwen's files being out of date. Qwen changed these files (not the weights) just last week. Please check whether your files match the latest Qwen files:

https://huggingface.co/Qwen/Qwen-7B-Chat/blob/main/modeling_qwen.py
https://huggingface.co/Qwen/Qwen-7B-Chat/blob/main/tokenizer_config.json
https://huggingface.co/Qwen/Qwen-7B-Chat/blob/main/generation_config.json
https://huggingface.co/Qwen/Qwen-7B-Chat/blob/main/config.json

To avoid loading exceptions on the latest Qwen model, we updated the BigDL Qwen loader, but it may raise exceptions on legacy files.
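
One way to refresh just those files is sketched below, using huggingface_hub's hf_hub_download; the local folder path is hypothetical, and the file list matches the links above.

    # Sketch: pull the latest code/config files (not the weights) from the
    # Qwen-7B-Chat repo and copy them into the local model folder.
    import shutil
    from huggingface_hub import hf_hub_download

    model_dir = "/work/models/bigdl-7b"  # hypothetical local folder
    for fname in ["modeling_qwen.py", "tokenization_qwen.py",
                  "tokenizer_config.json", "generation_config.json",
                  "config.json"]:
        cached = hf_hub_download(repo_id="Qwen/Qwen-7B-Chat", filename=fname)
        shutil.copy(cached, f"{model_dir}/{fname}")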

qzheng527 commented 8 months ago

@qiyuangong tokenization_qwen.py is in the model folder, and the model works fine in a general chat test. The model was released in Oct 2023. I will update it and try again.

qzheng527 commented 8 months ago

Updated BigDL, but it still didn't work for a simple Qwen model chat test. (screenshot of the error) The demo can be found here.

Jasonzzt commented 8 months ago

Hi @qzheng527 You can add trust_remote_code=True for the locally renamed model in bigdl_llm_model.py:
https://github.com/intel-analytics/BigDL/blob/128198c6dbcce5c7e2a5edde1fe2304706a7a07a/python/llm/src/bigdl/llm/serving/bigdl_llm_model.py#L245

Here is the related PR: https://github.com/intel-analytics/BigDL/pull/9762. You can refer to it; we will make some adjustments next week.
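
For illustration, the fix corresponds to loading the renamed local model as in the sketch below, using the BigDL-LLM transformers-style API; the path is hypothetical.

    # Sketch: load a renamed local Qwen model with trust_remote_code=True
    # via the BigDL-LLM transformers-style API.
    from bigdl.llm.transformers import AutoModelForCausalLM
    from transformers import AutoTokenizer

    model_path = "/work/models/bigdl-7b"  # hypothetical renamed folder
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        load_in_4bit=True,       # BigDL-LLM low-bit optimization
        trust_remote_code=True,  # required for Qwen's custom code files
    )
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)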

qzheng527 commented 8 months ago

@Jasonzzt Still not working. To narrow it down, I just used the demo code from Qwen. It reported the error below. (screenshot of the error)

BigDL version:

bigdl-llm                     2.5.0b20231226
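
For reference, the kind of simple chat test meant here is roughly the following self-contained sketch; the path is hypothetical, and the prompt and generation settings are illustrative.

    # Sketch: simple chat/generation smoke test for the local Qwen model.
    import torch
    from bigdl.llm.transformers import AutoModelForCausalLM
    from transformers import AutoTokenizer

    model_path = "/work/models/bigdl-7b"  # hypothetical renamed folder
    model = AutoModelForCausalLM.from_pretrained(
        model_path, load_in_4bit=True, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

    with torch.inference_mode():
        inputs = tokenizer("What is AI?", return_tensors="pt")
        output = model.generate(**inputs, max_new_tokens=32)
        print(tokenizer.decode(output[0], skip_special_tokens=True))
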
Jasonzzt commented 8 months ago

> @Jasonzzt Still not working. To narrow it down, I just used the demo code from Qwen. It reported the error below.
>
> BigDL version:
>
>     bigdl-llm                     2.5.0b20231226

Please update the Qwen model to the latest version and try again.

qzheng527 commented 8 months ago

It worked after updating to the latest Qwen model. Thanks.

qiyuangong commented 8 months ago

Issue closed