intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

Running benchmark/all-in-one with the GLM-4-9B-Chat model reports "AutoTP not support for models" #11803

Open dukelee111 opened 1 month ago

dukelee111 commented 1 month ago

Please help confirm whether GLM-4-9B-Chat is supported, thanks so much.

Docker image: intelanalytics/ipex-llm-serving-vllm-xpu-experiment
Tag: 2.1.0b2
Image ID: 0e20af44ad46
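
For reproducibility, the container was started roughly along these lines; the flags below are assumptions based on the usual Intel GPU (`/dev/dri`) passthrough pattern, not the image's documented invocation, so adjust to your setup:

```bash
# Hypothetical container start for the ipex-llm vLLM XPU image.
# --device=/dev/dri exposes the Intel GPU(s); other flags are assumptions.
docker run -itd \
    --net=host \
    --device=/dev/dri \
    --shm-size=16g \
    --name=ipex-llm-serving-xpu \
    intelanalytics/ipex-llm-serving-vllm-xpu-experiment:2.1.0b2 \
    /bin/bash
docker exec -it ipex-llm-serving-xpu /bin/bash
```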

Steps to reproduce:
cd /benchmark/all-in-one
edit config.yaml (a sketch follows below)
bash run-deepspeed-arc.sh
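
For context, the config.yaml edit would look roughly like the sketch below. The key names follow the all-in-one benchmark's documented format, but the `deepspeed_optimize_model_gpu` test_api value is an assumption for the DeepSpeed Arc path; verify it against run-deepspeed-arc.sh and the config shipped in the image:

```yaml
# Sketch of benchmark/all-in-one/config.yaml for a DeepSpeed AutoTP run on Arc.
repo_id:
  - 'THUDM/glm-4-9b-chat'
local_model_hub: '/path/to/local/model/hub'
warm_up: 1
num_trials: 3
num_beams: 1          # greedy search
low_bit: 'sym_int4'   # symmetric int4 quantization
batch_size: 1
in_out_pairs:
  - '32-32'
  - '1024-128'
test_api:
  - 'deepspeed_optimize_model_gpu'   # assumed API name for the DeepSpeed Arc run
```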

Error trace details are attached: CHATGLM4-9B-Trace

Uxito-Ada commented 3 weeks ago

Hi @dukelee111,

I reproduced the issue and got the same error. "Not able to determine model policy automatically" means that GLM-4-9B-Chat is not supported by AutoTP, as shown here: it is not found in DeepSpeed's supported model list.
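
For anyone who needs tensor parallelism for this model before DeepSpeed adds AutoTP support, `deepspeed.init_inference()` accepts a manual `injection_policy` that bypasses automatic policy detection. Below is a minimal, untested sketch; the `GLMBlock` lookup and the `self_attention.dense` / `mlp.dense_4h_to_h` layer names are assumptions based on the chatglm-style remote modeling code shipped with THUDM/glm-4-9b-chat, so verify them against the model's modeling file:

```python
# Untested sketch: manual tensor-parallel injection for GLM-4-9B-Chat,
# bypassing AutoTP's automatic policy detection.
# Launch with the DeepSpeed launcher, e.g. `deepspeed --num_gpus 2 this_script.py`.
import torch
import deepspeed
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "THUDM/glm-4-9b-chat",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)

# The chatglm-style remote code keeps its transformer blocks at
# model.transformer.encoder.layers; grab the block class for the policy key.
# (Assumption -- check the model repo's modeling code.)
GLMBlock = type(model.transformer.encoder.layers[0])

engine = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": 2},  # mp_size=2 on older DeepSpeed releases
    dtype=torch.float16,
    # injection_policy maps a block class to the output linear layers whose
    # results must be all-reduced across tensor-parallel ranks.
    injection_policy={GLMBlock: ("self_attention.dense", "mlp.dense_4h_to_h")},
)
model = engine.module
```

Note this only sketches the plain DeepSpeed path; whether the all-in-one benchmark's run-deepspeed-arc.sh can be pointed at such a manual policy is a separate question.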