intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc.
Apache License 2.0

Running benchmark/all-in-one with the GLM-4-9B-Chat model reports "AutoTP not support for models" #11803

Open · dukelee111 opened this issue 2 months ago

dukelee111 commented 2 months ago

Please help confirm whether GLM-4-9B-Chat is supported. Thanks so much.

Docker image: intelanalytics/ipex-llm-serving-vllm-xpu-experiment
Tag: 2.1.0b2
Image ID: 0e20af44ad46

Steps to reproduce (a minimal script version is sketched below):

1. `cd /benchmark/all-in-one`
2. Edit `config.yaml`
3. `bash run-deepspeed-arc.sh`
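For readers without the container, the same AutoTP failure can likely be reproduced with a plain DeepSpeed inference call along these lines. This is only a sketch: the model id `THUDM/glm-4-9b-chat`, the `tp_size` of 2, and the dtype are assumptions, not details taken from the report.

```python
# Minimal sketch reproducing the AutoTP failure outside the benchmark harness.
# Assumptions (not from the original report): model id, tp_size=2, fp16.
import torch
import deepspeed
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "THUDM/glm-4-9b-chat",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)

# AutoTP path: kernel injection off, no explicit policy given.
# With GLM-4 this is where "Not able to determine model policy
# automatically" is raised, since AutoTP cannot parse the model.
model = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": 2},
    dtype=torch.float16,
    replace_with_kernel_inject=False,
)
```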

The error trace details are attached: CHATGLM4-9B-Trace

Uxito-Ada commented 2 months ago

Hi @dukelee111 ,

I reproduced the issue and got the same error. The message "Not able to determine model policy automatically" means that GLM-4-9B-Chat is not supported by AutoTP, as shown here: it is not found in DeepSpeed's supported model list.
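As a possible workaround, DeepSpeed's automatic tensor parallelism documentation describes passing an explicit `injection_policy` to `deepspeed.init_inference` for models AutoTP cannot parse. A hedged sketch for GLM-4 might look like the following; the block-class lookup and the `self_attention.dense` / `mlp.dense_4h_to_h` layer names are assumptions based on the ChatGLM modeling code and have not been verified for this model:

```python
# Sketch of an explicit injection policy for a model AutoTP cannot parse.
# Assumption: GLM-4 reuses the ChatGLM layout, where each transformer block's
# row-parallel output linears are `self_attention.dense` and `mlp.dense_4h_to_h`.
import torch
import deepspeed
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "THUDM/glm-4-9b-chat",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)

# Grab the block class from the trust_remote_code module at runtime,
# since it is not importable via a fixed path.
GLMBlock = type(model.transformer.encoder.layers[0])

model = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": 2},
    dtype=torch.float16,
    replace_with_kernel_inject=False,
    # Tell DeepSpeed which linears in each block produce partial sums that
    # must be all-reduced, instead of relying on automatic policy detection.
    injection_policy={GLMBlock: ("self_attention.dense", "mlp.dense_4h_to_h")},
)
```

Whether this restores correct tensor-parallel output for GLM-4 would still need to be validated end to end.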