intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

vLLM offline_inference.py failed to run on CPU inference #11056

Open eugeooi opened 3 months ago

eugeooi commented 3 months ago

Failed to run python offline_inference.py from the link for vLLM offline inference on CPU. It seems that llm.py was removed in a previous version.
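For reference, the script follows vLLM's standard offline-inference flow, roughly as sketched below (the model id and prompts are placeholders, and any ipex-llm specific imports or arguments the original example used are not reproduced here):

```python
# Minimal sketch of vLLM's standard offline-inference flow.
# Placeholder model id and prompts; the removed ipex-llm example may have
# layered its own imports/arguments on top of this.
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Load the model; with the ipex-llm CPU backend removed, this step would
# need a vanilla vLLM build (or ipex-llm's GPU/XPU path) to run.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")
```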

gc-fu commented 3 months ago

Hi, the vLLM CPU backend has been removed for now. Support will be added back later. Sorry for the inconvenience.