intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

vllm package is missing in intelanalytics/ipex-llm-serving-xpu:2.1.0 #11944

Closed · HoppeDeng closed this 2 weeks ago

HoppeDeng commented 2 weeks ago

When you pull the Docker image intelanalytics/ipex-llm-serving-xpu:2.1.0 and start vLLM serving with the following command:

python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
  --served-model-name $served_model_name \
  --port 8000 \
  --model $model \
  --trust-remote-code \
  --gpu-memory-utilization 0.7 \
  --device xpu \
  --dtype float16 \
  --enforce-eager \
  --load-in-low-bit fp8 \
  --max-model-len 6656 \
  --max-num-batched-tokens 6656 \
  --tensor-parallel-size 4

It will report ModuleNotFoundError: No module named 'vllm'
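A quick way to confirm whether the package is actually present inside the container is to import it with the container's own Python. This is a minimal sketch; the container name ipex-llm-serving is a placeholder for whatever name you started the container with:

```bash
# Check that the vllm module is importable inside the running serving container
docker exec -it ipex-llm-serving \
  python -c "import vllm; print(vllm.__version__)"
```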

liu-shaojun commented 2 weeks ago

Hi @HoppeDeng, could you please share the image ID you're using?
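For reference, a sketch of how to look up the local image ID, assuming the image was pulled under the intelanalytics/ipex-llm-serving-xpu name:

```bash
# List the locally pulled ipex-llm-serving-xpu images with their tags and IDs
docker images intelanalytics/ipex-llm-serving-xpu
```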

I've pulled the latest Docker image and run vLLM serving, but I wasn't able to reproduce the issue on my end.

If you're not using the latest image, could you try pulling the most recent version and see if the issue persists? If the problem still exists, it would be helpful if you could provide the complete reproduction steps and any error messages you're encountering.
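For example, pulling the most recent build would look like the sketch below; the latest tag is an assumption here, and the tag you actually want depends on which builds are published for this image:

```bash
# Pull the most recent published build of the serving image
docker pull intelanalytics/ipex-llm-serving-xpu:latest
```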

HoppeDeng commented 2 weeks ago

@liu-shaojun It was my mistake. Mounting the local directory to /llm is not correct; mounting it to /llm/models instead works.
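For later readers: the likely cause is that bind-mounting a host directory over /llm shadows whatever the image ships under that path (presumably including the vLLM installation), while mounting under /llm/models leaves it intact. A sketch of the corrected invocation follows; the paths, container name, and device flags are illustrative assumptions, not the exact command from this thread:

```bash
# Mount the host model directory under /llm/models rather than over /llm,
# so the image's own /llm contents remain visible inside the container.
docker run -itd \
  --name ipex-llm-serving \
  --device=/dev/dri \
  -v /path/to/local/models:/llm/models \
  -p 8000:8000 \
  intelanalytics/ipex-llm-serving-xpu:2.1.0
```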