Closed: HoppeDeng closed this issue 2 weeks ago.
Hi @HoppeDeng, could you please share the image ID you're using?
I pulled the latest Docker image and ran vLLM serving, but I wasn't able to reproduce the issue on my end.
If you're not using the latest image, could you try pulling the most recent version and see if the issue persists? If the problem still exists, it would be helpful if you could provide the complete reproduction steps and any error messages you're encountering.
@liu-shaojun It was my mistake. Mounting the local directory to /llm is not correct; mounting it to /llm/models works (see the sketch below).
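For anyone hitting the same problem, here is a minimal sketch of the container launch with the bind mount pointed at /llm/models instead of /llm. The host path /path/to/models and the container name ipex-llm-serving are placeholders, not taken from the original report:

```bash
# Sketch (assumed host path and container name): bind-mount the host model
# directory into /llm/models rather than /llm, so the contents the image
# ships under /llm are not hidden by the mount.
docker run -itd \
  --net=host \
  --device=/dev/dri \
  --name=ipex-llm-serving \
  -v /path/to/models:/llm/models \
  intelanalytics/ipex-llm-serving-xpu:2.1.0
```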
When you pull the Docker image intelanalytics/ipex-llm-serving-xpu:2.1.0 and start vLLM serving with the following command:

python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
  --served-model-name $served_model_name \
  --port 8000 \
  --model $model \
  --trust-remote-code \
  --gpu-memory-utilization 0.7 \
  --device xpu \
  --dtype float16 \
  --enforce-eager \
  --load-in-low-bit fp8 \
  --max-model-len 6656 \
  --max-num-batched-tokens 6656 \
  --tensor-parallel-size 4
it reports ModuleNotFoundError: No module named 'vllm'.
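Presumably the bind mount over /llm hides whatever the image ships in that directory, which is why the vllm module can no longer be imported. For completeness, here is a sketch of the same serving command once the model is placed under the /llm/models mount; the model name and path below are placeholders, not from the original report:

```bash
# Sketch (placeholder model name and path): run inside the container,
# pointing $model at the directory mounted under /llm/models, then start
# serving with the same flags as in the original command.
export served_model_name=my-model
export model=/llm/models/my-model

python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
  --served-model-name $served_model_name \
  --port 8000 \
  --model $model \
  --trust-remote-code \
  --gpu-memory-utilization 0.7 \
  --device xpu \
  --dtype float16 \
  --enforce-eager \
  --load-in-low-bit fp8 \
  --max-model-len 6656 \
  --max-num-batched-tokens 6656 \
  --tensor-parallel-size 4
```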