intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.

TypeError: invalidInputError() missing 1 required positional argument: 'errMsg' with vLLM-Serving #10661

Vasud-ha opened this issue 3 months ago (status: Open)

Vasud-ha commented 3 months ago

Hi, I am trying to run vLLM-Serving for the neural-chat model using https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/vLLM-Serving. However, I am facing this issue:

[screenshot of the error in the issue title: TypeError: invalidInputError() missing 1 required positional argument: 'errMsg']

Offline inference works fine, but the above error occurs when using the continuous batching feature. Could you please guide me on how to resolve this and run it for neural-chat?
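For reference, a minimal offline-inference sketch with the stock vLLM Python API is shown below; the model id, prompt, and sampling parameters are illustrative placeholders, and the ipex-llm GPU example may use different imports or extra device arguments (see its README):

```python
from vllm import LLM, SamplingParams

# Prompt and sampling settings are placeholders for a quick smoke test.
prompts = ["What is AI?"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# "Intel/neural-chat-7b-v3-1" is an assumed neural-chat checkpoint;
# substitute the model path used in your own setup.
llm = LLM(model="Intel/neural-chat-7b-v3-1")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```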

jenniew commented 3 months ago

This looks like an issue with how the error message is printed (the errMsg argument is not passed at the call site). We'll fix it soon.
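For context, a TypeError like the one in the title typically means a validation helper was called without its message argument, so reaching that call site crashes with a TypeError instead of reporting the real input problem. Below is a minimal sketch of that failure mode, assuming only the two-argument signature implied by the traceback; the helper body and call sites are illustrative, not the actual ipex-llm code:

```python
def invalidInputError(condition, errMsg):
    """Raise a descriptive error when a validation check fails (illustrative)."""
    if not condition:
        raise RuntimeError(errMsg)


def validate_request_buggy(prompt: str) -> None:
    # Buggy call site: errMsg was never passed, so Python raises
    #   TypeError: invalidInputError() missing 1 required positional argument: 'errMsg'
    # as soon as this line runs, hiding the intended validation message.
    invalidInputError(len(prompt) > 0)


def validate_request_fixed(prompt: str) -> None:
    # Fixed call site: pass both the condition and the message, so a bad
    # input produces a readable error instead of a TypeError.
    invalidInputError(len(prompt) > 0, "prompt must not be empty")
```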