Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0
6.27k stars · 1.23k forks
TypeError: invalidInputError() missing 1 required positional argument: 'errMsg' with vLLM-Serving #10661
Hi, I am trying to run vLLM serving for the neural-chat model using https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/vLLM-Serving, but I am hitting the error above. Offline inference works fine; the failure only appears when I use the continuous batching feature. Could you please guide me on how to resolve this and get serving running for neural-chat?
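For context, the traceback suggests that somewhere in the serving path `invalidInputError()` is called with only one argument, while its definition requires an error-message parameter named `errMsg`. The snippet below is a minimal, hypothetical reproduction of that failure mode; the stand-in signature and behaviour are assumptions for illustration, not the actual ipex-llm implementation:

```python
# Hypothetical stand-in that only mirrors the shape implied by the traceback:
# a required positional 'errMsg' parameter after the condition.

def invalidInputError(condition, errMsg, fixMsg=None):
    """Raise with errMsg (and optional fixMsg) when condition is falsy."""
    if not condition:
        msg = errMsg if fixMsg is None else f"{errMsg} ({fixMsg})"
        raise RuntimeError(msg)

# A call site that omits the message argument fails before any check runs:
#   TypeError: invalidInputError() missing 1 required positional argument: 'errMsg'
invalidInputError(False)
```

If that is indeed the cause, the fix would likely be in the continuous-batching code path rather than in the launch command, so sharing the full traceback along with the ipex-llm and vLLM versions used may help the maintainers reproduce it.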