kserve / kserve

Standardized Serverless ML Inference Platform on Kubernetes
https://kserve.github.io/website/
Apache License 2.0

Add Xinference (an inference platform that integrates transformers, vLLM, and llama.cpp as engines) as an LLM Serving Runtime #3736

Open jaffe-fly opened 3 weeks ago

jaffe-fly commented 3 weeks ago

/kind feature

Describe the solution you'd like

Please add https://github.com/xorbitsai/inference as a KServe serving runtime for LLMs, similar to the existing Hugging Face runtime.

Xorbits Inference (Xinference) is a powerful and versatile library designed to serve language, speech recognition, and multimodal models. With Xorbits Inference, you can effortlessly deploy and serve your own or state-of-the-art built-in models using just a single command. Whether you are a researcher, developer, or data scientist, Xorbits Inference empowers you to unleash the full potential of cutting-edge AI models.

Xinference is an inference platform that integrates transformers, vLLM, and llama.cpp as engines; it is not directly supported by Hugging Face. A rough sketch of what registering it as a runtime could look like is below.
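For concreteness, here is a minimal sketch of how such a runtime might be registered, using the kubernetes Python client to create a KServe `ClusterServingRuntime`. This is not an existing integration: the runtime name, container args, and port are assumptions for illustration, and only the `xprobe/xinference` image and the `xinference-local` command come from the Xinference project itself.

```python
# Hypothetical sketch only: KServe ships no Xinference runtime today, so the
# runtime name, args, and port below are assumptions for illustration.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

xinference_runtime = {
    "apiVersion": "serving.kserve.io/v1alpha1",
    "kind": "ClusterServingRuntime",
    "metadata": {"name": "kserve-xinference"},  # assumed name
    "spec": {
        "supportedModelFormats": [
            {"name": "huggingface", "version": "1", "autoSelect": False},
        ],
        "containers": [
            {
                "name": "kserve-container",
                # Upstream Xinference image; the args are assumed flags, not a
                # documented KServe integration.
                "image": "xprobe/xinference:latest",
                "args": ["xinference-local", "--host", "0.0.0.0", "--port", "8080"],
                "ports": [{"containerPort": 8080, "protocol": "TCP"}],
            }
        ],
    },
}

# Register the runtime cluster-wide so InferenceServices can reference it.
client.CustomObjectsApi().create_cluster_custom_object(
    group="serving.kserve.io",
    version="v1alpha1",
    plural="clusterservingruntimes",
    body=xinference_runtime,
)
```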

lipengsh commented 3 weeks ago

+1

terrytangyuan commented 3 weeks ago

cc @qinxuye

terrytangyuan commented 3 weeks ago

Is it supported in HuggingFace?

qinxuye commented 3 weeks ago

Oh, hi @terrytangyuan, Xinference is an inference platform that integrates transformers, vLLM, and llama.cpp as engines; it is not directly supported by Hugging Face.
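For context on the engine point, a minimal sketch of how Xinference picks a backend at launch time through its Python client. The model name and the `model_engine` argument follow Xinference's documentation and may differ between versions; this is not a KServe integration.

```python
# Sketch only: launching a model on a chosen backend via Xinference's client.
# Model name and the model_engine argument are assumptions from Xinference's
# docs and may vary by version.
from xinference.client import Client

client = Client("http://localhost:9997")  # default xinference-local endpoint

# The same model can be served by different engines by switching model_engine.
model_uid = client.launch_model(
    model_name="llama-2-chat",      # assumed built-in model name
    model_engine="vllm",            # or "transformers" / "llama.cpp"
    model_size_in_billions=7,
    model_format="pytorch",
)
print(f"Launched model with uid: {model_uid}")
```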

terrytangyuan commented 2 weeks ago

I see. @jaffe-fly Could you update the title and description to reflect that?