michaelfeil / infinity

Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali
https://michaelfeil.github.io/infinity/
MIT License

Support Integration with KServe #352

Open indranilr opened 2 months ago

indranilr commented 2 months ago

Feature request

KServe is a Kubernetes-based engine for predictive and generative AI models. It provides an abstraction over popular model servers such as Hugging Face TEI (https://github.com/kserve/kserve/pull/3743), TensorFlow, PyTorch, etc. This is a request to support Infinity as a model serving engine in KServe too.
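Until native support exists, one possible stopgap is KServe's generic custom-container predictor. The sketch below is not the integration requested here; it only builds an `InferenceService` manifest that runs the Infinity container as-is. The image tag `michaelf34/infinity:latest`, the `v2 --model-id ... --port ...` arguments, the port 7997, the example model id, and the helper name `build_inference_service` are all assumptions chosen for illustration and should be checked against the Infinity and KServe documentation.

```python
import json


def build_inference_service(name, model_id,
                            image="michaelf34/infinity:latest",  # assumed image tag
                            port=7997):                           # assumed Infinity port
    """Return a KServe InferenceService manifest (as a dict) that runs
    Infinity through the generic custom-container predictor."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "containers": [
                    {
                        "name": "kserve-container",
                        "image": image,
                        # Assumed Infinity CLI arguments; verify against its docs.
                        "args": ["v2", "--model-id", model_id, "--port", str(port)],
                        "ports": [{"containerPort": port, "protocol": "TCP"}],
                    }
                ]
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical service name and model id, used only as an example.
    manifest = build_inference_service("infinity-embeddings", "BAAI/bge-small-en-v1.5")
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`. This route still exposes Infinity's own REST API rather than KServe's inference protocol, which is part of what a native integration would need to address.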

Motivation

Many organizations that use open-source software already deploy predictive models with KServe and are trying to use it for embedding and generative model deployment as well. Supporting Infinity as a model serving engine in KServe would avoid a separate deployment for Infinity altogether. Ref: #314.

Your contribution

NA

michaelfeil commented 2 months ago

@indranilr I like the goal and mission of the KServe project. That said, I have not worked with it extensively. I would be happy to assist anyone (such as you) who has questions while working on the integration.