kserve / open-inference-protocol

Repository for open inference protocol specification
Apache License 2.0

How open-inference-protocol works in LLMs, any use case? #5

Open lizzzcai opened 1 year ago

lizzzcai commented 1 year ago

With the increasing popularity of LLMs, many companies have started to look into deploying LLMs.

Instead of infer/predict, completions and embeddings endpoints are being used, and most of these APIs support streaming.
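
To make the contrast concrete, here is a rough sketch of the two request styles. This is only an illustration, assuming a KServe-style V2 infer endpoint and an OpenAI-style completions endpoint; the host, model name, and payload values are placeholders, not part of the spec discussion.

```python
import requests

HOST = "http://localhost:8080"  # placeholder serving endpoint

# Open Inference Protocol (V2) style: generic tensors in, generic tensors out.
v2_payload = {
    "inputs": [
        {"name": "prompt", "shape": [1], "datatype": "BYTES", "data": ["Once upon a time"]}
    ]
}
v2_resp = requests.post(f"{HOST}/v2/models/my-llm/infer", json=v2_payload)
print(v2_resp.json()["outputs"][0]["data"])

# OpenAI-style completions: task-specific fields, with streaming built in.
oai_payload = {
    "model": "my-llm",
    "prompt": "Once upon a time",
    "max_tokens": 32,
    "stream": True,
}
with requests.post(f"{HOST}/v1/completions", json=oai_payload, stream=True) as r:
    for line in r.iter_lines():
        if line:
            print(line.decode())  # server-sent events, one chunk per token batch
```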

Example API spec:

I would like to check whether there are any use cases in the community using open-inference-protocol with LLMs, and whether there is a roadmap to natively support or extend open-inference-protocol for better LLM support?

The good thing about open-inference-protocol is that it standardizes the way people interact with models, which is very useful when developing a transformer (pre/post-processing) and when integrating different transformers and predictors into an inference graph. A standard protocol also makes it easy to develop a serving runtime that supports different kinds of LLMs.
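
As an illustration of that point, a pre/post-processing transformer only has to speak one payload shape when everything uses the standard protocol. This is a hand-written sketch using plain functions rather than the KServe transformer SDK (whose exact class signatures vary by version); the predictor URL and field names are assumptions.

```python
from typing import Any, Dict
import requests

PREDICTOR_URL = "http://my-llm-predictor/v2/models/my-llm/infer"  # placeholder

def preprocess(raw_text: str) -> Dict[str, Any]:
    """Turn a raw user string into a V2 infer request the predictor understands."""
    cleaned = raw_text.strip()
    return {
        "inputs": [
            {"name": "prompt", "shape": [1], "datatype": "BYTES", "data": [cleaned]}
        ]
    }

def postprocess(v2_response: Dict[str, Any]) -> str:
    """Pull the generated text back out of the standard V2 response envelope."""
    return v2_response["outputs"][0]["data"][0]

def transform(raw_text: str) -> str:
    # Because the transformer and predictor share the protocol, chaining them
    # (or swapping in a different predictor runtime) needs no glue-code changes.
    infer_request = preprocess(raw_text)
    infer_response = requests.post(PREDICTOR_URL, json=infer_request).json()
    return postprocess(infer_response)
```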

Thanks.

yuzisun commented 1 year ago

Thanks @lizzzcai! We just added this to the agenda of today's US community meeting.

cmaddalozzo commented 1 year ago

Another example of an API from HuggingFace's Text Generation Inference server: https://huggingface.github.io/text-generation-inference/
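
For reference, a TGI-style request looks roughly like the sketch below. It is based on the publicly documented /generate and /generate_stream routes; the host and parameter values are placeholders, so treat it as an illustration rather than the authoritative spec.

```python
import requests

TGI_HOST = "http://localhost:3000"  # placeholder TGI server address

# Single-shot generation.
resp = requests.post(
    f"{TGI_HOST}/generate",
    json={"inputs": "Once upon a time", "parameters": {"max_new_tokens": 32}},
)
print(resp.json()["generated_text"])

# Token streaming via server-sent events.
with requests.post(
    f"{TGI_HOST}/generate_stream",
    json={"inputs": "Once upon a time", "parameters": {"max_new_tokens": 32}},
    stream=True,
) as r:
    for line in r.iter_lines():
        if line:
            print(line.decode())  # lines of the form "data: {...}"
```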

lizzzcai commented 1 year ago

I captured some of my thoughts on the API spec discussed here.

However, I want to bring up a topic on the API spec. Currently the schema follows HF's. From my experience so far playing around with LLMs and some of the third-party toolkits built on top of LLMs/ChatGPT, I feel the OpenAI spec would be the better option, as most third-party LLM applications support it out of the box, which means a better ecosystem and user experience. If a user deploys an LLM in KServe but its API cannot be used with those LLM toolkits, it will be hard to promote.
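
To illustrate the "out of the box" point: if the serving runtime exposed an OpenAI-compatible endpoint, existing tooling could be pointed at it by changing only the base URL. Below is a rough sketch using the official openai Python client; the base_url, API key, and model name are placeholders, and this assumes (rather than prescribes) an OpenAI-compatible route on the KServe side.

```python
from openai import OpenAI

# Point the stock OpenAI client at a self-hosted, OpenAI-compatible endpoint.
# Nothing else in an application built on this client has to change.
client = OpenAI(
    base_url="http://my-kserve-llm.example.com/v1",  # placeholder endpoint
    api_key="not-needed-for-local",                  # placeholder key
)

response = client.chat.completions.create(
    model="my-llm",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize the open inference protocol."}],
)
print(response.choices[0].message.content)
```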