ashishkamra opened 5 months ago
One idea to explore is to use a proxy similar to the KServe transformer around the TGIS image so that we can easily plug it into the ServingRuntime.
The existing HF-provided one will probably not work OOTB with TGIS, but the work to adapt it should not be too much.
@Xaenalt @dtrifiro Wdyt? (For now I'm just thinking of a community/research spike, nothing long-term/supported.)
FYI, the KServe project itself is considering adding OpenAI API support for chat completion as part of the open-inference-protocol spec.
Request: The ask is to introduce an OpenAI text generation API compatibility layer (chat completion endpoint) to kserve/TGIS.
Why: Having an OpenAI API compatibility layer will allow more open-source tools, such as https://github.com/EleutherAI/lm-evaluation-harness, to interoperate with our model serving stack.
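To make the ask concrete, here is a minimal sketch of what such a compatibility shim would do: flatten an OpenAI-style chat-completion request into a single prompt and a text-generation payload. The output field names ("inputs", "parameters", "max_new_tokens") follow the HF text-generation-inference JSON schema and are an assumption about what a TGIS-facing layer would emit; the prompt-flattening format is also just a placeholder, not the author's design.

```python
def chat_to_generation(openai_request: dict) -> dict:
    """Translate an OpenAI chat-completion request body into a
    TGIS/text-generation-style payload (hypothetical field names)."""
    # Flatten chat messages into a single prompt string (naive format).
    prompt = "\n".join(
        f"{m['role']}: {m['content']}" for m in openai_request["messages"]
    )
    # Map the OpenAI sampling parameters we recognize onto
    # text-generation-inference parameter names.
    params = {}
    if "max_tokens" in openai_request:
        params["max_new_tokens"] = openai_request["max_tokens"]
    if "temperature" in openai_request:
        params["temperature"] = openai_request["temperature"]
    return {"inputs": prompt, "parameters": params}


if __name__ == "__main__":
    req = {
        "model": "flan-t5-xl",  # placeholder model name
        "messages": [
            {"role": "system", "content": "You are helpful."},
            {"role": "user", "content": "Hi"},
        ],
        "max_tokens": 20,
    }
    print(chat_to_generation(req))
```

A real layer would also need to translate the response back into the OpenAI chat-completion response shape (choices, usage, finish_reason), which is where most of the adaptation work mentioned above would land.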
Suggested implementation: Use the litellm OpenAI proxy server - https://litellm.vercel.app/docs/providers/huggingface
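For reference, a litellm proxy config for fronting a Hugging Face text-generation endpoint looks roughly like the sketch below. The model name and api_base are placeholders (the in-cluster TGIS service URL is an assumption), so check the litellm docs linked above for the exact provider syntax.

```yaml
model_list:
  - model_name: flan-t5-xl            # name clients send in the OpenAI request
    litellm_params:
      model: huggingface/google/flan-t5-xl
      api_base: http://tgis-predictor.example.svc.cluster.local:8080  # placeholder
```

The proxy is then started with this config and exposes an OpenAI-compatible /chat/completions endpoint in front of the backend.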