npuichigo / openai_trtllm

OpenAI compatible API for TensorRT LLM triton backend
MIT License

Feature request - Add all v1/ routes #47

Open visitsb opened 5 months ago

visitsb commented 5 months ago

@npuichigo I am trying to use Triton Inference Server with the TensorRT-LLM backend and open-webui as the frontend, but not all routes are provided, e.g. /v1/models.

Is there any plan to support all of the OpenAI v1 routes?

It would be really great to have full OpenAI API support, since KServe support is still in the works.

npuichigo commented 5 months ago

@visitsb It's fine to add /v1/models. But the full list of OpenAI API routes is long, e.g. /v1/audio, /v1/embeddings. What's the minimal subset that's needed?
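For reference, /v1/models only needs to return the OpenAI list-models response shape. A minimal sketch of such a handler, assuming axum 0.7 and tokio (the model id and port are placeholders, not the project's actual code):

```rust
use axum::{routing::get, Json, Router};
use serde_json::{json, Value};

// Placeholder handler: returns the OpenAI list-models response shape
// with a single hard-coded model id.
async fn list_models() -> Json<Value> {
    Json(json!({
        "object": "list",
        "data": [{
            "id": "tensorrt_llm",   // hypothetical model id
            "object": "model",
            "created": 0,
            "owned_by": "triton"
        }]
    }))
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/v1/models", get(list_models));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```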

visitsb commented 5 months ago

@npuichigo Thanks for the quick reply!

Are you able to add the routes below? Looking at open-webui's implementation, these are the minimum:

/v1/models
/v1/chat/completions
/v1/embeddings
/v1/audio/speech
/v1/audio/transcriptions

I wish there were an easier way to provide full compatibility, but maybe sometime in the future.
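A quick way to check which of these routes a server already exposes is to request each one and look for 404s (any other status means some handler answered). A sketch assuming reqwest and tokio, with the base URL as a placeholder:

```rust
use reqwest::Client;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    // Placeholder base URL; point this at your openai_trtllm instance.
    let base = "http://localhost:3000";
    let routes = [
        "/v1/models",
        "/v1/chat/completions",
        "/v1/embeddings",
        "/v1/audio/speech",
        "/v1/audio/transcriptions",
    ];
    let client = Client::new();
    for route in routes {
        // 404 means the route is not registered at all; 400/405/... means
        // a handler exists but rejected this bare GET request.
        let status = client.get(format!("{base}{route}")).send().await?.status();
        println!("{route}: {status}");
    }
    Ok(())
}
```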

npuichigo commented 5 months ago

The exposed API depends on the actual models hosted in the triton backend. Since there's no embedding model available in trtllm, /v1/embeddings is not possible. For an embedding model, you might refer to https://github.com/huggingface/text-embeddings-inference.
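For example, text-embeddings-inference can run alongside triton and be queried separately. A minimal client sketch, assuming reqwest/tokio/serde_json, TEI's native /embed route, and a local TEI instance on port 8080 (all assumptions, not part of this project):

```rust
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    // Assumes a text-embeddings-inference server running locally on :8080.
    let embeddings = reqwest::Client::new()
        .post("http://localhost:8080/embed")
        .json(&json!({ "inputs": "What is Deep Learning?" }))
        .send()
        .await?
        // TEI's /embed returns a batch of embedding vectors as nested arrays.
        .json::<Vec<Vec<f32>>>()
        .await?;
    println!("embedding dims: {}", embeddings[0].len());
    Ok(())
}
```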

The same reasoning applies to /v1/audio/*, since no ASR or TTS models are available in trtllm right now.