npuichigo / openai_trtllm

OpenAI compatible API for TensorRT LLM triton backend
MIT License

Must the LLM model be served by Triton Inference Server's ensemble scheduler? #35

Closed · zengqingfu1442 closed this issue 6 months ago

zengqingfu1442 commented 6 months ago

Can I use a single model in the Triton model repository?

npuichigo commented 6 months ago

Do you do the tokenization and de-tokenization yourself?

zengqingfu1442 commented 6 months ago

> Do you do the tokenization and de-tokenization yourself?

Yes. It's a custom backend based on the Triton Python backend.
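
For context, a custom Triton Python backend that owns tokenization and de-tokenization typically looks something like the sketch below. This is a minimal illustration, not the poster's actual code; the tensor names (`text_input`, `text_output`), the use of a Hugging Face tokenizer, and the tokenizer path are all assumptions.

```python
# model.py — minimal sketch of a Triton Python backend that does its own
# tokenization and de-tokenization. Tensor names and the tokenizer are
# illustrative assumptions, not taken from the issue.
import numpy as np
import triton_python_backend_utils as pb_utils
from transformers import AutoTokenizer


class TritonPythonModel:
    def initialize(self, args):
        # Load the tokenizer once when Triton loads the model.
        self.tokenizer = AutoTokenizer.from_pretrained("/path/to/tokenizer")

    def execute(self, requests):
        responses = []
        for request in requests:
            # Read the raw text prompt sent by the client.
            text = pb_utils.get_input_tensor_by_name(request, "text_input")
            prompt = text.as_numpy().flatten()[0].decode("utf-8")

            # Tokenize, generate, and de-tokenize in-process.
            input_ids = self.tokenizer.encode(prompt)
            output_ids = input_ids  # placeholder for the actual generation step
            completion = self.tokenizer.decode(output_ids)

            # Return the completion as a string tensor.
            out = pb_utils.Tensor(
                "text_output",
                np.array([completion.encode("utf-8")], dtype=np.object_),
            )
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```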

npuichigo commented 6 months ago

If you use a BLS model like https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/all_models/inflight_batcher_llm/tensorrt_llm_bls/config.pbtxt, I think it should be compatible, since the model inputs match.
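
For reference, the linked BLS config declares a string-in/string-out interface along the lines of the excerpt below. This is a trimmed sketch from memory of that config.pbtxt, so treat the exact fields as assumptions and check the file itself; the real config declares many more optional sampling inputs (temperature, top_p, stop words, and so on).

```
# Trimmed sketch of the tensorrt_llm_bls I/O contract; the actual
# config.pbtxt declares many additional optional inputs.
name: "tensorrt_llm_bls"
backend: "python"

input [
  {
    name: "text_input"
    data_type: TYPE_STRING
    dims: [ -1 ]
  },
  {
    name: "max_tokens"
    data_type: TYPE_INT32
    dims: [ -1 ]
  }
]
output [
  {
    name: "text_output"
    data_type: TYPE_STRING
    dims: [ -1 ]
  }
]
```

If that reading is right, a single custom model that accepts and returns these same tensors should, as npuichigo suggests, work with openai_trtllm without the ensemble scheduler.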

zengqingfu1442 commented 6 months ago

[image: screenshot of the custom model's config showing its inputs and outputs]

The inputs and outputs of my custom model are shown above. How can I adjust my model to make it compatible?