npuichigo / openai_trtllm

OpenAI compatible API for TensorRT LLM triton backend
MIT License

Must the LLM model be served by Triton Inference Server's ensemble scheduler? #35

Closed · zengqingfu1442 closed this issue 6 months ago

zengqingfu1442 commented 6 months ago

Can I use a single model in the Triton model repository?

npuichigo commented 6 months ago

Do you do the tokenization and de-tokenization yourself?

zengqingfu1442 commented 6 months ago

> Do you do the tokenization and de-tokenization yourself?

Yes. It's a custom backend based on the Triton Python backend.
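
For context, a custom Triton Python backend that owns tokenization and de-tokenization typically looks something like the sketch below. This is a minimal illustration, not the poster's actual code; the tensor names (`text_input`, `text_output`), the use of a Hugging Face tokenizer, and the tokenizer path are all assumptions.

```python
# model.py — minimal sketch of a Triton Python backend that does its own
# tokenization and de-tokenization. Tensor names and the tokenizer are
# illustrative assumptions, not taken from the issue.
import numpy as np
import triton_python_backend_utils as pb_utils
from transformers import AutoTokenizer


class TritonPythonModel:
    def initialize(self, args):
        # Load the tokenizer once when Triton loads the model.
        self.tokenizer = AutoTokenizer.from_pretrained("/path/to/tokenizer")

    def execute(self, requests):
        responses = []
        for request in requests:
            # Read the raw text prompt sent by the client.
            text = pb_utils.get_input_tensor_by_name(request, "text_input")
            prompt = text.as_numpy().flatten()[0].decode("utf-8")

            # Tokenize, generate, and de-tokenize in-process.
            input_ids = self.tokenizer.encode(prompt)
            output_ids = input_ids  # placeholder for the actual generation step
            completion = self.tokenizer.decode(output_ids)

            # Return the completion as a string tensor.
            out = pb_utils.Tensor(
                "text_output",
                np.array([completion.encode("utf-8")], dtype=np.object_),
            )
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```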

npuichigo commented 6 months ago

If you use a BLS model like https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/all_models/inflight_batcher_llm/tensorrt_llm_bls/config.pbtxt, I think it should be compatible, since the model inputs match.
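
For reference, the linked BLS config declares a string-in/string-out interface along the lines of the excerpt below. This is a trimmed sketch from memory of that config.pbtxt, so treat the exact fields as assumptions and check the file itself; the real config declares many more optional sampling inputs (temperature, top_p, stop words, and so on).

```
# Trimmed sketch of the tensorrt_llm_bls I/O contract; the actual
# config.pbtxt declares many additional optional inputs.
name: "tensorrt_llm_bls"
backend: "python"

input [
  {
    name: "text_input"
    data_type: TYPE_STRING
    dims: [ -1 ]
  },
  {
    name: "max_tokens"
    data_type: TYPE_INT32
    dims: [ -1 ]
  }
]
output [
  {
    name: "text_output"
    data_type: TYPE_STRING
    dims: [ -1 ]
  }
]
```

If that reading is right, a single custom model that accepts and returns these same tensors should, as npuichigo suggests, work with openai_trtllm without the ensemble scheduler.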

zengqingfu1442 commented 6 months ago

[image: screenshot of the custom model's config showing its inputs and outputs]

The inputs and outputs of my custom model are shown above. How can I adjust my model to make it compatible?