triton-inference-server / client

Triton Python, C++, and Java client libraries, and gRPC-generated client examples for Go, Java, and Scala.
BSD 3-Clause "New" or "Revised" License

Add embedding and ranking support #715

Closed by dyastremsky 6 days ago

dyastremsky commented 1 week ago

Add support for embedding models (which use the OpenAI embeddings API) as well as ranking models (which use the re-ranker API of Hugging Face's Text Embeddings Inference).

Please see the individual PRs in this placeholder PR for the updates to LLM input generation, metric generation, and documentation that together add support for these two model types and their APIs.
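For readers unfamiliar with the two APIs named above, here is a rough sketch of the request bodies they expect. It is not the input-generation code from this PR. The helper names and the model name are hypothetical, and the field names follow my understanding of the public OpenAI embeddings API (`POST /v1/embeddings`) and the Text Embeddings Inference re-ranker API (`POST /rerank`), so check them against the official references before relying on them.

```python
import json


def build_embeddings_payload(model, texts):
    """Build a body for an OpenAI-style POST /v1/embeddings request.

    Hypothetical helper: "model" names the embedding model and
    "input" carries the list of strings to embed.
    """
    return {"model": model, "input": list(texts)}


def build_rerank_payload(query, passages):
    """Build a body for a TEI-style POST /rerank request.

    Hypothetical helper: "query" is the search query and "texts"
    are the candidate passages to be scored against it.
    """
    return {"query": query, "texts": list(passages)}


# "my-embedding-model" is a placeholder, not a model used by this PR.
embeddings_body = build_embeddings_payload(
    "my-embedding-model", ["What is Triton?", "How do I send a request?"]
)
rerank_body = build_rerank_payload(
    "What is Triton?",
    ["Triton is an inference server.", "Bananas are yellow."],
)

print(json.dumps(embeddings_body, indent=2))
print(json.dumps(rerank_body, indent=2))
```

The two shapes differ in one way that matters for input generation: an embeddings request is a flat batch of texts, while a ranking request pairs a single query with a set of passages.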