triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Why aren't there generate and generate_stream APIs in the http client? #7602

Closed MasterYi1024 closed 1 month ago

MasterYi1024 commented 2 months ago

Is your feature request related to a problem? Please describe.

I'm wondering why there aren't generate and generate_stream APIs in the http client.
If I want to use the generate API from C++, should I add it to the http client class myself?

How can I chat with LLM using http client?

Is there a demo?

Describe the solution you'd like

I would like generate and generate_stream APIs available in C.

Describe alternatives you've considered

Any alternative will do. I just want to use Triton (self-built, with the Triton client communicating with tritonserver) to chat with an LLM.

Additional context

Nothing.

Thanks in advance:)

KrishnanPrash commented 1 month ago

Hello @MasterYi1024,

The /generate and /generate_stream endpoints were added to provide simple text-in/text-out payloads that can be sent with any HTTP client, without using tritonclient or dealing with input/output tensors directly. For that reason, I don't believe adding these endpoints to the httpclient class is currently scoped.
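Because /generate is just a plain HTTP endpoint, you can call it from C++ without tritonclient at all. Below is a minimal sketch using libcurl; the model name `my_llm` and the request fields (`text_input`, `max_tokens`) are only examples and depend on how your model is configured:

```cpp
// Minimal sketch: call Triton's /generate endpoint directly with libcurl.
// Assumptions: server on localhost:8000, a model named "my_llm" that accepts
// a "text_input" field and a "max_tokens" parameter (adjust for your model).
#include <curl/curl.h>
#include <iostream>
#include <string>

// Append the response body into a std::string.
static size_t WriteCallback(char* data, size_t size, size_t nmemb, void* userp) {
  static_cast<std::string*>(userp)->append(data, size * nmemb);
  return size * nmemb;
}

int main() {
  CURL* curl = curl_easy_init();
  if (!curl) return 1;

  // POST v2/models/<model_name>/generate with a JSON payload.
  const std::string url = "http://localhost:8000/v2/models/my_llm/generate";
  const std::string body =
      R"({"text_input": "What is the Triton Inference Server?", "max_tokens": 128})";

  std::string response;
  struct curl_slist* headers = nullptr;
  headers = curl_slist_append(headers, "Content-Type: application/json");

  curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
  curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
  curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteCallback);
  curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);

  CURLcode rc = curl_easy_perform(curl);
  if (rc == CURLE_OK) {
    std::cout << response << std::endl;  // JSON response containing "text_output"
  }

  curl_slist_free_all(headers);
  curl_easy_cleanup(curl);
  return rc == CURLE_OK ? 0 : 1;
}
```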

For now, I would recommend following this example for chatting with LLMs using the httpclient.
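As a rough sketch of that flow (not the linked example itself), the following uses the tritonclient C++ httpclient and builds the input/output tensors explicitly. The model name `my_llm` and the tensor names `text_input`/`text_output` are assumptions and must match the deployed model's config.pbtxt:

```cpp
// Sketch: chat with an LLM through the tritonclient C++ httpclient by
// constructing BYTES input/output tensors. Model and tensor names below
// are assumptions; use the names from your model's configuration.
#include <http_client.h>

#include <iostream>
#include <memory>
#include <string>
#include <vector>

namespace tc = triton::client;

int main() {
  std::unique_ptr<tc::InferenceServerHttpClient> client;
  tc::InferenceServerHttpClient::Create(&client, "localhost:8000");

  // One BYTES element holding the prompt.
  tc::InferInput* input;
  tc::InferInput::Create(&input, "text_input", {1}, "BYTES");
  std::shared_ptr<tc::InferInput> input_ptr(input);
  input_ptr->AppendFromString({"What is the Triton Inference Server?"});

  tc::InferRequestedOutput* output;
  tc::InferRequestedOutput::Create(&output, "text_output");
  std::shared_ptr<tc::InferRequestedOutput> output_ptr(output);

  // Run inference against the assumed model name.
  tc::InferOptions options("my_llm");
  tc::InferResult* result;
  client->Infer(&result, options, {input_ptr.get()}, {output_ptr.get()});

  // Decode the BYTES output tensor back into strings.
  std::vector<std::string> text_output;
  result->StringData("text_output", &text_output);
  for (const auto& t : text_output) {
    std::cout << t << std::endl;
  }

  delete result;
  return 0;
}
```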