triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

How to set request id in Generate? #6962

Missmiaom closed this issue 2 months ago

Missmiaom commented 7 months ago

This doesn't seem to work:

/v2/models/ensemble/generate

{
    "text_input": "...",
    "parameters": {
        "id": "123"
    },
    "sequence_id": "456"
}

verbose log:

(screenshot: verbose log output)
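For context, the attempted call would look roughly like this with curl (the endpoint path and body are from the report above; host and port are assumptions):

curl -X POST localhost:8000/v2/models/ensemble/generate \
    -d '{
        "text_input": "...",
        "parameters": {
            "id": "123"
        },
        "sequence_id": "456"
    }'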

yinggeh commented 7 months ago

Hi @Missmiaom,

Can you please provide full reproduction steps (all commands run) and fill out the bug template below? Thank you!

Description
A clear and concise description of what the bug is.

Triton Information
What version of Triton are you using?

Are you using the Triton container or did you build it yourself?

To Reproduce
Steps to reproduce the behavior.

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).

Expected behavior
A clear and concise description of what you expected to happen.

Missmiaom commented 7 months ago

Triton Information
What version of Triton are you using? 23.10

Are you using the Triton container or did you build it yourself? Container.

To Reproduce
When I use the ensemble model and call it through the /v2/models/ensemble/generate endpoint, the request id I pass is not printed in Triton's built-in verbose log.

Expected behavior
Triton's built-in verbose log prints the request id.

@yinggeh

yinggeh commented 6 months ago

Hi @Missmiaom. Thanks for waiting. I have opened ticket DLIS-6456 for our engineers to investigate.

dafu-wu commented 3 months ago

@yinggeh Is this feature done yet?

yinggeh commented 3 months ago

@dafu-wu PR https://github.com/triton-inference-server/server/pull/7392 is currently under review. Thanks for your patience.

yinggeh commented 2 months ago

PR https://github.com/triton-inference-server/server/pull/7392 has been merged. Feel free to reopen for further questions.
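With that change, the request id should be accepted as a top-level "id" field in the generate request body. A minimal sketch of such a call (host, port, and field values are illustrative; the field name follows the request shown later in this thread):

curl -X POST localhost:8000/v2/models/ensemble/generate \
    -d '{"id": "123", "text_input": "What is machine learning?", "max_tokens": 20}'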

dafu-wu commented 1 month ago

@yinggeh Thanks for the update. I used the latest Triton server image (24.07) to build the tensorrtllm backend, but the log still has a problem:

(screenshot: verbose log output)

request body:

{
    "id": "11111",
    "sequence_id": "456",
    "text_input": "What is machine learning?",
    "max_tokens": 20,
    "bad_words": "",
    "stop_words": "",
    "pad_id": 2,
    "end_id": 2,
    "top_p": 1,
    "top_k": 1,
    "temperature": 0.7
}

Am I using it incorrectly? @shreyas-samsung, can you help explain this?

yinggeh commented 1 month ago

@dafu-wu It looks like PR https://github.com/triton-inference-server/server/pull/7392 missed the 24.07 release cutoff. Could you try building from the latest source, or wait for the 24.08 image?
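For reference, building from the latest source typically goes through the repo's build.py script. A rough sketch (the flags shown are illustrative, and the exact options for a TensorRT-LLM build may differ; see docs/customization_guide/build.md in the server repo):

# Docker-based build of the latest main branch; flags are illustrative
git clone https://github.com/triton-inference-server/server.git
cd server
./build.py -v --enable-all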

dafu-wu commented 1 month ago

@yinggeh Do you know which Dockerfile is used to build the official image?

yinggeh commented 1 month ago

@dafu-wu Thanks for your patience. Are you referring to the Dockerfile.* files in the server repo?
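As an aside, the official images are published on NGC, so once 24.08 is out, pulling it would look roughly like this (the tag below assumes NGC's usual naming for the TensorRT-LLM variant):

# Pull the official Triton image with the TensorRT-LLM backend (tag assumed)
docker pull nvcr.io/nvidia/tritonserver:24.08-trtllm-python-py3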