triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

fix: usage of ReadDataFromJson in array tensors #7624

Closed v-shobhit closed 1 month ago

v-shobhit commented 2 months ago

What does the PR do?

The generate and generate_stream endpoints fail when the TRT-LLM backend is queried directly with input tokens. This is because HTTPAPIServer::GenerateRequestClass::ExactMappingInput does not pass the correct size of an array input to ReadDataFromJson.

This PR also fixes https://github.com/triton-inference-server/tensorrtllm_backend/issues/369

Checklist

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

Related PRs:

Where should the reviewer start?

Test plan:

Added a new test case to L0_http job. Internal CI pipeline id: 18800660

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

pskiran1 commented 1 month ago

LGTM. Just to confirm: will the added test case fail without this http_server.cc change?

@GuanLuo, thanks for the note. The test case passed with both the old and the new code (with the http_server.cc change). I had to undo commit 229e5e85eeef481916d8bf67f24bcf9dfcb68b25, since I believe this bug does not occur for the String data type. Could you please review the latest code (CI) and approve it?

Now the test case fails on 24.09 with the old code and passes with the new code (including the http_server.cc change).