triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Inaccurate request handling when configuring queue policy #6796 (related unfixed bug: #5783)

Open eeeeeunjung opened 7 months ago

eeeeeunjung commented 7 months ago

Description: I want to use the model's queue policy (max queue length and timeout), but I found that Triton does not handle requests accurately. I also found issue https://github.com/triton-inference-server/server/issues/5783 and guess this is the same question. When multiple requests are sent continuously at short intervals, the maximum queue length and the queue wait timeout sometimes do not take effect as expected.
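For context, the queue policy being discussed is set per model in config.pbtxt under dynamic_batching. A minimal sketch of such a configuration (the specific values are illustrative assumptions, not taken from the reporter's setup):

```
dynamic_batching {
  max_queue_delay_microseconds: 100
  default_queue_policy {
    # Reject (rather than delay) requests that exceed the limits below.
    timeout_action: REJECT
    # Maximum time a request may wait in the queue: 1 second.
    default_timeout_microseconds: 1000000
    # At most 4 requests may wait in the queue; further ones are rejected.
    max_queue_size: 4
  }
}
```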

Triton Information: Triton 23.10, Ubuntu 22.04.3

To Reproduce: An easy way to reproduce: [screenshot of the test case attached]. I added sleep time between the requests in the test case, and it failed.
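The actual test case is only available as a screenshot. Below is a minimal sketch of this style of reproduction, assuming a model named queue_test that is slow enough for requests to queue, configured with max_queue_size: 4 as above, and a single INT32 input named INPUT0 (the model name, input name, URL, and request counts are all illustrative assumptions):

```python
import threading
import time

import numpy as np
import tritonclient.http as httpclient
from tritonclient.utils import InferenceServerException

MODEL = "queue_test"  # hypothetical model; assumed slow enough for requests to queue
rejected = []

def send_request(idx):
    client = httpclient.InferenceServerClient(url="localhost:8000")
    inp = httpclient.InferInput("INPUT0", [1], "INT32")
    inp.set_data_from_numpy(np.array([idx], dtype=np.int32))
    try:
        client.infer(MODEL, [inp])
    except InferenceServerException:
        # Requests rejected by the queue policy surface as server errors.
        rejected.append(idx)

threads = []
for i in range(10):
    t = threading.Thread(target=send_request, args=(i,))
    t.start()
    threads.append(t)
    # The short inter-request sleep that, per the report, makes the
    # queue limits stop behaving as expected.
    time.sleep(0.01)
for t in threads:
    t.join()

# With one request executing and max_queue_size: 4, a fixed number of
# the 10 requests should be rejected; the report is that this count
# drifts once the sleep is added.
print(f"rejected {len(rejected)} of 10 requests")
```

How many requests a given run rejects depends on timing, which is exactly the flakiness the issue describes.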

Expected behavior: test_max_queue_size should still pass after the sleep time is added.

nv-kmcgill53 commented 7 months ago

Hi @eeeeeunjung, can you confirm this behavior also occurs in the 23.12 release as well?

CC: @tanmayv25 for visibility

eeeeeunjung commented 7 months ago

> Hi @eeeeeunjung, can you confirm this behavior also occurs in the 23.12 release as well?

Oh sorry, I don't have a machine to test it on right now.

zengqingfu1442 commented 6 months ago

Is this bug resolved in the 23.12 release? Thanks.

priyankat99 commented 2 months ago

Any updates?

sboudouk commented 1 week ago

Any update? Not being able to rely on this is a big issue.

nv-kmcgill53 commented 1 week ago

CC: @rmccorm4