triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend

Result is out of order when using HTTP stream mode #614

Open liu21yd opened 1 month ago

liu21yd commented 1 month ago
  1. I built the image from source, using the TensorRT-LLM and tensorrtllm_backend main branches as of September 25th.
  2. I built Qwen2.5 14B and started the Triton server successfully.

The plain HTTP request returns the expected result, but when I use the generate_stream URL the result arrives out of order (see the screenshots attached to the original issue).

Can anyone help?
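For reference, this is roughly how the streaming endpoint gets called; a minimal sketch, assuming the default HTTP port 8000 and the usual `ensemble` model name (adjust both to your deployment):

```python
import json
import requests

# Assumed deployment details: default HTTP port and the "ensemble" model name.
URL = "http://localhost:8000/v2/models/ensemble/generate_stream"
payload = {"text_input": "Tell me about Qwen2.5.", "max_tokens": 64, "stream": True}

with requests.post(URL, json=payload, stream=True) as resp:
    resp.raise_for_status()
    # The endpoint replies with Server-Sent Events: one "data: {...}" line per chunk.
    # The out-of-order behavior shows up as shuffled text_output fragments here.
    for line in resp.iter_lines():
        if line.startswith(b"data: "):
            chunk = json.loads(line[len(b"data: "):])
            print(chunk.get("text_output", ""), end="", flush=True)
print()
```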

intellimouseftw commented 1 month ago

+1 I am facing the same issue. Are you using >1 for postprocessing_instance_count in your config.pbtxt?
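For context, this is roughly where that parameter ends up after templating; a sketch assuming the repo's usual `all_models/inflight_batcher_llm` layout, with `tools/fill_template.py` substituting the placeholder:

```
# postprocessing/config.pbtxt (illustrative value)
instance_group [
  {
    # ${postprocessing_instance_count} is filled in here; the reordering
    # reportedly appears once this count is greater than 1.
    count: 4
    kind: KIND_CPU
  }
]
```

With multiple postprocessing instances, detokenized chunks can be produced concurrently, which would be consistent with stream responses interleaving.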

matichon-vultureprime commented 1 month ago

+1 I am using the v0.13.0 release. The issue occurs when I set postprocessing_instance_count > 1. Matching preprocessing_instance_count to the same value does not resolve it either.

In the v0.12.0 release, it worked well.

intellimouseftw commented 1 month ago

@matichon-vultureprime I just tried version 0.12.0; it works well there with postprocessing_instance_count > 1 too.

This looks like a bug introduced in 0.13.0.

liu21yd commented 1 month ago

> +1 I am facing the same issue. Are you using >1 for postprocessing_instance_count in your config.pbtxt?

Yes. When I use the config from version v0.11.0.dev2024051400, it works. It seems some feature introduced since then caused this bug.