triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend

Thread [0] had error: in ensemble 'ensemble', Encountered error for requestId 498689237: Cannot process new request: Streaming mode is only supported with beam width of 1. #140

Open Juelianqvq opened 8 months ago

Juelianqvq commented 8 months ago

I tried to use perf_analyzer as follows while deploying LLaMA2-13B with Triton:

python scripts/launch_triton_server.py --world_size 2 --model_repo triton_model_repo

perf_analyzer -m ensemble -i grpc --shape "bad_words:1" --shape "max_tokens:1" --shape "stop_words:1" --shape "text_input:1" --streaming

However, I encounter an error implying that beam_width is not set correctly. I'm also a beginner and curious about what the dimensions passed with --shape mean. Can you give me some suggestions?
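A note on --shape: perf_analyzer needs explicit dimensions for variable-shaped inputs, so "text_input:1" simply declares a one-element vector per request. Any input you do not supply at all, beam_width included, is filled with random data by default. A minimal sketch of pinning the inputs instead via an --input-data file, assuming the input names of the example ensemble shipped with this backend; input_data.json is a hypothetical file name:

```bash
# Sketch: pin every ensemble input so perf_analyzer does not fill beam_width
# with random data. Input names follow this backend's example models;
# input_data.json is a hypothetical file name.
cat > input_data.json <<'EOF'
{
  "data": [
    {
      "text_input": ["What is machine learning?"],
      "max_tokens": [64],
      "bad_words": [""],
      "stop_words": [""],
      "beam_width": [1]
    }
  ]
}
EOF
```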

byshiue commented 8 months ago

Can you share

  • the script to build the engine
  • the config.pbtxt of your backend settings

Juelianqvq commented 8 months ago

> Can you share
>
>   • the script to build the engine
>   • the config.pbtxt of your backend settings

I enabled the options "--dtype float16 --remove_input_padding --use_gpt_attention_plugin float16 --enable_context_fmha --use_gemm_plugin float16 --use_inflight_batching --world_size 2 --tp_size 2 --max_output_len 1024"

and here are the pbtxts: pbtxt.zip
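The attachment is not reproduced here. For reference, streaming requires the tensorrt_llm model to run in decoupled mode; in this backend's example config.pbtxt that corresponds to a fragment like the following (a sketch of the relevant fields, not the poster's actual file):

```
model_transaction_policy {
  decoupled: true
}
parameters: {
  key: "gpt_model_type"
  value: { string_value: "inflight_fused_batching" }
}
```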

byshiue commented 8 months ago

I don't see you set up the beam width when you build the engine. Can you try adding --max_beam_width 4?
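For concreteness, a sketch of the full rebuild with that flag added, reusing the options quoted above; build.py refers to the TensorRT-LLM LLaMA example's build script, and the paths are placeholders:

```bash
# Rebuild the engine with beam search enabled (max_beam_width > 1).
# Model and output paths are placeholders.
python build.py --model_dir /path/to/llama-2-13b \
                --output_dir /path/to/engines \
                --dtype float16 \
                --remove_input_padding \
                --use_gpt_attention_plugin float16 \
                --enable_context_fmha \
                --use_gemm_plugin float16 \
                --use_inflight_batching \
                --world_size 2 --tp_size 2 \
                --max_output_len 1024 \
                --max_beam_width 4
```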

Juelianqvq commented 8 months ago

> I don't see you set up the beam width when you build the engine. Can you try adding --max_beam_width 4?

I've added the option and the problem persists.

[TensorRT-LLM][ERROR] Encountered error for requestId 1380228878: Cannot process new request: Streaming mode is only supported with beam width of 1.
[TensorRT-LLM][ERROR] Cannot process new request: Streaming mode is only supported with beam width of 1.

byshiue commented 8 months ago

Thanks. I found the limitation in the batch manager that I had missed. Could you modify this issue, or open another one, to request this feature?

Juelianqvq commented 8 months ago

> Thanks. I found the limitation in the batch manager that I had missed. Could you modify this issue, or open another one, to request this feature?

OK
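To summarize the state of the thread: the restriction lives in the runtime, not the engine, so the batch manager rejects any streaming request whose beam_width input is greater than 1 regardless of the --max_beam_width the engine was built with. Until beam search is supported in streaming mode, a workaround sketch is to pin beam_width to 1 per request, reusing the hypothetical input_data.json from above, or to drop --streaming when benchmarking beam search:

```bash
# Streaming benchmark with beam_width pinned to 1 via the input-data file
# sketched earlier; without it, perf_analyzer randomizes unspecified inputs.
perf_analyzer -m ensemble -i grpc --streaming \
    --input-data input_data.json \
    --shape "text_input:1" --shape "max_tokens:1" \
    --shape "bad_words:1" --shape "stop_words:1"
```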

MuyeMikeZhang commented 2 months ago

When I use perf_analyzer to test the performance, I hit the error "Thread [0] had error: Cannot send stop request without specifying a request_id". Do you know how to fix it?