Open wxsms opened 3 months ago
System Info
tensorrt_llm==0.11.0.dev2024061800
Who can help?
@ncomly-nvidia
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Deploy a model with `beam_width > 1` and the trtllm backend, then request the BLS model through the generate_stream endpoint with `stream: true`.
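A request along these lines should reproduce the error. This is a minimal sketch, not a verified script: the host and port, the model name `tensorrt_llm_bls`, and the prompt are assumptions about a default tensorrtllm_backend deployment, and the helper `build_stream_request` is a hypothetical name used only for illustration.

```python
import json
import urllib.request

# Assumed values: adjust host, port, and model name to your deployment.
TRITON_URL = "http://localhost:8000"
BLS_MODEL = "tensorrt_llm_bls"


def build_stream_request(prompt, beam_width=2, max_tokens=32):
    """Build (but do not send) a generate_stream request for the BLS model."""
    payload = {
        "text_input": prompt,
        "max_tokens": max_tokens,
        "beam_width": beam_width,  # any value > 1 triggers the issue
        "stream": True,
    }
    url = f"{TRITON_URL}/v2/models/{BLS_MODEL}/generate_stream"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_stream_request("Hello", beam_width=2)
# To send it: urllib.request.urlopen(req), then read the streamed lines.
```

The request is only constructed here, so the snippet can be run without a live server; sending it requires a running Triton instance with `accumulate_tokens` enabled in the BLS model configuration.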
Expected behavior
It should be possible to set `accumulate_tokens` to `True`.
actual behavior
Error thrown: `Accumulation of tokens is only implemented for beam width = 1`
additional notes
Maybe all we need to do is enhance the BLS script?
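To illustrate what such an enhancement might look like, here is a rough sketch of per-beam accumulation. It is not the actual BLS script: the function `accumulate_per_beam` is hypothetical, and it assumes each streamed chunk carries one list of new token ids per beam, in a stable beam order. Real beam search may reorder or replace beams between steps, which this sketch ignores.

```python
from typing import Dict, Iterable, Iterator, List


def accumulate_per_beam(
    chunks: Iterable[List[List[int]]],
) -> Iterator[Dict[int, List[int]]]:
    """Yield the running token list of every beam after each streamed chunk.

    Assumption: chunk[i] holds the new token ids for beam i.
    """
    totals: Dict[int, List[int]] = {}
    for chunk in chunks:
        for beam_idx, new_tokens in enumerate(chunk):
            totals.setdefault(beam_idx, []).extend(new_tokens)
        # Copy the lists so callers cannot mutate the internal state.
        yield {beam: list(tokens) for beam, tokens in totals.items()}


# Two beams, two streamed chunks.
results = list(accumulate_per_beam([[[1], [2]], [[3], [4, 5]]]))
```

Keeping one accumulator per beam index, rather than a single shared one, is the only change relative to the `beam_width = 1` case under these assumptions.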