triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

Block reuse is currently not supported with beam width > 1 #411

Open tonylek opened 2 months ago

tonylek commented 2 months ago

Is there a plan to add support for block reuse in beam search? It could be very helpful. When I try to use it, I get this exception: Block reuse is currently not supported with beam width > 1

byshiue commented 2 months ago

What do you mean by block reuse?

tonylek commented 2 months ago

This is the exception I got when trying to deploy the model on Triton. It happens when I set kv_cache_reuse to True in the model's config.pbtxt.
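
For illustration, the conflict can be sketched as a pre-deployment check. This is only a rough sketch under assumptions: the key names `enable_kv_cache_reuse` and `max_beam_width` are assumed spellings of the config.pbtxt parameters, and `check_block_reuse` is a hypothetical helper, not part of Triton or TensorRT-LLM.

```python
def check_block_reuse(params: dict) -> None:
    """Raise if KV cache block reuse is requested together with beam search.

    `params` is a hypothetical dict of string-valued parameters, as they
    might be read from config.pbtxt. The key names are assumptions.
    """
    reuse = str(params.get("enable_kv_cache_reuse", "false")).lower() == "true"
    beam_width = int(params.get("max_beam_width", "1"))
    if reuse and beam_width > 1:
        # Same message as the exception reported in this issue.
        raise ValueError(
            "Block reuse is currently not supported with beam width > 1"
        )


# Reuse enabled with beam width 1: no exception is raised.
check_block_reuse({"enable_kv_cache_reuse": "true", "max_beam_width": "1"})
```

Under these assumptions, the failure in this issue corresponds to the reuse flag being true while the beam width is greater than 1; either turning reuse off or using beam width 1 avoids it.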

byshiue commented 2 months ago

It is not on our roadmap at the moment. If you are interested in this feature, you could open a feature request and we will consider adding it to the roadmap.