Open tonylek opened 2 months ago
What do you mean by block reuse?
This is the exception I got when trying to deploy the model on Triton. It happens when I set kv_cache_reuse to True in the model's config.pbtxt.
It is not on our roadmap at the moment. If you are interested in this feature, you can file a feature request and we will consider adding it to our roadmap.
Is there a plan to add support for block reuse with beam search? It would be very helpful. When I try to use it, I get this exception: Block reuse is currently not supported with beam width > 1