Is your feature request related to a problem? Please describe.
I am using a model with dynamic batching, which necessitates warming up for each supported batch size. This process consumes both time and VRAM.
My objective is to significantly increase RPS without overloading these resources. To accomplish this, I plan to warm up the model only for a limited set of batch sizes with non-uniform increments (e.g., 1, 2, 3, 4, 8, 16, 32) and have the dynamic batcher produce only those sizes. However, this approach cannot be implemented with the dynamic batching logic described in model_configuration.md.
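For reference, the warm-up I have in mind would look roughly like the sketch below, using the model_warmup section of config.pbtxt; the input name, data type, and dims are placeholders, and one entry would be repeated per target batch size.

```
model_warmup [
  {
    name: "warmup_bs4"
    batch_size: 4
    inputs {
      key: "INPUT0"            # placeholder input name
      value: {
        data_type: TYPE_FP32   # placeholder type
        dims: [ 3, 224, 224 ]  # placeholder shape (without the batch dimension)
        zero_data: true
      }
    }
  }
  # ...one entry per batch size to support: 1, 2, 3, 4, 8, 16, 32
]
```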
Example Problem:
Consider a scenario with 12 requests in the queue.
Currently, these requests would be combined into a single batch of 12. The desired behavior is to split them into two batches of sizes 4 and 8, both of which the model has been warmed up for.
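The dynamic batcher is configured roughly like this (values are illustrative):

```
max_batch_size: 32
dynamic_batching {
  preferred_batch_size: [ 1, 2, 3, 4, 8, 16, 32 ]
  max_queue_delay_microseconds: 100   # illustrative value
}
```

As I read model_configuration.md, preferred_batch_size is only a hint: once the queue delay expires, the scheduler may still form any batch size up to max_batch_size, such as the batch of 12 above.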
Describe the solution you'd like
Add a flag that makes preferred_batch_size strict, or introduce a new configuration option that restricts dynamic batch sizes to an explicit list.
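For example (the field name below is hypothetical and does not exist in Triton today), something along these lines:

```
dynamic_batching {
  preferred_batch_size: [ 1, 2, 3, 4, 8, 16, 32 ]
  # Hypothetical option: never schedule a batch whose size is not
  # listed in preferred_batch_size; split the queue instead.
  preferred_batch_size_only: true
}
```

With this in place, 12 queued requests would be scheduled as one batch of 8 and one batch of 4 rather than a single batch of 12.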
Describe alternatives you've considered
I haven't come up with any alternatives.