neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

[Continuous Batching] Queue Implementation to support batching grouping and prioritization #1373

Closed bfineran closed 10 months ago

bfineran commented 10 months ago

This PR adds a helper class to manage EngineOperator requests that may come in for the different engines held by the same continuous batching scheduler.

The main idea is that as EngineOperator requests come in, they are queued in separate queues keyed by their EngineOperator. The scheduler then calls ContinuousBatchingQueues.pop_batch() to select the next most important batch to run.
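A minimal sketch of the per-operator queueing idea, not the code added in this PR. The class name KeyedRequestQueues and its methods add, pending, and pop_up_to are hypothetical, and any hashable key stands in for an EngineOperator.

```python
import time
from collections import defaultdict, deque
from typing import Any, Deque, Dict, Hashable, List, Tuple


class KeyedRequestQueues:
    """Hypothetical stand-in: one FIFO queue per operator key."""

    def __init__(self) -> None:
        # Each entry is (enqueue_time, request) so wait times can be measured later.
        self._queues: Dict[Hashable, Deque[Tuple[float, Any]]] = defaultdict(deque)

    def add(self, key: Hashable, request: Any) -> None:
        """Queue a request under the operator it is meant for."""
        self._queues[key].append((time.monotonic(), request))

    def pending(self, key: Hashable) -> int:
        """Number of requests currently waiting for this key."""
        return len(self._queues.get(key, ()))

    def pop_up_to(self, key: Hashable, batch_size: int) -> List[Any]:
        """Remove and return at most batch_size requests, oldest first."""
        queue = self._queues[key]
        count = min(batch_size, len(queue))
        return [queue.popleft()[1] for _ in range(count)]
```

Requests for different operators never mix in this sketch, so a batch popped for one key can always be handed to a single engine.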

Right now, the heuristic for choosing the next batch is: 1) the queue with the longest wait time, if that wait is over 100ms; otherwise 2) the queue that can fill the largest batch size. As future work, the scheduler can implement a select_fn with more involved heuristics that take other worker threads into account if needed.
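The two-step heuristic can be sketched as a pure selection function. This is an illustration under assumptions, not the PR's implementation: QueueState, largest_fillable_batch, and select_queue are hypothetical names, and the sketch assumes each queue knows the batch sizes its engine supports.

```python
from dataclasses import dataclass
from typing import Hashable, List, Optional

MAX_WAIT_SECONDS = 0.1  # the 100ms threshold from the description


@dataclass
class QueueState:
    """Hypothetical snapshot of one operator's queue."""
    key: Hashable
    size: int               # number of pending requests
    oldest_wait: float      # seconds the oldest request has waited
    batch_sizes: List[int]  # assumed: batch sizes this operator's engine can run


def largest_fillable_batch(state: QueueState) -> int:
    """Largest supported batch size the queue can fill completely, or 0."""
    return max((b for b in state.batch_sizes if b <= state.size), default=0)


def select_queue(states: List[QueueState]) -> Optional[QueueState]:
    """Pick the queue to serve next, or None if nothing is runnable."""
    candidates = [s for s in states if s.size > 0]
    if not candidates:
        return None
    # Rule 1: a queue that has waited past the threshold wins outright.
    longest = max(candidates, key=lambda s: s.oldest_wait)
    if longest.oldest_wait > MAX_WAIT_SECONDS:
        return longest
    # Rule 2: otherwise prefer the queue that fills the largest batch.
    best = max(candidates, key=largest_fillable_batch)
    return best if largest_fillable_batch(best) > 0 else None
```

Keeping the selection separate from the queues themselves is one way to leave room for a custom select_fn later: a different function with the same signature could replace select_queue without touching the queueing code.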

test_plan: Unit tests are included for the basic functionality of ContinuousBatchingQueue and ContinuousBatchingQueues.