[ContinuousBatching] ContinuousBatchingScheduler Implementation

bfineran commented 10 months ago

Uses the utils in #1373 and #1374 to implement a scheduler for continuous batching. The ContinuousBatchingScheduler tracks multiple EngineOperators and manages their input queues with ContinuousBatchingQueues. The scheduler also runs several ContinuousBatchingExecutorThreads in parallel; these threads consume the queues, run the multi-batch engine, and set the results on the futures returned by the scheduler's submit.
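A minimal sketch of the flow described above, assuming a toy "engine" that is just a Python function mapping a list of inputs to a list of outputs. Every name here (`ToyScheduler`, `submit`, `max_batch_size`, `num_threads`) is hypothetical and is not the deepsparse API; a plain `queue.Queue` per operator stands in for ContinuousBatchingQueues, and plain threads stand in for ContinuousBatchingExecutorThreads.

```python
import queue
import threading
import time
from concurrent.futures import Future
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an engine: runs one batch (a list) and returns one output per input
BatchFn = Callable[[List[int]], List[int]]


class ToyScheduler:
    """Hypothetical sketch: one input queue per operator, worker threads drain queues in batches."""

    def __init__(self, operators: Dict[str, BatchFn], max_batch_size: int = 4, num_threads: int = 2):
        self._operators = operators
        self._queues = {name: queue.Queue() for name in operators}
        self._max_batch_size = max_batch_size
        self._stop = threading.Event()
        self._threads = [threading.Thread(target=self._worker, daemon=True) for _ in range(num_threads)]
        for t in self._threads:
            t.start()

    def submit(self, operator_name: str, item: int) -> Future:
        # Enqueue the input and hand the caller a future that a worker thread will resolve
        fut: Future = Future()
        self._queues[operator_name].put((item, fut))
        return fut

    def _take_batch(self, q: "queue.Queue") -> List[Tuple[int, Future]]:
        # Pull up to max_batch_size pending items without blocking
        batch = []
        while len(batch) < self._max_batch_size:
            try:
                batch.append(q.get_nowait())
            except queue.Empty:
                break
        return batch

    def _worker(self) -> None:
        while not self._stop.is_set():
            did_work = False
            for name, q in self._queues.items():
                batch = self._take_batch(q)
                if not batch:
                    continue
                did_work = True
                inputs = [item for item, _ in batch]
                try:
                    outputs = self._operators[name](inputs)  # one engine call for the whole batch
                except Exception as exc:
                    for _, fut in batch:
                        fut.set_exception(exc)
                    continue
                # Route each output back to the future of the request that produced it
                for (_, fut), out in zip(batch, outputs):
                    fut.set_result(out)
            if not did_work:
                time.sleep(0.001)  # back off briefly when every queue is empty

    def shutdown(self) -> None:
        self._stop.set()
        for t in self._threads:
            t.join()


if __name__ == "__main__":
    sched = ToyScheduler({"double": lambda xs: [2 * x for x in xs]})
    futures = [sched.submit("double", i) for i in range(10)]
    print([f.result(timeout=5) for f in futures])  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
    sched.shutdown()
```

Keeping one queue per operator means a batch never mixes inputs meant for different engines. The short sleep when idle keeps the sketch simple; a real scheduler would more likely block on a condition variable instead of polling.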

Next steps include:

- integration with KV Cache engine mode for text gen
- implementation of a singleton pattern so that the scheduler can easily be shared across pipelines
- text gen pipeline integration
- a more involved heuristic for the select_fn to determine execution priority
- server integration
- heavier testing
- README
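As an illustration of what a "more involved" select_fn heuristic might weigh, here is a sketch that bounds latency first and then maximizes batch fill. The `QueueStats` summary, the `select_fn(stats, max_wait, now)` signature, and the heuristic itself are all hypothetical and are not taken from this PR.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class QueueStats:
    """Hypothetical per-queue summary a scheduler could pass to select_fn."""
    pending: int                # number of queued requests
    oldest_enqueue_time: float  # time.monotonic() when the oldest pending item arrived
    max_batch_size: int         # largest batch the engine behind this queue accepts


def select_fn(stats: Dict[str, QueueStats], max_wait: float = 0.05,
              now: Optional[float] = None) -> Optional[str]:
    """Pick which queue to execute next (hypothetical heuristic).

    1. Any queue whose oldest item has waited longer than max_wait goes first,
       longest wait wins, to bound tail latency.
    2. Otherwise prefer the queue that fills the largest fraction of its batch,
       to maximize engine utilization.
    """
    now = time.monotonic() if now is None else now
    candidates = {k: s for k, s in stats.items() if s.pending > 0}
    if not candidates:
        return None
    overdue = {k: now - s.oldest_enqueue_time for k, s in candidates.items()
               if now - s.oldest_enqueue_time > max_wait}
    if overdue:
        return max(overdue, key=overdue.get)
    return max(candidates,
               key=lambda k: min(candidates[k].pending, candidates[k].max_batch_size)
               / candidates[k].max_batch_size)
```

The `max_wait` threshold trades latency for throughput: a larger value lets more requests accumulate into full batches before a partially filled queue is forced to run.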

test_plan: a simple single-execution unit test is included. Further tests should cover multiple engines/operators/batch sizes under enough load to trigger multi-batch execution. Note that unit tests for multi-batch execution are handled alongside the helpers.

dsikka commented 10 months ago

One more question: under next steps you listed "integration with KV Cache engine mode for text gen". Is there any reason this can't work with the NLEngineOperator as it is currently? The NLEngineOperator inherits from the EngineOperator.

@bfineran

bfineran commented 10 months ago

Yeah, a few things here:

- the schemas need to implement split/join themselves, since they don't inherit it from the EngineOperator schemas
- I think we'll need to update the way the engine kwargs are passed so that the shared create_engine function sets the right internal/external KV cache mode
- the run function needs to be updated to accept an engine that can be swapped out, as we do in EngineOperator
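The first and third points can be sketched with toy classes: a batched schema that can be split into single samples and joined back in order, and an operator whose run accepts an optional engine override so a scheduler can substitute a larger-batch engine. `ToyInputs`, `ToyOperator`, and the `engine` keyword are hypothetical stand-ins, not the actual deepsparse schemas or the EngineOperator signature, and the KV cache mode from the second point is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ToyInputs:
    """Hypothetical batched schema: one row per sample."""
    rows: List[List[float]]

    def split(self) -> List["ToyInputs"]:
        # Break a batch into batch-size-1 instances so each can be queued on its own
        return [ToyInputs(rows=[row]) for row in self.rows]

    @classmethod
    def join(cls, items: List["ToyInputs"]) -> "ToyInputs":
        # Concatenate per-sample instances back into one batch, preserving order
        return cls(rows=[row for item in items for row in item.rows])


# Hypothetical engine type: maps a batch of rows to a batch of output rows
Engine = Callable[[List[List[float]]], List[List[float]]]


class ToyOperator:
    """Hypothetical operator whose run() can take an engine override."""

    def __init__(self, engine: Engine):
        self.engine = engine

    def run(self, inp: ToyInputs, engine: Optional[Engine] = None) -> ToyInputs:
        # Use a caller-supplied engine (e.g. one compiled for a larger batch size)
        # when given; otherwise fall back to the operator's own engine
        active = self.engine if engine is None else engine
        return ToyInputs(rows=active(inp.rows))
```

With split/join on the schema, the scheduler can queue single samples from many callers, join whatever is pending into one batch, run it once, and split the output to resolve each caller's future.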

neuralmagic / deepsparse #1375