Closed bfineran closed 10 months ago
One more question: in terms of next steps, you had written down: "integration with KV Cache engine mode for text gen".
Any reason this can't work with the `NLEngineOperator` as is currently? The `NLEngineOperator` inherits from the `EngineOperator`.
@bfineran
yeah, a few things here:
- `create_engine` function sets the right internal/external kv cache mode
- `EngineOperator`
Uses the utils in #1373 and #1374 to implement a scheduler for continuous batching. The `ContinuousBatchingScheduler` tracks various `EngineOperator`s and manages their input queues with `ContinuousBatchingQueues`. The scheduler also runs multiple `ContinuousBatchingExecutorThread`s in parallel that consume these queues, run the multi-batch engine, and return the correct futures from the scheduler's `submit`.

Next steps include:
- `select_fn` to determine execution priority

test_plan: a simple single-execution unit test is included. Further tests should cover multiple engines/operators/batch sizes with sufficient load to trigger multibatch execution. Note that unit tests for multibatch are handled with the helpers
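
For readers unfamiliar with the pattern, the scheduler design described above (per-operator input queues, worker threads that consume them, and futures returned from `submit`) can be sketched roughly as follows. This is a minimal standard-library illustration and not the code in this PR: `SketchQueues`, `SketchScheduler`, `add_operator`, and the "longest queue first" selection rule are all hypothetical names and choices made for the example.

```python
import threading
from concurrent.futures import Future
from typing import Any, Callable, Dict, List, Optional, Tuple

Batch = List[Tuple[Any, Future]]


class SketchQueues:
    """Hypothetical stand-in for the queues object: one FIFO per operator key."""

    def __init__(self) -> None:
        self._queues: Dict[str, Batch] = {}
        self._batch_sizes: Dict[str, int] = {}
        self._cond = threading.Condition()
        self._closed = False

    def add_queue(self, key: str, batch_size: int) -> None:
        with self._cond:
            self._queues[key] = []
            self._batch_sizes[key] = batch_size

    def put(self, key: str, item: Any, future: Future) -> None:
        with self._cond:
            self._queues[key].append((item, future))
            self._cond.notify()

    def close(self) -> None:
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def pop_batch(self) -> Optional[Tuple[str, Batch]]:
        """Block until a queue has work; return None once closed and drained."""
        with self._cond:
            while True:
                ready = [k for k, q in self._queues.items() if q]
                if ready:
                    # Assumed selection rule: serve the longest queue first.
                    key = max(ready, key=lambda k: len(self._queues[k]))
                    n = self._batch_sizes[key]
                    batch = self._queues[key][:n]
                    self._queues[key] = self._queues[key][n:]
                    return key, batch
                if self._closed:
                    return None
                self._cond.wait()


class SketchScheduler:
    """Hypothetical scheduler: submit() returns a future, workers run batches."""

    def __init__(self, num_workers: int = 2) -> None:
        self._queues = SketchQueues()
        self._engines: Dict[str, Callable[[List[Any]], List[Any]]] = {}
        self._threads = [
            threading.Thread(target=self._worker, daemon=True)
            for _ in range(num_workers)
        ]
        for thread in self._threads:
            thread.start()

    def add_operator(self, key, engine_fn, batch_size: int) -> None:
        self._engines[key] = engine_fn
        self._queues.add_queue(key, batch_size)

    def submit(self, key: str, item: Any) -> Future:
        future: Future = Future()
        self._queues.put(key, item, future)
        return future

    def shutdown(self) -> None:
        self._queues.close()
        for thread in self._threads:
            thread.join()

    def _worker(self) -> None:
        while True:
            popped = self._queues.pop_batch()
            if popped is None:
                return
            key, batch = popped
            items = [item for item, _ in batch]
            futures = [future for _, future in batch]
            try:
                # One batched call; each output resolves its caller's future.
                for future, out in zip(futures, self._engines[key](items)):
                    future.set_result(out)
            except Exception as exc:
                # Any future already resolved before the failure is skipped.
                for future in futures:
                    if not future.done():
                        future.set_exception(exc)


scheduler = SketchScheduler(num_workers=2)
scheduler.add_operator("double", lambda xs: [2 * x for x in xs], batch_size=4)
futures = [scheduler.submit("double", i) for i in range(10)]
results = [f.result(timeout=5) for f in futures]
scheduler.shutdown()
```

Because every request carries its own future, callers get the right output back regardless of which worker thread ran the batch or how requests were grouped. A pluggable selection function, as mentioned in the next steps, would replace the hard-coded "longest queue first" rule inside `pop_batch` in this sketch.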