Integrates the ContinuousBatchingScheduler with the TextGeneration Pipeline
This requires updating the NLEngineOperator so that it can handle batches larger than 1. Specifically, this includes updating the input/output schemas to provide the split/join methods required by the ContinuousBatchingScheduler, and updating the run method so that inputs can be handled by any scheduler.
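As a rough illustration of what split/join methods on a schema could look like, here is a minimal sketch. The class name `BatchedInputs`, its `engine_inputs` field, and the list-based layout are assumptions made for this example and are not taken from the actual NLEngineOperator schemas.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class BatchedInputs:
    """Hypothetical schema: one inner list per model input, one row per batch item."""

    engine_inputs: List[List[Any]]

    def split(self) -> List["BatchedInputs"]:
        # Break a batch of size N into N inputs of batch size 1,
        # so a scheduler can regroup them into whatever batches it likes.
        batch_size = len(self.engine_inputs[0])
        return [
            BatchedInputs([[tensor[i]] for tensor in self.engine_inputs])
            for i in range(batch_size)
        ]

    @classmethod
    def join(cls, items: List["BatchedInputs"]) -> "BatchedInputs":
        # Concatenate the rows of each model input, preserving item order.
        num_tensors = len(items[0].engine_inputs)
        return cls(
            [
                [row for item in items for row in item.engine_inputs[t]]
                for t in range(num_tensors)
            ]
        )


batch = BatchedInputs([[1, 2, 3], ["a", "b", "c"]])
parts = batch.split()
rejoined = BatchedInputs.join(parts)
```

In this sketch, `join` is the inverse of `split`, which is the property a batching scheduler would rely on when it regroups inputs and later returns per-request outputs.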
Refactors the join/split functionality: instead of launching an individual thread for each SPLIT node, the pipeline polls to identify which operator has finished running and schedules the next operator in its split route, until all operators within each subgraph (i.e. split route) have finished running. This is done with the help of a SplitRoute dataclass.
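The polling approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: the fields of `SplitRoute` and the helpers `_submit_next` and `run_split_routes` are hypothetical, and a standard-library `ThreadPoolExecutor` stands in for the pipeline's scheduler.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class SplitRoute:
    """Hypothetical stand-in: tracks progress through one split route."""

    ops: List[Callable[[Any], Any]]  # operators to run in order
    value: Any                       # latest output along this route
    next_op: int = 0                 # index of the next operator to schedule
    future: Optional[Future] = None  # in-flight operator, if any


def _submit_next(route: SplitRoute, executor: ThreadPoolExecutor) -> None:
    # Schedule the route's next operator, if it has one left.
    if route.next_op < len(route.ops):
        route.future = executor.submit(route.ops[route.next_op], route.value)
        route.next_op += 1


def run_split_routes(routes: List[SplitRoute], executor: ThreadPoolExecutor) -> List[Any]:
    for route in routes:
        _submit_next(route, executor)
    while True:
        pending = {r.future: r for r in routes if r.future is not None}
        if not pending:
            break  # every route has run all of its operators
        # Block until at least one in-flight operator finishes.
        finished, _ = wait(list(pending), return_when=FIRST_COMPLETED)
        for future in finished:
            route = pending[future]
            route.value = future.result()
            route.future = None
            _submit_next(route, executor)
    return [r.value for r in routes]


routes = [
    SplitRoute(ops=[lambda x: x + 1, lambda x: x * 2], value=1),
    SplitRoute(ops=[lambda x: x - 1], value=10),
]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = run_split_routes(routes, pool)
```

A single loop drives all routes here, so no dedicated thread is needed per SPLIT node; each route advances only when its current operator has finished.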
Testing
Tested with the ORT and DeepSparse engines using an external kv_cache. For the internal kv_cache, the normal scheduler is used.