
mosecorg / mosec

A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

https://mosecorg.github.io/mosec/

Apache License 2.0

787 stars, 60 forks

feat: support iteration level scheduling #383

Open: kemingy opened this issue 1 year ago

kemingy commented 1 year ago

Also https://www.usenix.org/conference/osdi22/presentation/yu

Originally posted by @VoVAllen in https://github.com/mosecorg/mosec/issues/382#issuecomment-1588622255

kemingy commented 1 year ago

Although Orca tightly couples its scheduler with the execution engine, it still has ideas we can learn from.

GPT-like models can benefit from iteration-level scheduling in two ways:

- a request that finishes early can be returned to the client before the other requests in the same batch are finished
- new requests can join the batch without waiting for all the requests in the previous batch to finish
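To make these two points concrete, here is a toy simulation in plain Python. It is a sketch under stated assumptions, not mosec's API: `Request`, `iteration_level_schedule`, and `request_level_schedule` are hypothetical names, each request is assumed to need a known number of decoding iterations, and one iteration is assumed to advance every request in the batch by one token. The iteration-level scheduler re-forms the batch after every iteration, while the request-level baseline keeps a batch together until its slowest request is done.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical generation request needing `steps` decoding iterations."""
    rid: int
    steps: int
    arrival: int = 0  # iteration at which the request reaches the queue
    tokens: list = field(default_factory=list)


def iteration_level_schedule(requests, max_batch_size):
    """Re-form the batch after every iteration (Orca-style, simplified).
    Returns {rid: iteration at which the request was returned}."""
    pending = deque(sorted(requests, key=lambda r: r.arrival))
    running, finished_at, step = [], {}, 0
    while pending or running:
        # Admit arrived requests into free slots without draining the batch.
        while pending and pending[0].arrival <= step and len(running) < max_batch_size:
            running.append(pending.popleft())
        # One decoding iteration; the placeholder token stands in for a forward pass.
        for req in running:
            req.tokens.append(f"tok{len(req.tokens)}")
        step += 1
        # Return finished requests immediately; the rest keep running.
        still_running = []
        for req in running:
            if len(req.tokens) >= req.steps:
                finished_at[req.rid] = step
            else:
                still_running.append(req)
        running = still_running
    return finished_at


def request_level_schedule(requests, max_batch_size):
    """Baseline: a batch runs until its slowest request finishes and every
    request in it is returned at that point."""
    pending = deque(sorted(requests, key=lambda r: r.arrival))
    finished_at, step = {}, 0
    while pending:
        step = max(step, pending[0].arrival)  # idle until work arrives
        batch = []
        while pending and pending[0].arrival <= step and len(batch) < max_batch_size:
            batch.append(pending.popleft())
        step += max(r.steps for r in batch)
        for r in batch:
            finished_at[r.rid] = step
    return finished_at


def make_requests():
    # Request 1 is long; request 2 arrives one iteration after the others.
    return [Request(0, steps=2), Request(1, steps=5), Request(2, steps=2, arrival=1)]


print(iteration_level_schedule(make_requests(), max_batch_size=2))  # {0: 2, 2: 4, 1: 5}
print(request_level_schedule(make_requests(), max_batch_size=2))    # {0: 5, 1: 5, 2: 7}
```

With a batch size of 2, request 0 is returned after 2 iterations instead of 5, and request 2 takes the freed slot right away and finishes at iteration 4 instead of 7. The simulation ignores real-world costs such as KV-cache management and padding, which an actual implementation would have to handle.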

References:

- Orca: https://www.usenix.org/conference/osdi22/presentation/yu
- BatchMaker: https://cs.nyu.edu/~lingfan/resources/batchmaker-eurosys18.pptx
- text-generation-inference continuous batching: https://github.com/huggingface/text-generation-inference/blob/main/router/README.md#continuous-batching