punica-ai / punica

Serving multiple LoRA finetuned LLM as one
https://arxiv.org/abs/2310.18547
Apache License 2.0
883 stars 40 forks

About scheduler #26

Open luzai opened 7 months ago

luzai commented 7 months ago

Thank you for your great work! May I ask about some details on the scheduler?

  1. In the paper, it is mentioned that "To minimize latency penalty, we limit the prefill batch size to 1 for each batch." So if multiple requests are at the prefill stage, they will either be scheduled to different Runners or wait in a first-come-first-served queue. Is this understanding correct? Also, may I know whether this scheduling code (for Section 5.1, Scheduling new requests) has been released?
  2. In Figure 2, may I ask what the difference is between a Runner and the LLMs under a Runner?
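
To make question 1 concrete, here is a minimal sketch of the scheduling behavior as I understand it from the paper: requests are served first-come-first-served, and each Runner admits at most one prefill request per step before it joins the decode batch. All names (`Runner`, `FcfsScheduler`, etc.) are hypothetical; this is not the Punica implementation.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Runner:
    """One GPU worker; tracks requests currently in the decode stage."""
    name: str
    decoding: list = field(default_factory=list)
    prefilling: object = None  # at most one prefill request per step


class FcfsScheduler:
    """First-come-first-served scheduler that admits at most one
    prefill request per Runner per step (hypothetical sketch)."""

    def __init__(self, runners):
        self.runners = runners
        self.queue = deque()  # waiting requests, FCFS order

    def submit(self, request):
        self.queue.append(request)

    def step(self):
        """One scheduling step: each Runner picks up at most one new
        (prefill) request; everything else stays queued."""
        for r in self.runners:
            if r.prefilling is None and self.queue:
                r.prefilling = self.queue.popleft()
        # After its prefill pass, a request joins the decode batch.
        for r in self.runners:
            if r.prefilling is not None:
                r.decoding.append(r.prefilling)
                r.prefilling = None
```

With two Runners and five queued requests, the first `step()` admits only two prefills (one per Runner) and leaves three waiting, which matches my reading of "prefill batch size 1".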

Looking forward to hearing from you~

luciferlinx101 commented 6 months ago

Yeah, I am looking for the same thing!

jjjjohnson commented 6 months ago

Looks like there is no implementation of the scheduler in this repo?