About scheduler - GitHub Issues

luzai commented 7 months ago

Thank you for your great work! May I ask about some details on the scheduler?

In the paper, it is mentioned that "To minimize latency penalty, we limit the prefill batch size to 1 for each batch." So if multiple requests are at the prefill stage, they will either be scheduled to different Runners or wait in a first-come-first-served queue. Is this understanding correct? Also, may I ask whether the scheduling code (for Section 5.1, "Scheduling new request") has been released?
In Figure 2, what is the difference between a runner and the LLMs under a runner?

Looking forward to hearing from you~
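
For readers following the first question, here is a minimal sketch of the behavior as the question describes it: each runner admits at most one prefill request per batch, and any remaining requests wait in a first-come-first-served queue. This is not Punica's scheduler. The names `Runner`, `PrefillScheduler`, `submit`, and `step` are hypothetical, and the sketch ignores everything else a real scheduler would consider (memory limits, LoRA placement, request migration).

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional


@dataclass
class Runner:
    """Hypothetical stand-in for one runner (not Punica's actual class)."""
    name: str
    prefill: Optional[str] = None  # request prefilled in the current batch
    decoding: List[str] = field(default_factory=list)  # requests in decode


class PrefillScheduler:
    """Illustrative only: at most one prefill per runner per batch, FCFS otherwise."""

    def __init__(self, runners: List[Runner]) -> None:
        self.runners = list(runners)
        self.queue: Deque[str] = deque()  # waiting requests, oldest first

    def submit(self, request_id: str) -> None:
        # New requests always join the back of the queue.
        self.queue.append(request_id)

    def step(self) -> Dict[str, Optional[str]]:
        """Form the next batch on every runner and return its prefill request."""
        assignments: Dict[str, Optional[str]] = {}
        for runner in self.runners:
            # The request prefilled in the previous batch moves on to decoding.
            if runner.prefill is not None:
                runner.decoding.append(runner.prefill)
                runner.prefill = None
            # Admit at most ONE queued request for prefill in this batch.
            if self.queue:
                runner.prefill = self.queue.popleft()
            assignments[runner.name] = runner.prefill
        return assignments


scheduler = PrefillScheduler([Runner("r0"), Runner("r1")])
for rid in ["a", "b", "c"]:
    scheduler.submit(rid)

first = scheduler.step()   # "a" and "b" go to different runners, "c" waits
second = scheduler.step()  # "c" is admitted once a prefill slot frees up
```

With two runners and three simultaneous requests, the first two are prefilled on different runners and the third waits one step, which matches the "different Runners or queue" reading in the question.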

luciferlinx101 commented 6 months ago

Yeah, I am also looking for the same!

jjjjohnson commented 6 months ago

Looks like there is no implementation of the scheduler in this repo?

punica-ai / punica