punica-ai / punica

Serving multiple LoRA finetuned LLM as one
https://arxiv.org/abs/2310.18547
Apache License 2.0

Support Continuous Batching? #45

Closed. Wisley1998 closed this issue 3 months ago

Wisley1998 commented 3 months ago

Hi. I've read the Punica paper. Sec. 5.4 says, "We put the batching dimension on the outmost to enable continuous batching". Could you please tell me which part of the Punica code implements continuous batching? I couldn't find it in this repo. Thanks a lot!

abcdabcd987 commented 3 months ago

I plan to extract the logic into a separate file. In the meantime, you can refer to the following two examples: