punica-ai / punica

Serving multiple LoRA finetuned LLMs as one
https://arxiv.org/abs/2310.18547
Apache License 2.0

Inquiry on CUDA memory across processes #39

Open mozizhao opened 5 months ago

mozizhao commented 5 months ago

Hi,

Congratulations on the great work you have done! I am very interested in it. Specifically, I would like to know how you allow multiple serving processes to share the same CUDA memory (for the frozen base parameters shared by the LoRA models).

Could you please point me to the relevant code? I would like to study your implementation. Thanks!
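For reference, the kind of mechanism I have in mind is PyTorch's CUDA IPC tensor sharing through torch.multiprocessing. Below is a minimal sketch of that general pattern; it is my own illustration and not taken from punica's code, so the actual mechanism may well be different:

```python
import torch
import torch.multiprocessing as mp


def worker(shared_weight: torch.Tensor) -> None:
    # The child does not receive a copy: torch.multiprocessing ships a
    # CUDA IPC handle, so this tensor aliases the parent's device memory.
    shared_weight.add_(1.0)  # visible to the parent after join()
    torch.cuda.synchronize()  # make sure the update lands before exiting


if __name__ == "__main__":
    mp.set_start_method("spawn")  # required when CUDA tensors cross processes
    # Stand-in for a frozen base-model weight matrix.
    weight = torch.zeros(1024, 1024, device="cuda")
    p = mp.Process(target=worker, args=(weight,))
    p.start()
    p.join()
    print(weight.sum().item())  # 1048576.0: the child's in-place update is visible
```

If punica uses a different mechanism instead, e.g. a single process multiplexing all adapters over one copy of the base weights, pointers to that part of the code would be equally helpful.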

BR//Zizhao