Hi,
Congratulations on the great work! I am very interested in it. Specifically, I would like to know how you allow multiple serving processes to share the same CUDA memory (for the frozen parameters of the LoRA models).
Could you please point me to the relevant code? I would like to study your implementation. Thanks!
BR//Zizhao