punica-ai / punica

Serving multiple LoRA finetuned LLMs as one
https://arxiv.org/abs/2310.18547
Apache License 2.0

Question about the work #6

Closed · Gorluxor closed this issue 7 months ago

Gorluxor commented 7 months ago

Nice work. Your project focuses on LLM inference and optimizing inference speed. Do you support backprop? If not, how difficult would it be to make backprop work with the custom CUDA kernels?

abcdabcd987 commented 7 months ago

Thanks!

Currently we don't support backprop, and adding it would require a significant amount of work. Personally speaking, I don't see much performance gain in finetuning multiple models in one batch.
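
To give a sense of the work involved: the kernels are forward-only today, so backprop would mean wrapping each operator in a `torch.autograd.Function` and hand-writing matching backward kernels. Here's a rough sketch of that structure (not punica's actual code; both passes below are plain-PyTorch stand-ins for what would each need to be a custom CUDA kernel):

```python
# Sketch only: a batched multi-LoRA op with a hand-written backward.
import torch

class SegmentedLora(torch.autograd.Function):
    """y[b] = x[b] @ A[idx[b]] @ B[idx[b]]; each request picks its own adapter."""

    @staticmethod
    def forward(ctx, x, wa, wb, idx):
        ctx.save_for_backward(x, wa, wb, idx)
        # Gather each request's adapter pair, then apply the low-rank update.
        return torch.einsum("bi,bir,bro->bo", x, wa[idx], wb[idx])

    @staticmethod
    def backward(ctx, gy):
        x, wa, wb, idx = ctx.saved_tensors
        a_sel, b_sel = wa[idx], wb[idx]
        # Gradients w.r.t. the input and the gathered adapter weights.
        gx = torch.einsum("bo,bir,bro->bi", gy, a_sel, b_sel)
        ga = torch.einsum("bi,bo,bro->bir", x, gy, b_sel)
        gb = torch.einsum("bo,bi,bir->bro", gy, x, a_sel)
        # Scatter-add per-request adapter grads back to the shared weight tensors.
        gwa = torch.zeros_like(wa).index_add_(0, idx, ga)
        gwb = torch.zeros_like(wb).index_add_(0, idx, gb)
        return gx, gwa, gwb, None  # no grad for the integer adapter indices

# Quick check that the hand-written backward runs:
n_adapters, batch, d_in, rank, d_out = 3, 8, 16, 4, 16
x = torch.randn(batch, d_in, requires_grad=True)
wa = torch.randn(n_adapters, d_in, rank, requires_grad=True)
wb = torch.randn(n_adapters, rank, d_out, requires_grad=True)
idx = torch.randint(n_adapters, (batch,))
SegmentedLora.apply(x, wa, wb, idx).sum().backward()
```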

I recommend using PEFT or another library to finetune your models; then you can use punica to serve all of them. I'm working on a finetuning recipe and a script to convert PEFT weights to our format. I'll let you know once it's out :)
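
For reference, finetuning with PEFT looks roughly like this (the base model name and hyperparameters below are placeholders, not a prescription):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,            # LoRA rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# ...train with your usual Trainer / training loop, then save the adapter:
model.save_pretrained("my-lora-adapter")  # writes adapter weights + config
```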

Gorluxor commented 7 months ago

Yeah, I guessed as much from the title alone, but I wanted to verify. Once again, good work, and thanks for the fast reply.