yxli2123 / LoftQ


Questions about lora merge. #8

Closed StiphyJay closed 7 months ago

StiphyJay commented 7 months ago

Thanks for your great work!

After reading your paper and code, I have a question: how do you merge the LoRA weights into the quantized LLM for inference?

Looking forward to your reply!

Regards!

yxli2123 commented 7 months ago

Thanks for your interest in our work. We don't merge the LoRA adapters into the quantized LLM. When doing inference, we compute $QX$ and $ABX$ separately and add them: $Y = QX + ABX$. We do not compute $Y = (Q + AB)X$.
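For illustration, here is a minimal PyTorch sketch of this unmerged forward pass. The function name, the toy INT8 storage, and the per-channel dequantization are placeholders for illustration only, not the actual LoftQ implementation:

```python
import torch

def unmerged_forward(x, q_weight, q_scales, lora_A, lora_B):
    """Unmerged inference: Y = QX + ABX, never (Q + AB)X.

    q_weight : quantized base-weight codes (e.g. INT8), shape (out, in)
    q_scales : per-output-channel dequantization scales, shape (out,)
    lora_A   : adapter, shape (rank, in); lora_B: adapter, shape (out, rank)
    """
    # Dequantize Q only for the matmul; Q is never merged with AB.
    Q = q_weight.to(x.dtype) * q_scales.unsqueeze(1)
    base = x @ Q.T                       # QX  (base path, quantized weight)
    lora = (x @ lora_A.T) @ lora_B.T     # ABX (adapter path, full precision)
    return base + lora                   # Y = QX + ABX


# Toy usage with placeholder shapes; on GPU x, scales, A, B would be FP16,
# FP32 is used here so the sketch also runs on CPU.
x = torch.randn(2, 1024)
q_w = torch.randint(-128, 128, (1024, 1024), dtype=torch.int8)
scales = torch.rand(1024) * 0.01
A = torch.randn(16, 1024) * 0.01
B = torch.zeros(1024, 16)
print(unmerged_forward(x, q_w, scales, A, B).shape)  # torch.Size([2, 1024])
```

Keeping the two paths separate avoids dequantizing and re-quantizing the base weight, which is why the adapters are left unmerged.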

StiphyJay commented 7 months ago

> Thanks for your interest in our work. We don't merge the LoRA adapters into the quantized LLM. When doing inference, we compute $QX$ and $ABX$ separately and add them: $Y = QX + ABX$. We do not compute $Y = (Q + AB)X$.

Therefore, during inference, the LoRA matrices $A$ and $B$ are in FP16, while $Q$ is in INT8/INT4 and $X$ is in FP16, right?

yxli2123 commented 7 months ago

Yes, you are right.
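For reference, this is the mixed-precision layout you get when loading the quantized backbone with Hugging Face transformers, peft, and bitsandbytes. A hedged sketch follows; the model ID and adapter path are placeholders, not artifacts from this repo:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Base weights Q stay quantized (NF4 here); matmuls are computed in FP16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base = AutoModelForCausalLM.from_pretrained(
    "path/to/quantized-backbone",        # placeholder model ID
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
)

# LoRA matrices A and B are loaded in FP16 and applied alongside Q,
# not merged into the quantized base weight.
model = PeftModel.from_pretrained(base, "path/to/loftq-adapters")  # placeholder path
model.eval()
```

At inference time each adapted linear layer then evaluates the 4-bit base path and the FP16 adapter path separately and sums them, matching $Y = QX + ABX$ above.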

StiphyJay commented 7 months ago

Thanks for your help.