Hi @RonanKMcGovern, thanks for your interest in our work. LoftQ changes the backbone for better LoRA fine-tuning.
Specifically, LoftQ optimizes the following objective: $$\underset{Q, A, B}{\mathrm{min}} ||W - Q - AB^{\top}||_{\mathrm{F}}^2.$$
$W$ is the high-precision weight, $Q$ is the quantized weight (it doesn't necessarily have to be quantized directly from $W$), and $A, B$ are the adapters. We solve the above optimization in an alternating way, so $Q$ is no longer $q(W)$. That's why we need to upload a new backbone. And since bitsandbytes (the quantization backend we use) doesn't support uploading quantized weights, we have to upload the high-precision equivalent of the quantized weight to Huggingface. Users need to download it and quantize it before use.
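
For anyone curious what "alternating" means here, below is a rough sketch (not the repo's actual code) of the idea: fix the adapters and quantize the remainder, then fix $Q$ and fit the adapters to the quantization residual via a truncated SVD. `nf4_like_quantize` is just a stand-in for the real bitsandbytes NF4 quantizer (simulated quantize-dequantize):

```python
import torch

def nf4_like_quantize(w, num_levels=16):
    # Stand-in quantizer: snap each entry to the nearest of a few levels
    # spanning the tensor's range (output stays in high precision).
    levels = torch.linspace(w.min().item(), w.max().item(), num_levels)
    idx = torch.argmin((w.unsqueeze(-1) - levels).abs(), dim=-1)
    return levels[idx]

def loftq_init(W, rank=16, num_iters=5):
    """Alternating minimization of ||W - Q - A B^T||_F^2 over Q and (A, B)."""
    A = torch.zeros(W.shape[0], rank)
    B = torch.zeros(W.shape[1], rank)
    for _ in range(num_iters):
        # Fix (A, B), update Q: quantize the part the adapters don't cover.
        Q = nf4_like_quantize(W - A @ B.T)
        # Fix Q, update (A, B): best rank-r fit to the quantization residual.
        U, S, Vh = torch.linalg.svd(W - Q, full_matrices=False)
        A = U[:, :rank] * S[:rank].sqrt()
        B = Vh[:rank, :].T * S[:rank].sqrt()
    return Q, A, B

W = torch.randn(512, 512)
Q, A, B = loftq_init(W, rank=16)
# Typically much smaller than torch.norm(W - nf4_like_quantize(W)),
# and Q itself differs from nf4_like_quantize(W) after the first iteration.
print(torch.norm(W - Q - A @ B.T))
```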
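On the user side, the "download and quantize before use" step is just the usual bitsandbytes 4-bit loading path; something like the following (the repo name and adapter subfolder are illustrative, check the model card for the exact layout):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

MODEL_ID = "LoftQ/Llama-2-7b-hf-4bit-64rank"  # illustrative repo name

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize the FP weights on load
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Download the LoftQ backbone (stored in high precision) and let
# bitsandbytes quantize it while loading.
base = AutoModelForCausalLM.from_pretrained(MODEL_ID, quantization_config=bnb_config)

# Attach the LoftQ-initialized LoRA adapters on top of the quantized backbone.
model = PeftModel.from_pretrained(base, MODEL_ID, subfolder="loftq_init", is_trainable=True)
```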
Thanks, makes sense.
I'm wondering, though: why not just push the adapter model alone? That would seem sufficient.