yxli2123 / LoftQ


Why are the full models, and not just adapters, pushed to hub? #23

Closed RonanKMcGovern closed 2 months ago

RonanKMcGovern commented 2 months ago

I'm wondering: why not push the adapter alone? That would seem sufficient.

yxli2123 commented 2 months ago

Hi @RonanKMcGovern, thanks for your interest in our work. LoftQ changes the backbone for better LoRA fine-tuning.

Specifically, LoftQ optimizes the following objective: $$\underset{Q, A, B}{\mathrm{min}} ||W - Q - AB^{\top}||_{\mathrm{F}}^2.$$

$W$ is the high-precision weight, $Q$ is the quantized weight (it does not have to be quantized directly from $W$), and $A, B$ are the adapters. We solve this optimization in an alternating fashion, so $Q$ is no longer $q(W)$, the direct quantization of $W$. That's why we need to upload a new backbone. And since bitsandbytes (the quantization backend we use) doesn't support uploading quantized weights, we upload the high-precision equivalent of the quantized backbone to Hugging Face instead. Users need to download it and quantize it before use.
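
For intuition, here is a minimal per-matrix sketch of the alternating scheme, assuming a simple uniform round-to-nearest quantizer as a stand-in for the NF4 quantization LoftQ actually uses through bitsandbytes; the function names, rank, and iteration count below are illustrative, not the repo's API.

```python
import torch

def simulated_quantize(W: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Placeholder uniform quantize/dequantize round trip (stand-in for NF4)."""
    scale = W.abs().max() / (2 ** (num_bits - 1) - 1)
    return torch.round(W / scale) * scale

def loftq_style_init(W: torch.Tensor, rank: int = 16, num_iters: int = 5):
    """Alternately update Q and (A, B) to shrink ||W - Q - A B^T||_F."""
    A = torch.zeros(W.shape[0], rank)
    B = torch.zeros(W.shape[1], rank)
    for _ in range(num_iters):
        # Quantize the residual W - A B^T, not W itself,
        # which is why the backbone Q ends up different from q(W).
        Q = simulated_quantize(W - A @ B.T)
        # Best rank-r approximation of the remaining residual via truncated SVD.
        U, S, Vh = torch.linalg.svd(W - Q, full_matrices=False)
        A = U[:, :rank] * S[:rank]
        B = Vh[:rank, :].T
    return Q, A, B
```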
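
On the user side, the workflow is roughly the following sketch: load the uploaded high-precision backbone with 4-bit quantization enabled, then attach the LoRA adapters with PEFT. The repo id and `subfolder` below are placeholders I'm assuming for illustration; check the model card of the actual LoftQ checkpoints for the exact paths.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Quantize the downloaded high-precision backbone on the fly with bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

MODEL_ID = "LoftQ/<backbone-repo>"  # placeholder repo id

base = AutoModelForCausalLM.from_pretrained(MODEL_ID, quantization_config=bnb_config)

# Attach the LoftQ-initialized LoRA adapters on top of the quantized backbone.
model = PeftModel.from_pretrained(
    base, MODEL_ID, subfolder="loftq_init", is_trainable=True  # subfolder is an assumption
)
```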

RonanKMcGovern commented 2 months ago

Thanks, makes sense.