Closed ccccj closed 10 months ago
Hi @ccccj, thank you for the question. We found that LPLR performs best (relative to its baselines) at low bit budgets. Correspondingly, we recommend experimenting with 0 < b1, b2 <= 8 and cr ~ 1. For simplicity we usually set b1 = b2, though there are also interesting results at other settings, depending on the model and dataset.
The CR (compression ratio) compares the parameter footprint of our low-rank variants to that of naive quantization. For an apples-to-apples comparison, we recommend keeping it at 1. If you wish to test LPLR at higher compression levels (at the cost of task performance), feel free to reduce it. A value greater than 1 implies over-parametrization (relative to naive quantization) and should yield an increase in performance.
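To make the b1/b2/cr interplay concrete, here is a small sketch of the bookkeeping involved. It assumes (this is my reading, not confirmed above) that LPLR factors a weight matrix W (m x n) as L @ R, with L (m x k) quantized at b1 bits and R (k x n) at b2 bits, while naive quantization stores W at b bits, so cr = (m*k*b1 + k*n*b2) / (m*n*b). The function `rank_for_cr` is a hypothetical helper, not part of the LPLR codebase.

```python
def rank_for_cr(m: int, n: int, b: int, b1: int, b2: int, cr: float = 1.0) -> int:
    """Largest rank k whose low-rank bit cost stays within cr * naive cost.

    Assumes (not confirmed by the thread) the cost model:
        naive quantization: m * n * b bits
        LPLR:               m * k * b1 + k * n * b2 bits
    """
    naive_bits = m * n * b
    per_rank_bits = m * b1 + n * b2  # bits added by each extra unit of rank
    return int(cr * naive_bits // per_rank_bits)


# Example: a 4096x4096 Llama projection, 4-bit naive baseline, b1 = b2 = 4, cr = 1.
# With equal bit widths and cr = 1, this reduces to k = m*n / (m + n).
k = rank_for_cr(4096, 4096, b=4, b1=4, b2=4, cr=1.0)
print(k)  # -> 2048
```

Under this cost model, cr = 1 matches the naive-quantization budget exactly, cr < 1 shrinks the rank (more compression, likely worse task performance), and cr > 1 over-parametrizes.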
Hello, and thank you for sharing this work. If I want to compress the parameters of a Llama model, what values would you recommend for b1, b2, and cr?