pilancilab / matrix-compressor

Implementation of LPLR algorithm for matrix compression

What would you recommend to choose for the values of b1, b2 and cr? #3

Closed ccccj closed 10 months ago

ccccj commented 10 months ago

Hello and thank you for sharing. If I want to compress the model parameters of llama, what would you recommend to choose for the values of b1, b2 and cr?

VarunSrivastavaIITD commented 10 months ago

Hi @ccccj , thank you for the question. We found that LPLR performs best (relative to its baselines) at low bit budgets. Correspondingly, we recommend experimenting with 0 < b1, b2 <= 8 and cr ~ 1. For simplicity, we usually set b1 = b2; however, there are interesting results at other settings as well, depending on the model and dataset.

The CR (compression ratio) represents the relative parameter budget of our low-rank variants compared to Naive Quantization. For an apples-to-apples comparison, we recommend keeping it at 1. If you wish to test LPLR at higher compression levels (at the cost of task performance), feel free to reduce it. A value greater than 1 implies over-parametrization (with respect to naive quantization) and should yield an increase in performance.
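To make the parameter interplay concrete, here is a rough sketch of how b1, b2, and cr could determine the rank of the factorization. This is not the repo's implementation: the budget formula, the `b_naive=8` baseline bit-width, and the simple uniform quantizer are all illustrative assumptions, and the factors are obtained with a plain truncated SVD.

```python
import numpy as np

def lplr_rank(n, d, b1, b2, cr, b_naive=8):
    # Assumed budget model: naive quantization spends n*d*b_naive bits;
    # an LPLR factorization W ~ L @ R with L (n x k) at b1 bits and
    # R (k x d) at b2 bits spends k*(n*b1 + d*b2) bits. cr scales the
    # budget relative to naive quantization (cr=1 -> equal budgets).
    budget_bits = cr * n * d * b_naive
    k = int(budget_bits // (n * b1 + d * b2))
    return max(1, min(k, n, d))

def quantize(x, bits):
    # Simple uniform quantizer over the range of x (illustrative only).
    lo, hi = x.min(), x.max()
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return np.round((x - lo) / scale) * scale + lo

def lplr_sketch(W, b1, b2, cr, b_naive=8):
    # Truncated SVD to the budget-implied rank, then quantize each factor.
    n, d = W.shape
    k = lplr_rank(n, d, b1, b2, cr, b_naive)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    L = U[:, :k] * s[:k]
    R = Vt[:k, :]
    return quantize(L, b1), quantize(R, b2)
```

Under this model, lowering cr (or b1, b2) shrinks the rank k, trading reconstruction fidelity for a smaller bit budget, which matches the guidance above.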