Closed: ThisisBillhe closed this issue 7 months ago.
Hi there! Thanks for your excellent work and the open-sourced code.

In the Outlier-reduced quantization section of your paper, you mention that "such a sparse matrix results in the remaining cache size equivalent to that of 8-bit quantization because of its two index vectors and one value vector in full precision". If I understand correctly, storing S can become a heavy burden when a large portion of the entries are outliers. In that case, why do you store S and additionally introduce a low-rank matrix L to approximate the residual R = X - (D + S)? Using a low-rank matrix to approximate S itself would make more sense to me.

I understand that your method (X = D + L + S) achieves better performance; it is just that the motivation of the Low-rank approximation section is a little confusing.
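To make the storage comparison concrete, below is a back-of-the-envelope sketch (not code from this repo) of the per-element cost of keeping outliers as COO triplets next to a 4-bit quantized dense part D. The int32 indices, fp16 values, and the 5% outlier ratio are illustrative assumptions, not numbers from the paper:

```python
# Hypothetical storage accounting for D (quantized dense part) plus a
# sparse outlier matrix S stored as (row index, column index, value).
# All dtypes and the outlier ratio below are illustrative assumptions.

def cache_bytes_per_element(outlier_ratio: float,
                            quant_bits: int = 4,
                            index_bytes: int = 4,           # one int32 index
                            value_bytes: int = 2) -> float:  # one fp16 value
    dense = quant_bits / 8                                    # cost of D
    sparse = outlier_ratio * (2 * index_bytes + value_bytes)  # COO triplet per outlier
    return dense + sparse

print(cache_bytes_per_element(0.05))  # 1.0 byte/element, i.e. 8-bit equivalent
print(cache_bytes_per_element(0.0))   # 0.5 byte/element for pure 4-bit
```

Under these assumptions, a 5% outlier ratio already pushes 4-bit quantization back up to the footprint of 8-bit quantization, which is the burden the question refers to.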
S stores the values and indices of the corresponding outlier entries; those scattered entries cannot be captured by a low-rank approximation. We also show in the paper that outlier extraction does not work well for activations. If you look at the paper carefully, you will find that at a very high compression ratio, low-rank approximation is the best option compared with sparsity and quantization. Moreover, the quantization error has a decaying eigenvalue spectrum, which shows that it can be approximated well by a low-rank matrix.
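For intuition, here is a minimal sketch of the X = D + L + S decomposition being discussed: extract the largest-magnitude entries into S, quantize the remainder into D, and fit a rank-r truncated SVD to the residual R = X - (D + S). The data, outlier ratio, and rank are illustrative assumptions; on random data the residual spectrum is nearly flat, whereas the claim above is that for real KV-cache activations it decays quickly, so a small r recovers much more of the error.

```python
import torch

torch.manual_seed(0)

def uniform_quantize(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Per-tensor uniform quantize-dequantize, just to produce an error term."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / (2 ** bits - 1)
    return torch.round((x - lo) / scale) * scale + lo

X = torch.randn(1024, 128)                 # stand-in for a KV-cache block

# S: keep the top-1% largest-magnitude entries exactly (indices + values).
k = int(0.01 * X.numel())
flat = X.flatten()
idx = flat.abs().topk(k).indices
S = torch.zeros_like(flat)
S[idx] = flat[idx]
S = S.reshape(X.shape)

D = uniform_quantize(X - S)                # quantized dense part
R = X - (D + S)                            # residual = quantization error

# L: rank-r truncated SVD of the residual.
U, s, Vh = torch.linalg.svd(R, full_matrices=False)
r = 8
L = (U[:, :r] * s[:r]) @ Vh[:r]

print("||X - (D+S)||_F   =", R.norm().item())
print("||X - (D+L+S)||_F =", (X - (D + L + S)).norm().item())
```

Adding L can only reduce the Frobenius error of the reconstruction; how much it helps at a given rank depends on how fast the residual's spectrum decays.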
Hi Hao Kang, it is a pleasure to see alumni from ZJU here.

Hi, nice to meet you, too!