Closed digbangbang closed 1 month ago
Due to the difference between quantization and sparsification, and in order to compare the computation reduction of different algorithms quantitatively, we use the same bit width as the baseline when calculating MACs (e.g., 32-bit floating-point operations), so that the computation savings of quantization methods can also be quantified and compared fairly.
Thanks for your reply!
My doubts concern the two experiments on SmoothQuant and OmniQuant. As I understand it, quantizing activations and weights does not affect MACs; that is, the amount of computation is not reduced, because quantization only affects CUDA memory usage and inference speed. So how is the 3.24T figure here calculated, especially for SmoothQuant and OmniQuant?
Looking forward to your reply. Since I have done similar work before, I also wanted to compare quantization and sparsification in terms of MACs. However, I eventually found that sparsification can reduce MACs (for example, structured sparsity), whereas quantization does not seem to have any effect on them.
Thank you for your interest in our work.
In model acceleration, in order to quantify the benefit of quantization and compare it with sparsification, the bit width of the floating-point operations that the MAC count relies on must be made explicit (for example, an 8-bit operation is 4 times faster than a 32-bit one). Therefore, we take the 32-bit floating-point operation as the reference unit for MAC counting, and on that basis compute the impact of both sparsification and quantization algorithms on computation reduction. I hope this answers your questions.
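The normalization described above can be sketched as follows. This is an illustrative reading of the convention, not code from the repository: a quantized MAC is counted as a fraction of a 32-bit reference MAC in proportion to its bit width, while structured sparsity scales the count by the kept-weight density (the function name, parameters, and numbers are hypothetical).

```python
def effective_macs(raw_macs: float, bit_width: int = 32,
                   density: float = 1.0, ref_bits: int = 32) -> float:
    """Bit-width-normalized MAC count, with FP32 MACs as the reference unit.

    raw_macs  -- MAC count of the dense FP32 model
    bit_width -- operand bit width after quantization (e.g. 8 for W8A8)
    density   -- fraction of weights kept after structured pruning
    ref_bits  -- reference bit width (32-bit float here)
    """
    return raw_macs * (bit_width / ref_bits) * density


# Hypothetical dense baseline of 12e12 MACs (12 TMACs):
dense = effective_macs(12e12)                    # FP32 baseline, unchanged
w8a8 = effective_macs(12e12, bit_width=8)        # 8-bit quantization -> 1/4
pruned = effective_macs(12e12, density=0.5)      # 50% structured sparsity -> 1/2
```

Under this convention, both techniques land on one comparable axis: 8-bit quantization counts as a 4x reduction in reference MACs even though the raw multiply-accumulate count is unchanged.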
Best
Thanks for your detailed reply, I understand what you mean. 😊
The experimental results in the paper are reported in MACs. In my opinion, quantization should not affect MACs: a MAC counts one multiply-accumulate operation. How can SmoothQuant reduce MACs? Moreover, many of the quantization methods in the paper show reduced MACs.