Open shiqingzhangCSU opened 11 months ago
What batch size do you use?
I tested with batch size = 1.
For small batch sizes, int8 weight-only is expected to be faster than SmoothQuant, so your results make sense.
You can try a larger batch size to check the performance of SmoothQuant.
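To illustrate why batch size matters here, below is a rough roofline-style sketch, not a measurement. It assumes a single square linear layer, and every number in it is a hypothetical placeholder (hidden size, peak throughput, memory bandwidth, and a fixed per-layer cost for quantizing activations in the W8A8 path). It also ignores the weight dequantization cost of the weight-only path. The function names are made up for this example.

```python
# Toy roofline model: time = max(compute time, memory time) + fixed overhead.
# All hardware numbers below are hypothetical assumptions, not measured values.

HIDDEN = 2560            # assumed hidden size for a ~3B model
FP16_TFLOPS = 100.0      # assumed FP16 peak throughput
INT8_TOPS = 200.0        # assumed INT8 peak throughput (2x FP16)
BANDWIDTH_GBS = 1000.0   # assumed memory bandwidth in GB/s
ACT_QUANT_US = 2.0       # assumed per-layer cost of quantizing activations


def gemm_time_us(batch, weight_bytes, act_bytes, peak_tera_ops, overhead_us=0.0):
    """Estimate the time of a (batch x HIDDEN) @ (HIDDEN x HIDDEN) matmul."""
    ops = 2.0 * batch * HIDDEN * HIDDEN
    # Weights are read once; activations are read and written once each.
    bytes_moved = HIDDEN * HIDDEN * weight_bytes + 2 * batch * HIDDEN * act_bytes
    compute_us = ops / (peak_tera_ops * 1e12) * 1e6
    memory_us = bytes_moved / (BANDWIDTH_GBS * 1e9) * 1e6
    return max(compute_us, memory_us) + overhead_us


def weight_only_us(batch):
    # int8 weights, FP16 activations, FP16 math.
    return gemm_time_us(batch, weight_bytes=1, act_bytes=2, peak_tera_ops=FP16_TFLOPS)


def w8a8_us(batch):
    # int8 weights and activations, INT8 math, plus activation quantization.
    return gemm_time_us(batch, weight_bytes=1, act_bytes=1,
                        peak_tera_ops=INT8_TOPS, overhead_us=ACT_QUANT_US)


if __name__ == "__main__":
    for batch in (1, 16, 256, 1024):
        print(f"batch={batch:5d}  weight-only={weight_only_us(batch):8.2f} us"
              f"  w8a8={w8a8_us(batch):8.2f} us")
```

Under these assumptions, both methods are bound by reading the same int8 weights at batch size 1, so the extra activation quantization step makes W8A8 slower. At large batch sizes the layer becomes compute-bound and the INT8 math in W8A8 pulls ahead. Real crossover points depend on the actual kernels and hardware.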
Thank you for your response! I will run more tests.
I tested two quantization methods on a 3B model: W8A8 SmoothQuant and int8 weight-only quantization. The following is the efficiency of the different optimization methods. I'm a little confused: is int8 weight-only expected to be faster than SmoothQuant? Or maybe I have a bug in my code?