[Open] DD-DuDa opened this issue 2 months ago
Hi @DD-DuDa , thank you very much for your interests in QServe. We real-measured the dequant overheads of the above kernels. We compared the actual throughputs of GEMM kernels with dequantization and kernels in which dequantization ops are skipped. The difference of throughputs between the two version of kernels is regarded as dequant overhead.
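The measurement above can be sketched as a simple ratio: if the dequant-skipped kernel runs at a higher throughput, the fraction of runtime attributable to dequantization is one minus the ratio of the two throughputs. A minimal sketch (the function name and the example numbers are hypothetical, not measurements from the paper):

```python
def dequant_overhead(tput_with_dequant, tput_skip_dequant):
    """Fraction of kernel runtime attributed to dequantization.

    Both arguments are measured throughputs (e.g., TOPS) for the same
    GEMM workload: one for the full kernel with dequantization, one for
    a variant with the dequant ops compiled out. Since time per unit of
    work is inversely proportional to throughput, the overhead fraction
    is 1 - tput_with_dequant / tput_skip_dequant.
    """
    return 1.0 - tput_with_dequant / tput_skip_dequant

# Hypothetical measurements: 80 TOPS with dequant, 100 TOPS without.
print(round(dequant_overhead(80.0, 100.0), 3))  # 0.2 -> 20% overhead
```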
Got it! Thank you for your response!
@ys-2020 In formula (5) of your paper, why is the per-group scale uint8? Why can a uint4 − uint4 result multiplied by a uint8 still be sint8? Is that a typo? It is quite confusing. (In my understanding, the per-group scale should also be 4-bit to produce an sint8 weight.)
Thanks for your great work. I want to learn how the dequantization overhead is calculated, as in Figure 18, since the dequantization process happens within a single kernel.