Open badhri-intel opened 2 months ago
In the paper, it says that w4a4 quantization can theoretically produce an 8x inference speedup. Could you please confirm this for SD, or share what sort of speedup (inference latency) you actually observed? Thanks