Closed zhewang1-intc closed 3 months ago
Type of Change
feature or bug fix or documentation or others: feature
API changed or not: add a new kernel
Description
Adds an int4 dequantize kernel with very high memory bandwidth utilization.
MTL: kernel bandwidth is ~85 GB/s as reported by VTune, against a hardware maximum of ~85 GB/s as reported by clpeak, i.e. nearly 100% utilization.
ARC 770M: at least 90% VRAM bandwidth utilization.
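For readers unfamiliar with the operation, here is a minimal reference sketch of what int4 dequantization computes. It is not the kernel added by this PR. The storage layout is an assumption, since the description does not state it: two 4-bit values per byte with the low nibble first, a fixed zero point of 8, and one scale per group of `group_size` values. The function name `dequantize_int4` is hypothetical.

```python
def dequantize_int4(packed: bytes, scales: list[float], group_size: int = 32) -> list[float]:
    """Reference (CPU, scalar) int4 dequantization under an assumed layout."""
    out: list[float] = []
    for byte in packed:
        # Assumed nibble order: low 4 bits hold the first value, high 4 bits the second.
        for nibble in (byte & 0x0F, byte >> 4):
            # Assumed zero point of 8: maps unsigned 0..15 to signed -8..7.
            q = nibble - 8
            # One scale per group of `group_size` consecutive values (assumed).
            out.append(q * scales[len(out) // group_size])
    return out


if __name__ == "__main__":
    # Two packed bytes hold four int4 values; one group of four shares a single scale.
    print(dequantize_int4(bytes([0x80, 0xF0]), [0.5], group_size=4))
```

Each input byte expands into two output values, so a kernel of this kind does very little arithmetic per byte moved. That is why memory bandwidth, rather than compute, is the natural figure of merit in the numbers above.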
Expected Behavior & Potential Risk
the expected behavior triggered by this PR
How has this PR been tested?
how to reproduce the test (including hardware information)
Dependency Change?
any library dependency introduced or removed