oneapi-src / oneDNN

oneAPI Deep Neural Network Library (oneDNN)
https://uxlfoundation.org
Apache License 2.0

How can I create a matmul primitive with an A16W8 (16-bit activations, 8-bit weights) configuration? #1895

Closed Teaonly closed 5 months ago

Teaonly commented 6 months ago

This is the configuration for creating the primitive_desc of a matrix multiplication:

```cpp
memory::desc a_md({M, K}, memory::data_type::f16, {K, 1}); // M x K layout
memory::desc b_md({K, N}, memory::data_type::s8, {N, 1});  // K x N layout
memory::desc c_md({M, N}, memory::data_type::f16, {N, 1}); // M x N layout

primitive_attr attr;
attr.set_scales_mask(DNNL_ARG_WEIGHTS, 1); // per-channel int8 quantization

// Create a MatMul primitive descriptor
auto pd = matmul::primitive_desc(eng, a_md, b_md, c_md, attr);
```

This code causes an unimplemented exception: "Message: could not create a primitive descriptor for a matmul primitive"

How can I create a matmul with A16W8?

Teaonly commented 6 months ago

```
$ ./examples/tutorials-matmul-inference-int8-matmul-cpp gpu
onednn_verbose,info,oneDNN v3.6.0 (commit 95c00edd0afd50e9cff045b9838bfc77b2a82b5a)
onednn_verbose,info,cpu,runtime:OpenMP,nthr:22
onednn_verbose,info,cpu,isa:Intel AVX2 with Intel DL Boost
onednn_verbose,info,gpu,runtime:OpenCL
onednn_verbose,info,gpu,engine,0,name:Intel(R) Arc(TM) Graphics,driver_version:24.9.28717,binary_kernels:enabled
onednn_verbose,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,primitive,create:dispatch,gemm,gpu,gemm,jit:xe_hp:gemm:any,undef,src_a_f16::blocked:ab::f0 src_b_s8::blocked:ab::f0 dst_f16::blocked:ab::f0,attr-scales:wei:2:f32 attr-post-ops:eltwise_relu,,x96:96x1000,skipping or dispatching to another implementation,src/gpu/intel/jit/gemm/xe_hp_systolic_gemm.cpp:75
onednn_verbose,primitive,create:dispatch,gemm,gpu,gemm,ocl:gemm_with_po:any,undef,src_a_f16::blocked:ab::f0 src_b_s8::blocked:ab::f0 dst_f16::blocked:ab::f0,attr-scales:wei:2:f32 attr-post-ops:eltwise_relu,,x96:96x1000,runtime dimension is not supported,src/gpu/intel/ocl/gemm/gemm_with_post_ops.cpp:42
onednn_verbose,primitive,create:dispatch,gemm,gpu,gemm,jit:gemm:any,undef,src_a_f16::blocked:ab::f0 src_b_s8::blocked:ab::f0 dst_f16::blocked:ab::f0,attr-scales:wei:2:f32 attr-post-ops:eltwise_relu,,x96:96x1000,unsupported datatype,src/gpu/intel/jit/gemm/gen_gemm.hpp:124
onednn_verbose,primitive,create:dispatch,gemm,gpu,gemm,ocl:ref:any,undef,src_a_f16::blocked:ab::f0 src_b_s8::blocked:ab::f0 dst_f16::blocked:ab::f0,attr-scales:wei:2:f32 attr-post-ops:eltwise_relu,,x96:96x1000,unsupported attribute,src/gpu/intel/ocl/gemm/ref_gemm.hpp:81
onednn_verbose,primitive,create:dispatch,matmul,failed to create nested primitive gemm,src/gpu/intel/ocl/gemm_matmul.hpp:266
onednn_verbose,primitive,create:dispatch,matmul,gpu,matmul,ocl:ref:any,undef,src_f16::blocked:ab::f0 wei_s8::blocked:ab::f0 dst_f16::blocked:ab::f0,attr-scales:wei:2:f32 attr-post-ops:eltwise_relu,runtime_dims_masks:1:0,*x96:96x1000,unsupported datatype combination,src/gpu/intel/ocl/ref_matmul.hpp:70
oneDNN error caught:
    Status: unimplemented
    Message: could not create a primitive descriptor for a matmul primitive
Example failed on GPU.
```

igorsafo commented 6 months ago

Hi @Teaonly, here is an example: https://github.com/oneapi-src/oneDNN/blob/main/examples/tutorials/matmul/weights_decompression_matmul.cpp (or https://oneapi-src.github.io/oneDNN/page_weights_decompression_matmul_cpp.html#doxid-weights-decompression-matmul-cpp). The fpmath_mode attribute should be set to force the int8 weights to participate in floating-point computation.
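Applied to the snippet from the question, a minimal sketch might look like the following. This is based on the linked weights_decompression_matmul.cpp example, not a tested drop-in fix; `M`, `K`, `N`, and `eng` are assumed to be defined as in the original code, and the per-channel scales mask (`1 << 1`, i.e. along N, matching the `attr-scales:wei:2` seen in the verbose log) is an assumption about the intended quantization granularity.

```cpp
// Assumes: using namespace dnnl; engine eng; dims M, K, N defined.
memory::desc a_md({M, K}, memory::data_type::f16, {K, 1}); // M x K activations
memory::desc b_md({K, N}, memory::data_type::s8, {N, 1});  // K x N int8 weights
memory::desc c_md({M, N}, memory::data_type::f16, {N, 1}); // M x N output

primitive_attr attr;
// Per-output-channel (N) dequantization scales for the weights.
attr.set_scales_mask(DNNL_ARG_WEIGHTS, 1 << 1);
// Key line: enable f16 math mode and let it apply to integer inputs,
// so the s8 weights are decompressed to f16 before the multiplication.
attr.set_fpmath_mode(fpmath_mode::f16, /*apply_to_int=*/true);

auto pd = matmul::primitive_desc(eng, a_md, b_md, c_md, attr);
```

Without the `set_fpmath_mode(..., true)` call, an f16 x s8 data type combination is treated as an unsupported mixed-precision request, which is why the dispatcher rejects every implementation in the verbose log above.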

For more information, please review the discussion on the same topic: #1893