Open christinaburge opened 1 month ago

Hi! Following on from issue https://github.com/oneapi-src/oneDNN/issues/1864, I have been doing some further investigation, and it looks like it is relatively straightforward to add quantized bias by making these super-hacky changes to `pytorch/third_party/ideep/mkl-dnn/src/cpu/gemm_x8s8s32x_convolution_utils.cpp`.

Does that sound sensible? Or is there a better way to proceed?
Hi @christinaburge,

May we know why you need this implementation? Is there any background you can share?
To use a user-defined bias data type in oneDNN, you can change the bias memory primitive and memory descriptor, using cnn_inference_int8.cpp as a reference:

```cpp
// auto user_bias_memory = memory({{conv_bias_tz}, dt::f32, tag::x}, eng);
auto user_bias_memory = memory({{conv_bias_tz}, dt::s8, tag::x}, eng);

// auto conv_bias_md = memory::desc({conv_bias_tz}, dt::f32, tag::any);
auto conv_bias_md = memory::desc({conv_bias_tz}, dt::s8, tag::any);
```
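For context, here is a rough sketch of how those two lines sit in the surrounding setup. This assumes the oneDNN v3.x C++ API; names such as `conv_src_md`, `conv_weights_md`, `conv_dst_md`, `conv_strides`, `conv_padding`, and `conv_attr` stand in for the corresponding objects built earlier in the example and are assumptions here, not exact code from cnn_inference_int8.cpp:

```cpp
using namespace dnnl;
using dt = memory::data_type;
using tag = memory::format_tag;

// Bias dims match the output channels (oc = 384 in the verbose line below).
memory::dims conv_bias_tz = {384};

// s8 bias instead of the example's f32 bias.
auto user_bias_memory = memory({{conv_bias_tz}, dt::s8, tag::x}, eng);
auto conv_bias_md = memory::desc({conv_bias_tz}, dt::s8, tag::any);

// The bias descriptor is then passed when creating the convolution
// primitive descriptor (v3.x constructor form):
auto conv_pd = convolution_forward::primitive_desc(eng,
        prop_kind::forward_inference, algorithm::convolution_direct,
        conv_src_md, conv_weights_md, conv_bias_md, conv_dst_md,
        conv_strides, conv_padding, conv_padding, conv_attr);
```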
With that change, the DNNL_VERBOSE output should be:

```
~/project/oneDNN/examples$ ONEDNN_VERBOSE=1 ./a.out | grep bia
onednn_verbose,primitive,exec,cpu,convolution,brg_conv_fwd:avx2_vnni,forward_training,src_u8:a:blocked:acdb::f0 wei_s8:a:blocked:AcdB24a4b::f0 bia_s8::a:blocked:a::f0 dst_u8:a:blocked:acdb::f0,attr-scales:src0:0:f32+dst:0:f32+wei:0:f32 attr-post-ops:eltwise_relu,alg:convolution_direct,mb8_ic256oc384_ih13oh13kh3sh1dh0ph1_iw13ow13kw3sw1dw0pw1,14.845
```
@christinaburge Though your observation is correct and one could add such support the way you suggested, it would not really help, for two reasons:

1) The library would simply be undoing what the user just did, in particular, un-scaling the bias the user had just scaled on their end. Performance-wise, there is no point in doing the work twice for nothing.

2) The programming model says the bias is applied after the scales, so this would contradict that definition: a quantized bias would have to be applied before scaling, while a non-quantized bias is applied right after.

To keep the solution clean, the bias should be passed non-quantized. We haven't restricted this so far, but we might consider doing so in the next major version to avoid such situations. Hope it helps.
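To make reason (1) concrete, here is a minimal numeric sketch (plain C++, with invented scale and bias values) of the round trip that would happen if the user pre-quantized the bias: the user divides by the combined scale, and the library would immediately have to multiply by it again in order to apply the bias after the scales.

```cpp
#include <cmath>
#include <cstdio>

int main() {
    // Illustrative values only; the combined scale is src_scale * wei_scale,
    // which maps f32 values into the s32 accumulator domain.
    float src_scale = 0.05f, wei_scale = 0.02f;
    float combined = src_scale * wei_scale;  // 0.001
    float bias_f32 = 0.37f;                  // the original f32 bias

    // What the user would do on their end to pass a "quantized" bias:
    int bias_s32 = static_cast<int>(std::lround(bias_f32 / combined));  // 370

    // What the library would then have to do internally, because the
    // programming model adds the bias after applying the scales:
    float bias_restored = bias_s32 * combined;  // ~0.37, modulo rounding

    std::printf("original %f, round-tripped %f\n", bias_f32, bias_restored);
    // The round trip does the scaling work twice and can only add rounding
    // error, which is why the bias should be passed non-quantized (f32).
    return 0;
}
```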
You can find a detailed explanation of the current quantization scheme in oneDNN in the "Quantization and scaling" RFC.