oneapi-src / oneDNN

oneAPI Deep Neural Network Library (oneDNN)
https://uxlfoundation.org
Apache License 2.0

Use of different MatMul APIs for NLP workloads #1658

Closed avinashcpandey closed 1 year ago

avinashcpandey commented 1 year ago

I am trying to simulate a few NLP workloads and evaluate performance on Icelake. As NLP models are dominated by MatMuls, I am exploring the different MatMul APIs in oneDNN for different sequence lengths and batch sizes. I see there are several offerings in oneDNN. My questions are along these lines.

  1. DNNL direct API calls for matrix multiplication (Inner Product and MatMul). Question: to reuse weight tensors during inference, how do I call these two? Any example would help.

  2. BLAS-like call dnnl::sgemm (this is what I am most interested in, if it can extract the best performance from the hardware). Question: is this still the best option for FP32? What about BF16 and INT8 kernels? To reuse weight tensors during inference, how do I call dnnl::sgemm? Do I need to use some reorder API followed by dnnl::sgemm with tweaked arguments indicating that it should work with a blocked B matrix?

  3. BRGEMM for matrix multiplication. How do I call this API directly from user code? To reuse weight tensors during inference, which APIs do I need to use? brgemm_inner_product_fwd_t() brgemm_matmul_t

  4. BRGCONV for convolution (FP32, BF16, INT8). Is BRGCONV only applicable to 1x1 convolutions, or does it play a role in non-1x1 convolutions as well? I see two instances: CPU_INSTANCE_AVX512(brgemm_1x1_convolution_fwd_t CPU_INSTANCE_AVX512(brgemm_convolution_fwd_t

Thanks in advance!

vpirogov commented 1 year ago

@avinashcpandey, thanks for the question. The best way to implement matrix-matrix multiplication in the context of deep learning models is the matmul primitive. It supports batching, fused bias, fused activations (and more via post-ops), low precision, and weights pre-packing. Check out these examples:

Matrices A and B are runtime parameters in matmul. You can reuse the memory object representing the weights between matmul calls.

Here are details on the other options you listed:

avinashcpandey commented 1 year ago

Thanks @vpirogov for the prompt response.