Open ysh329 opened 6 years ago
gemm_mm_interleaved_transposed_f32_midgard
GEMM performance using the gemm_mm_interleaved_transposed_f32_midgard strategy: https://github.com/ysh329/OpenCL-101/issues/23
This issue follows the strategy above, named gemm_mm_interleaved_transposed_f32_midgard. Besides that kernel, ACL has other GEMM and GEVM implementations:
- gemm_mm_interleaved_transposed_f32_bifrost: This OpenCL kernel is optimized for Bifrost. It computes the matrix multiplication between matrix A (src0) and matrix B (src1).
- gemm_mm_interleaved_transposed_f16: This OpenCL kernel computes the matrix multiplication between matrix A (src0) and matrix B (src1).
- gemm_mm_interleaved_transposed_qs8: This OpenCL kernel computes the matrix multiplication between matrix A (src0) and matrix B (src1) in 8 bit fixed point precision.
- gemm_mm_interleaved_transposed_qs16: This OpenCL kernel computes the matrix multiplication between matrix A (src0) and matrix B (src1) in 16 bit fixed point precision. Matrix A and matrix B must be reshaped with gemm_interleave4x4_16bit and gemm_transpose1x8, respectively, before running the matrix multiplication.
- gemm_mm_floating_point: This OpenCL kernel computes the matrix by matrix multiplication between matrix A (src0) and matrix B (src1) in case both matrices have not been reshaped.
- gemm_mm_floating_point_f32_bifrost: This OpenCL kernel computes the matrix by matrix multiplication between matrix A (src0) and matrix B (src1) in case both matrices have not been reshaped.
- gemm_mm_floating_point_f32_bifrost_1000: This OpenCL kernel computes the matrix by matrix multiplication between matrix A (src0) and matrix B (src1) in case both matrices have not been reshaped.
- gemm_mm_qs8: This OpenCL kernel computes the matrix by matrix multiplication between matrix A (src0) and matrix B (src1) in case both matrices have not been reshaped.
- gemm_mm_qs16: This OpenCL kernel computes the matrix by matrix multiplication between matrix A (src0) and matrix B (src1) in case both matrices have not been reshaped.
- gemm_lc_vm_f32: This OpenCL kernel computes the vector by matrix multiplication between each row of A (src0) and matrix B (src1), used for the locally connected layer.
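To make the "interleaved transposed" idea concrete, here is a minimal CPU-side sketch in plain Python of what the reshaping step is generally understood to do: A is interleaved in groups of 4 rows, B is transposed in 1xW blocks, and the multiplication then reads both reshaped matrices sequentially. This is not the ACL OpenCL code. The function names (`interleave4x4`, `transpose1xw`, `gemm_reshaped`, `gemm_naive`) and the block width `w=4` are assumptions chosen for illustration, and the exact layout used by a given ACL version should be checked against its source.

```python
def interleave4x4(a):
    """Reshape A (M x K, M divisible by 4): each group of 4 rows becomes one
    output row stored column by column: a00 a10 a20 a30 a01 a11 a21 a31 ..."""
    m, k = len(a), len(a[0])
    assert m % 4 == 0, "sketch assumes M is a multiple of 4"
    out = []
    for i0 in range(0, m, 4):
        row = []
        for p in range(k):
            for r in range(4):
                row.append(a[i0 + r][p])
        out.append(row)
    return out


def transpose1xw(b, w=4):
    """Reshape B (K x N, N divisible by w): each block of w columns becomes
    one output row holding the 1 x w slices of every row of B back to back."""
    k, n = len(b), len(b[0])
    assert n % w == 0, "sketch assumes N is a multiple of w"
    out = []
    for j0 in range(0, n, w):
        row = []
        for p in range(k):
            row.extend(b[p][j0:j0 + w])
        out.append(row)
    return out


def gemm_reshaped(a_r, b_r, k, w=4):
    """Multiply the reshaped matrices. Each (bi, bj) pair plays the role of
    one work-item that accumulates a 4 x w tile of C as a sum of outer
    products, reading a_r and b_r strictly left to right."""
    c = [[0.0] * (len(b_r) * w) for _ in range(len(a_r) * 4)]
    for bi, a_row in enumerate(a_r):
        for bj, b_row in enumerate(b_r):
            for p in range(k):
                a_vec = a_row[4 * p:4 * p + 4]   # 4 values of column p of A
                b_vec = b_row[w * p:w * p + w]   # w values of row p of B
                for r in range(4):
                    for s in range(w):
                        c[4 * bi + r][w * bj + s] += a_vec[r] * b_vec[s]
    return c


def gemm_naive(a, b):
    """Reference triple-loop GEMM on the original (not reshaped) matrices."""
    k = len(b)
    return [[sum(a[i][p] * b[p][j] for p in range(k))
             for j in range(len(b[0]))] for i in range(len(a))]


if __name__ == "__main__":
    A = [[float(i * 3 + p) for p in range(3)] for i in range(8)]   # 8 x 3
    B = [[float(p * 8 + j) for j in range(8)] for p in range(3)]   # 3 x 8
    C = gemm_reshaped(interleave4x4(A), transpose1xw(B, 4), k=3, w=4)
    print(C == gemm_naive(A, B))
```

The point of the reshape is that the inner loop touches contiguous memory in both inputs, which is what lets a GPU kernel use vector loads. The non-reshaped kernels in the list above skip this step and read A and B in their original layout.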
Benchmark
Strategies that ACL uses and that we do not have yet:
ACL GEMM