iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

Port `s8s4s32` mmt4d ukernel code path to x86-64 #16660

Open bjacob opened 7 months ago

bjacob commented 7 months ago

Recently, @mariecwhite has been adding s8s4s32 code paths to the mmt4d ukernel, including optimized code paths for arm64 but not for x86-64. This issue is about adding the x86-64 pieces.

Explanation of "mmt4d": "matrix-times-matrix-transposed on 4D tensors" == our matrix-multiplication ukernel.
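To make the "matrix-times-matrix-transposed on 4D tensors" wording concrete, here is a minimal reference sketch in Python. It is not the ukernel code. It assumes the usual mmt4d layouts (LHS as `[M1][K1][M0][K0]`, RHS as `[N1][K1][N0][K0]`, accumulator as `[M1][N1][M0][N0]`), stored as flat row-major lists; the function name `mmt4d_reference` and its parameter order are made up for illustration.

```python
def mmt4d_reference(lhs, rhs, acc, M1, N1, K1, M0, N0, K0):
    """Accumulate LHS times RHS-transposed into acc, tile by tile.

    Assumed layouts (flat, row-major):
      lhs: [M1][K1][M0][K0]
      rhs: [N1][K1][N0][K0]   (already "transposed": rows are N, not K)
      acc: [M1][N1][M0][N0]
    """
    for i1 in range(M1):              # outer tile row
        for j1 in range(N1):          # outer tile column
            for i0 in range(M0):      # row inside the tile
                for j0 in range(N0):  # column inside the tile
                    out_idx = ((i1 * N1 + j1) * M0 + i0) * N0 + j0
                    total = acc[out_idx]
                    for k1 in range(K1):
                        for k0 in range(K0):
                            a = lhs[((i1 * K1 + k1) * M0 + i0) * K0 + k0]
                            b = rhs[((j1 * K1 + k1) * N0 + j0) * K0 + k0]
                            total += a * b
                    acc[out_idx] = total
    return acc


# One 2x2 tile with K0 = 2: each output is a dot product of an LHS row
# with an RHS row (that is the "transposed" part).
result = mmt4d_reference([1, 2, 3, 4], [5, 6, 7, 8], [0, 0, 0, 0],
                         M1=1, N1=1, K1=1, M0=2, N0=2, K0=2)
```

Because the RHS is stored with N as its row dimension, both operands are read contiguously along K0 in the innermost loop, which is the property an optimized tile function would rely on.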

Explanation of "s8s4s32": this is the type triple describing the mmt4d op. Here s8 is the LHS element type = signed int8, s4 is the RHS element type = signed int4, s32 is the accumulator (output) element type.
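As a rough illustration of what the s8s4s32 triple implies arithmetically, the sketch below multiplies signed int8 LHS values by signed int4 RHS values and sums into a wider accumulator. It assumes the int4 values are packed two per byte with the low nibble holding the first element; that nibble order is an assumption for illustration and may not match the layout the actual ukernel uses. The helper names `unpack_s4` and `dot_s8s4s32` are hypothetical.

```python
def unpack_s4(byte):
    """Split one byte into two signed 4-bit values.

    Assumption: the low nibble is the first element, the high nibble the second.
    """
    def sign_extend(v):
        # Map 0..15 to -8..7 (two's complement on 4 bits).
        return (v ^ 0x8) - 0x8

    lo = byte & 0x0F
    hi = (byte >> 4) & 0x0F
    return sign_extend(lo), sign_extend(hi)


def dot_s8s4s32(lhs_s8, rhs_packed_s4):
    """Dot product of int8 values with packed int4 values.

    Python integers do not overflow; a real kernel would accumulate in int32.
    """
    acc = 0
    for idx, byte in enumerate(rhs_packed_s4):
        lo, hi = unpack_s4(byte)
        acc += lhs_s8[2 * idx] * lo + lhs_s8[2 * idx + 1] * hi
    return acc


# 0xF7 unpacks to (7, -1); 0x80 unpacks to (0, -8).
value = dot_s8s4s32([1, 2, 3, 4], [0xF7, 0x80])
```

The int4 range is -8..7, so each product fits comfortably in 16 bits and the s32 accumulator leaves headroom for long reductions along K.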

Get familiar with the code:

Explanation of the tile sizes:

How to run tests and micro benchmarks:

 ninja mmt4d_test mmt4d_benchmark && ./runtime/src/iree/builtins/ukernel/tools/mmt4d_test && ./runtime/src/iree/builtins/ukernel/tools/mmt4d_benchmark

pashu123 commented 7 months ago

Added here: https://github.com/openxla/iree/pull/16724