This pull request updates the benchmarking scripts and the matrix multiplication and multi-head attention implementations, and extends `mfma_macro_generator.py` to support different thread binding layouts. The most important changes are the submodule commit update, the new benchmarking scripts, and the `mfma_macro_generator.py` changes for thread binding layouts.
Benchmarking updates:
- `benchmark/tilelang/benchmark.sh`: Added multiple new benchmarking commands for different matrix dimensions.
- `benchmark/tilelang/benchmark_tilelang_matmul.py`: Added a new script for benchmarking matrix multiplication with various configurations.
- `benchmark/tilelang/benchmark_tilelang_mha.py`: Added a new script for benchmarking multi-head attention with various configurations.

Matrix multiplication and multi-head attention implementations:
- `bitblas/tl/mfma_macro_generator.py`: Added support for different thread binding layouts by introducing the `is_m_first` flag and modifying methods to use this flag. [1] [2] [3] [4] [5] [6]

Code simplification and cleanup:
- `bitblas/tl/mfma_layout.py`: Removed an unused import and added new functions for different thread binding layouts. [1] [2]
- `bitblas/tl/utils.py`: Updated imports and modified the `mfma_store_index_map` function to use the new thread binding layout function. [1] [2]

Submodule update:
- `3rdparty/tvm`: Updated the submodule commit to a new version.
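To illustrate the `is_m_first` idea from the changes above: the flag decides whether consecutive thread IDs advance along the M (row) dimension first or the N (column) dimension first when threads are bound to a tile. The sketch below is a hypothetical, simplified illustration of that distinction (the function name and signature are made up for this example and are not the BitBLAS API):

```python
def thread_binding(thread_id: int, m_dim: int, n_dim: int, is_m_first: bool):
    """Map a flat thread id to (m, n) coordinates inside an m_dim x n_dim tile.

    Hypothetical sketch: with is_m_first=True, consecutive thread ids walk
    down the M dimension first (column-major order); with is_m_first=False,
    they walk across the N dimension first (row-major order).
    """
    if is_m_first:
        return thread_id % m_dim, thread_id // m_dim
    return thread_id // n_dim, thread_id % n_dim

# In a 4x4 tile, thread 1 lands in different positions under the two layouts:
# m-first -> (1, 0) (next row, same column)
# n-first -> (0, 1) (same row, next column)
print(thread_binding(1, 4, 4, True), thread_binding(1, 4, 4, False))
```

The choice matters for generated MFMA macros because it changes which threads touch contiguous rows versus contiguous columns, and therefore how fragment loads and stores coalesce in memory.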