microsoft / BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

[Dev][AMD] Support LDS and Flash Attention for AMD Backend #247

Closed: LeiWang1999 closed this pull request 1 week ago

LeiWang1999 commented 1 week ago

This pull request updates the benchmarking scripts and the matrix multiplication and multi-head attention implementations, and modifies mfma_macro_generator.py to support different thread binding layouts on the AMD backend. The most important changes are the submodule commit update, the new benchmarking scripts, and the thread-binding-layout support in the MFMA macro generator.
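For context on the thread-binding change, below is a minimal Python sketch of how a macro generator might map a flat lane id within a 64-lane AMD wavefront onto an MFMA fragment under two different binding layouts. The function name, fragment shape, and layout names are illustrative assumptions, not the actual API of mfma_macro_generator.py.

```python
# Hypothetical sketch: mapping a flat lane id within a 64-lane AMD
# wavefront to a (row, col) position in an MFMA fragment under two
# thread-binding layouts. Names and shapes are illustrative only.

def thread_binding(lane_id: int, layout: str = "row_major",
                   rows: int = 16, cols: int = 4) -> tuple[int, int]:
    """Return the (row, col) a lane owns inside a rows x cols fragment.

    rows * cols must equal the wavefront size (64 on AMD CDNA GPUs).
    """
    assert rows * cols == 64, "AMD wavefronts have 64 lanes"
    if layout == "row_major":
        # Consecutive lanes walk across a row first.
        return lane_id // cols, lane_id % cols
    if layout == "col_major":
        # Consecutive lanes walk down a column first, which changes
        # which lanes access the same LDS bank at the same time.
        return lane_id % rows, lane_id // rows
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    for tid in (0, 1, 4, 63):
        print(tid, thread_binding(tid, "row_major"),
              thread_binding(tid, "col_major"))
```

Supporting more than one binding is useful because the best layout depends on the operand's memory layout and on avoiding LDS bank conflicts, so a generator that can emit either gives the tuner more options.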

Benchmarking updates:

Matrix multiplication and multi-head attention implementations:
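As background for the attention work, the following NumPy sketch shows the online-softmax recurrence that flash-attention kernels evaluate block by block, with each K/V tile staged in fast on-chip memory (LDS on AMD GPUs). It is a numerical reference only, under assumed shapes and block size; it is not the PR's actual kernel code.

```python
import numpy as np

def flash_attention_reference(q, k, v, block: int = 64):
    """Single-head attention computed tile-by-tile with online softmax,
    the recurrence a flash-attention kernel evaluates while keeping one
    K/V tile at a time in LDS. q: (m, d), k: (n, d), v: (n, d)."""
    m_len, d = q.shape
    out = np.zeros((m_len, d), dtype=np.float32)
    row_max = np.full(m_len, -np.inf, dtype=np.float32)
    row_sum = np.zeros(m_len, dtype=np.float32)
    scale = 1.0 / np.sqrt(d)
    for start in range(0, k.shape[0], block):
        kb = k[start:start + block]        # K tile (would live in LDS)
        vb = v[start:start + block]        # V tile (would live in LDS)
        s = (q @ kb.T) * scale             # partial scores for this tile
        new_max = np.maximum(row_max, s.max(axis=1))
        correction = np.exp(row_max - new_max)  # rescale earlier partials
        p = np.exp(s - new_max[:, None])
        row_sum = row_sum * correction + p.sum(axis=1)
        out = out * correction[:, None] + p @ vb
        row_max = new_max
    return out / row_sum[:, None]
```

The rescaling by `correction` is what lets a kernel keep only one K/V tile in LDS at a time while still producing an exact softmax over the full sequence.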

Code simplification and cleanup:

Submodule update: