microsoft / BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
MIT License

[BUGFix] Fix UINT/INT8 dequantize implementation and optimize the schedule template for float32 accum #46

Closed: LeiWang1999 closed this 3 months ago

LeiWang1999 commented 3 months ago

This pull request changes several Python files in the bitblas library to improve support for different data types and make the code more robust. The affected files are hint.py, tensorcore.py, lop3.py, general_matmul.py, and matmul_dequantize_impl.py. The changes fall into three main categories: updates to hint.py and tensorcore.py to handle different data types; improvements to lop3.py to better handle different bit sizes; and changes to general_matmul.py and matmul_dequantize_impl.py to add assertions and handle different bit sizes.
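For readers unfamiliar with what "dequantize" and "bit sizes" mean here, the following is a minimal NumPy sketch of the general technique: low-bit unsigned integers are packed into 8-bit storage, then unpacked and rescaled to floating point, with assertions guarding the supported bit widths. It is not the code from this pull request. The function names (pack_uint, unpack_uint, dequantize), the little-endian packing order, and the scale and zero_point parameters are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical helpers for illustration only; not BitBLAS APIs.
SUPPORTED_BITS = (1, 2, 4, 8)


def pack_uint(values: np.ndarray, bits: int) -> np.ndarray:
    """Pack unsigned `bits`-wide integers into uint8 storage (lowest bits first)."""
    assert bits in SUPPORTED_BITS, f"unsupported bit width: {bits}"
    per_byte = 8 // bits
    assert values.size % per_byte == 0, "value count must fill whole bytes"
    mask = np.uint8((1 << bits) - 1)
    grouped = values.astype(np.uint8).reshape(-1, per_byte)
    packed = np.zeros(grouped.shape[0], dtype=np.uint8)
    for i in range(per_byte):
        # Place the i-th value of each group at bit offset i * bits.
        packed |= (grouped[:, i] & mask) << np.uint8(i * bits)
    return packed


def unpack_uint(packed: np.ndarray, bits: int) -> np.ndarray:
    """Recover the unsigned `bits`-wide integers from uint8 storage."""
    assert bits in SUPPORTED_BITS, f"unsupported bit width: {bits}"
    assert packed.dtype == np.uint8, "packed storage must be uint8"
    per_byte = 8 // bits
    mask = np.uint8((1 << bits) - 1)
    shifts = np.arange(per_byte, dtype=np.uint8) * np.uint8(bits)
    return ((packed[:, None] >> shifts[None, :]) & mask).reshape(-1)


def dequantize(packed: np.ndarray, bits: int, scale: float, zero_point: int = 0) -> np.ndarray:
    """Unpack, subtract an assumed zero point, and rescale to float32."""
    q = unpack_uint(packed, bits).astype(np.int16)
    return (q - zero_point).astype(np.float32) * np.float32(scale)


values = np.array([0, 1, 2, 3, 15, 7, 8, 9], dtype=np.uint8)
packed4 = pack_uint(values, bits=4)          # two 4-bit values per byte
restored = unpack_uint(packed4, bits=4)      # round-trips to `values`
weights = dequantize(packed4, bits=4, scale=0.5, zero_point=8)
```

With zero_point=8, the 4-bit codes 0..15 map to the signed range -8..7 before scaling, which is one common way to represent signed values in unsigned storage; other schemes (for example two's complement) are equally possible.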

Handling different data types:

Improvements to handle different bit sizes:

Adding assertions and handling different bit sizes:

Other changes: