microsoft / BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
MIT License

[Dev] Fix a bug within FP8 E4M3 Fast Decoding #54

Closed. LeiWang1999 closed this 3 months ago.

LeiWang1999 commented 3 months ago

This pull request mainly enhances the bitblas Python package and bumps its version. The main changes are the addition of MatmulConfigWithSplitK and MatmulWithSplitK to the bitblas module, updates to the gemv and gemv_dequantize modules to support more iterations, and modifications to the quantization module for better handling of floating-point numbers. The version number moves from 0.0.1.dev9 to 0.0.1.dev12.
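For readers unfamiliar with the format named in the title, here is a minimal scalar reference decoder for FP8 E4M3. It is written independently of BitBLAS and is not its fast-decoding path. It assumes the OCP E4M3 (often called E4M3FN) encoding: 1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, and the all-ones exponent/mantissa pattern reserved for NaN. A slow, obviously-correct decoder like this one could serve as ground truth when checking an optimized decoder. The names `decode_e4m3` and `E4M3_TABLE` are hypothetical.

```python
import math


def decode_e4m3(byte: int) -> float:
    """Decode one FP8 E4M3 byte into a Python float (hypothetical helper).

    Assumes the OCP E4M3FN layout: 1 sign bit, 4 exponent bits (bias 7),
    3 mantissa bits, no infinities, and S.1111.111 as NaN.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in [0, 255]")
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF  # bits 6..3
    mantissa = byte & 0x7  # bits 2..0
    if exponent == 0xF and mantissa == 0x7:
        # E4M3FN has no infinities; only this bit pattern encodes NaN
        return math.nan
    if exponent == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias = -6
        return sign * (mantissa / 8.0) * 2.0 ** -6
    # Normal: implicit leading 1, unbiased exponent = exponent - 7
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


# A 256-entry lookup table: one simple way to decode FP8 values in bulk
E4M3_TABLE = [decode_e4m3(b) for b in range(256)]
```

For example, `decode_e4m3(0x38)` returns 1.0, and `decode_e4m3(0x7E)` returns 448.0, the largest finite E4M3 value.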

Version Update:

Enhancements to bitblas module:

Updates to gemv and gemv_dequantize modules:

Modifications to quantization module:

Other Changes: