microsoft / BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
MIT License

[FP8] Support Weight Dequantize FP16xFP8_E4M3 #42

Closed LeiWang1999 closed 1 month ago

LeiWang1999 commented 1 month ago

This pull request adds support for new FP8 weight formats and simplifies existing code. The most significant changes are the addition of the new formats (FP8_E4M3, FP8_E5M2) to the check_weight_decode_info function, simplification of the code in general_matmul.py, and new conversion functions in quantization.py.
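For readers unfamiliar with the two formats, the following is a minimal reference sketch of how an FP8 byte maps to a real value. It is not the code added in this PR: the function names (decode_fp8, decode_e4m3, decode_e5m2, e5m2_via_fp16) are hypothetical, and it assumes the common OCP FP8 conventions, where E4M3 uses bias 7 with no infinities and a single NaN mantissa pattern, and E5M2 uses bias 15 with IEEE-style infinities and NaNs.

```python
import math
import struct


def decode_fp8(byte: int, exp_bits: int, man_bits: int, bias: int,
               ieee_specials: bool) -> float:
    """Decode one FP8 byte (1 sign bit + exp_bits + man_bits) to a Python float.

    Hypothetical helper for illustration only, not a BitBLAS function.
    """
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp_max = (1 << exp_bits) - 1
    man_max = (1 << man_bits) - 1
    exp = (byte >> man_bits) & exp_max
    man = byte & man_max

    if exp == exp_max:
        if ieee_specials:
            # E5M2 (assumed IEEE-like): all-ones exponent is inf or NaN.
            return sign * math.inf if man == 0 else math.nan
        if man == man_max:
            # E4M3 (assumed "FN" variant): only S.1111.111 is NaN.
            return math.nan

    if exp == 0:
        # Subnormal: no implicit leading 1.
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1.0 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def decode_e4m3(byte: int) -> float:
    return decode_fp8(byte, exp_bits=4, man_bits=3, bias=7, ieee_specials=False)


def decode_e5m2(byte: int) -> float:
    return decode_fp8(byte, exp_bits=5, man_bits=2, bias=15, ieee_specials=True)


def e5m2_via_fp16(byte: int) -> float:
    # E5M2 shares FP16's sign and exponent layout, so placing the byte in the
    # upper half of a 16-bit word yields the same value as an FP16 number.
    return struct.unpack("<e", struct.pack("<H", byte << 8))[0]
```

Under these assumptions, E5M2 can be widened to FP16 with a plain 8-bit shift, as e5m2_via_fp16 shows, whereas E4M3 has a different exponent width and bias and so needs the exponent to be re-biased during dequantization.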

Addition of new formats:

Code simplification:

Addition of new conversion functions: