This pull request primarily focuses on expanding the functionality of the existing codebase to support new formats and on simplifying existing code. The most significant changes are the addition of the new FP8 formats (FP8_E4M3, FP8_E5M2) to the check_weight_decode_info function, code simplification in general_matmul.py, and new conversion functions in quantization.py.
Addition of new formats:
python/bitblas/gpu/gemv_dequantize.py: Updated the check_weight_decode_info function to include the new formats "fp_e5m2" and "fp_e4m3" in the list of acceptable formats (a sketch of this check follows the list). [1] [2]
python/bitblas/gpu/matmul_mma_dequantize.py: Similar changes were made in this file to include the new format "fp_e4m3". [1] [2] [3]
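For context, here is a minimal sketch of what such a format check might look like. The function name comes from the PR, but the dictionary layout and the exact condition list are assumptions, not the actual bitblas implementation:

```python
# Hypothetical sketch; the field names ("source_format", "format", "bits")
# are assumptions and may not match the real check_weight_decode_info.
def check_weight_decode_info(weight_decode_info: dict) -> bool:
    conditions = []
    # The quantization source format must be one of the accepted layouts;
    # "fp_e4m3" and "fp_e5m2" are the entries this PR adds.
    conditions.append(weight_decode_info["source_format"]["format"] in
                      ["uint", "int", "fp", "nf", "fp_e4m3", "fp_e5m2"])
    # The bit width must be one the dequantize schedules can handle;
    # FP8 values occupy the full 8-bit storage byte.
    conditions.append(weight_decode_info["source_format"]["bits"] in [1, 2, 4, 8])
    return all(conditions)
```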
Code simplification:
python/bitblas/ops/general_matmul.py: Multiple changes were made in this file to simplify the code, primarily by combining statements and removing unnecessary parentheses to reduce the line count (illustrated below). [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12]
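Purely as an illustration of that style of cleanup (hypothetical code, not an excerpt from the actual diff):

```python
def infer_storage_nbit(source_format: str) -> int:
    """Hypothetical helper showing the cleanup style: one combined
    membership test replaces separate, parenthesized if-statements."""
    # Before the cleanup, such code might have read:
    #   if (source_format == "fp_e4m3"):
    #       return 8
    #   if (source_format == "fp_e5m2"):
    #       return 8
    # After combining the statements and dropping the parentheses:
    if source_format in ("fp_e4m3", "fp_e5m2"):
        return 8
    return 16
```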
Addition of new conversion functions:
python/bitblas/quantization/quantization.py: New conversion functions _tir_u8_to_f8_e4m3_to_f16 and _tir_u8_to_f8_e5m2_to_f16 were added to support the new formats. [1] [2]
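As a rough sketch of the bit manipulation such converters can use, assuming the TVM TIR API and handling only the straightforward exponent re-biasing (the actual functions in quantization.py may treat zeros, subnormals, and NaNs differently):

```python
from tvm import tir


def _tir_u8_to_f8_e4m3_to_f16(nbit: int, val: tir.PrimExpr, dtype: str):
    """Sketch: widen FP8 E4M3 (1-4-3 bits, bias 7) to FP16 (1-5-10, bias 15).

    Assumes normal, finite inputs: the sign moves to bit 15, the 7-bit
    exponent+mantissa payload shifts left by 7, and the exponent field is
    re-biased by adding 15 - 7 = 8 (0x2000 in the f16 bit layout).
    """
    assert nbit == 8
    assert dtype == "float16"
    u16 = val.astype("uint16")
    s_f16 = (u16 >> tir.const(7, "uint16")) << tir.const(15, "uint16")
    em_f16 = ((u16 & tir.const(0x7F, "uint16")) << tir.const(7, "uint16")) + tir.const(
        0x2000, "uint16")
    return tir.reinterpret("float16", s_f16 | em_f16)


def _tir_u8_to_f8_e5m2_to_f16(nbit: int, val: tir.PrimExpr, dtype: str):
    """Sketch: FP8 E5M2 shares FP16's exponent width and bias, so the whole
    byte can simply be shifted into the high byte of the f16 bit pattern."""
    assert nbit == 8
    assert dtype == "float16"
    return tir.reinterpret("float16", val.astype("uint16") << tir.const(8, "uint16"))
```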