[FP8] Support Weight Dequantize FP16xFP8_E4M3

This pull request adds support for new FP8 weight formats and simplifies some existing code. The most significant changes are the addition of the new formats (FP8_E4M3, FP8_E5M2) to the check_weight_decode_info function, code simplification in general_matmul.py, and new conversion functions in quantization.py.

Addition of new formats:

python/bitblas/gpu/gemv_dequantize.py: Updated the check_weight_decode_info function to include the new formats "fp_e5m2" and "fp_e4m3" in the list of acceptable formats.
python/bitblas/gpu/matmul_mma_dequantize.py: Similar changes were made in this file to include the new format "fp_e4m3".

Code simplification:

python/bitblas/ops/general_matmul.py: Simplified the code in several places, mainly by combining statements and removing unnecessary parentheses to reduce the line count.

Addition of new conversion functions:

python/bitblas/quantization/quantization.py: New conversion functions _tir_u8_to_f8_e4m3_to_f16 and _tir_u8_to_f8_e5m2_to_f16 were added to support the new formats.
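
For reviewers unfamiliar with the two encodings, the following is a minimal pure-Python sketch of what decoding an FP8 byte to a half-precision value involves. It is not the TIR code added in this PR. The function names decode_e4m3 and decode_e5m2 are hypothetical, and the sketch assumes the common OCP FP8 layout: E4M3 with exponent bias 7, no infinities, and a single NaN pattern per sign; E5M2 with the same sign and exponent layout as IEEE float16.

```python
import struct


def decode_e5m2(byte: int) -> float:
    """Decode one FP8 E5M2 byte (hypothetical helper, not the BitBLAS API).

    Assumption: E5M2 uses the float16 sign/exponent layout with the
    mantissa cut to 2 bits, so the byte is the upper half of a float16.
    """
    half_bits = (byte & 0xFF) << 8
    # "<H" packs the 16-bit pattern, "<e" reinterprets it as float16.
    return struct.unpack("<e", struct.pack("<H", half_bits))[0]


def decode_e4m3(byte: int) -> float:
    """Decode one FP8 E4M3 byte (hypothetical helper, not the BitBLAS API).

    Assumption: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits,
    with exponent 15 and mantissa 7 reserved for NaN.
    """
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return float("nan")
    if exp == 0:
        # Subnormal: no implicit leading one, fixed exponent of -6.
        return sign * (man / 8.0) * 2.0 ** -6
    # Normal: implicit leading one, exponent offset by the bias.
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)
```

Every finite E4M3 and E5M2 value is exactly representable in float16, which is why a weight stored as an FP8 byte can be dequantized to FP16 without rounding.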

microsoft/BitBLAS #42