This pull request refines a type conversion in the FP8-to-FP16 dequantization path and relaxes the tolerances in the corresponding test. The changes are aimed at improving the correctness of the conversion and keeping the test stable.
Here are the key changes:
Type conversion refinement:
python/bitblas/quantization/quantization.py: In the function _tir_u8_to_f8_e4m3_to_f16, the dtype of the shift operation has been changed from int16 to uint16, and the computation of s_f16 and e_f16 now uses bitwise operations.
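The actual function is written in TVM TIR, but the bit manipulation it performs can be sketched in plain Python. The sketch below is an assumption about the conversion's shape (normal values only; subnormals and NaN handling are omitted); it is not the repository's implementation. It shows why unsigned shifts matter: E4M3 has 1 sign / 4 exponent (bias 7) / 3 mantissa bits, FP16 has 1 / 5 (bias 15) / 10, so the sign, re-biased exponent, and widened mantissa are each moved into place with shifts and ORs.

```python
import struct


def u8_to_f8e4m3_to_f16_bits(val: int) -> int:
    """Reinterpret a uint8 holding FP8 E4M3 bits as FP16 bits.

    Illustrative sketch for normal values only; subnormal and NaN
    encodings are not handled here.
    """
    v = val & 0xFF                 # keep the value in unsigned range
    s_f16 = (v >> 7) << 15         # sign bit moved to the FP16 sign slot
    e_f8 = (v >> 3) & 0xF          # extract the 4-bit E4M3 exponent field
    e_f16 = (e_f8 + 8) << 10       # re-bias: 15 - 7 = 8, shift into place
    m_f16 = (v & 0x7) << 7         # widen the 3-bit mantissa to 10 bits
    return s_f16 | e_f16 | m_f16


def f16_bits_to_float(bits: int) -> float:
    """Decode FP16 bits to a Python float for checking."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]
```

In C-like integer arithmetic an arithmetic (signed) right shift would smear the sign bit into the extracted fields, which is why performing the shifts in uint16 rather than int16 is the safe choice.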
Precision adjustment:
testing/python/operators/test_general_matmul_fp8.py: In the function map_torch_type, the relative and absolute tolerances passed to torch.testing.assert_close have been increased from 1e-2 to 1e-1, relaxing the test's precision requirements to account for FP8 quantization error.