This pull request extends the bitblas Python package and bumps the version number. The main changes are the addition of MatmulConfigWithSplitK and MatmulWithSplitK to the bitblas module, updates to the gemv and gemv_dequantize modules to handle a block_info.iters length of 4, and modifications to the u8 to f8 e4m3 to f16 conversion in the quantization module. The version number has been updated from 0.0.1.dev9 to 0.0.1.dev12.
Version Update:
VERSION and python/bitblas/__init__.py: Updated the version number from 0.0.1.dev9 to 0.0.1.dev12. [1][2]
Enhancements to bitblas module:
python/bitblas/__init__.py: Imported MatmulConfigWithSplitK and MatmulWithSplitK from general_matmul_splitk module.
python/bitblas/quantization/quantization.py: Revised _tir_u8_to_f8_e4m3_to_f16 function and added a new function _tir_u8_to_f8_e4m3_to_f16_naive for better handling of floating point numbers.
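For readers unfamiliar with the format named in _tir_u8_to_f8_e4m3_to_f16, the following is a minimal reference decoder in plain Python. It is not the TIR code from this PR. It assumes the common E4M3 "FN" layout (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, a single NaN pattern per sign), and the function name decode_e4m3 is hypothetical. Every finite value in this layout is exactly representable in float16, so a scalar reference like this can be useful when checking a bit-manipulation implementation.

```python
import math


def decode_e4m3(byte: int) -> float:
    """Decode one uint8 holding an FP8 E4M3 value.

    Assumption: the 'FN' variant (bias 7, no infinities,
    exponent=15 with mantissa=7 encodes NaN).
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in [0, 255]")

    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF  # 4 exponent bits
    mantissa = byte & 0x7         # 3 mantissa bits

    if exponent == 0xF and mantissa == 0x7:
        return math.nan
    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed exponent of -6.
        return sign * (mantissa / 8.0) * 2.0 ** -6
    # Normal: implicit leading 1, exponent bias of 7.
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)
```

For example, decode_e4m3(0x38) gives 1.0 and decode_e4m3(0x7E) gives 448.0, the largest finite value in this layout.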
Updates to gemv and gemv_dequantize modules:
python/bitblas/gpu/gemv.py: Extended the acceptable range of block_info.iters length to include 4.
python/bitblas/gpu/gemv_dequantize.py: Adjusted the logic in get_vectorize_factor to handle cases where the length of sch.get_loops(block_b) is 4. [1][2]
Other Changes:
python/bitblas/wrapper/general.py: Modified the legalize_c function to handle cases where dynamic_symbolic_set is not empty.
testing/python/operators/test_general_matmul_splitk_ops.py: Added additional calls to matmul.forward for testing purposes.
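As background for the split-K operators exercised by that test, here is a minimal sketch of the general split-K idea in plain Python. It does not use or reproduce the bitblas API. The function names and the k_split parameter are hypothetical, and the sketch assumes K is divisible by the number of splits. The reduction axis K is cut into chunks, each chunk yields a partial M x N product, and the partial products are summed. On a GPU the chunks can run in parallel, which tends to help when M and N are small relative to K.

```python
def matmul(a, b, k_start=0, k_stop=None):
    """Partial product of a (M x K) and b (K x N) over K in [k_start, k_stop)."""
    k_stop = len(b) if k_stop is None else k_stop
    n = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(k_start, k_stop)) for j in range(n)]
        for row in a
    ]


def matmul_split_k(a, b, k_split):
    """Split the reduction axis into k_split chunks and sum the partial results."""
    k = len(b)
    if k_split <= 0 or k % k_split != 0:
        raise ValueError("this sketch assumes K is divisible by k_split")

    chunk = k // k_split
    m, n = len(a), len(b[0])
    out = [[0] * n for _ in range(m)]
    for s in range(k_split):
        # Each chunk is independent of the others until the final accumulation.
        partial = matmul(a, b, s * chunk, (s + 1) * chunk)
        for i in range(m):
            for j in range(n):
                out[i][j] += partial[i][j]
    return out


a = [[1, 2, 3, 4], [5, 6, 7, 8]]
b = [[1, 0], [0, 1], [1, 1], [2, 0]]
full = matmul(a, b)
split = matmul_split_k(a, b, k_split=2)
```

With integer inputs the two results match exactly. With floating point inputs the summation order differs between the two paths, so a comparison would normally use a tolerance.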