This pull request updates the version number, modifies the CUDA kernel launch string in the legalize_c function, and changes the test_matmul_torch_forward_fp8e4m3 test function. The version number is updated in two places: the VERSION file and the __init__.py file. The kernel launch string in legalize_c is modified to return early if the first element of dynamic_symbolic_set is 0. Lastly, the test_matmul_torch_forward_fp8e4m3 function now uses a list, [1, 16], instead of a single value for the M parameter.
Version number updates:
VERSION: The version number is updated from 0.0.1.dev8 to 0.0.1.dev9.
python/bitblas/__init__.py: The __version__ string is updated from 0.0.1.dev8 to 0.0.1.dev9.

Changes in legalize_c function:
python/bitblas/wrapper/general.py: The CUDA kernel launch string in the legalize_c function is modified to return early if the first element of dynamic_symbolic_set is 0. [1] [2]

Test function modification:
testing/python/operators/test_general_matmul_splitk_ops.py: The M parameter in the test_matmul_torch_forward_fp8e4m3 function is changed from a single value M to a list [1, 16].
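
To make the legalize_c change above concrete, here is a minimal sketch of how a kernel launch string could be prefixed with an early-return guard. It is one reading of the description, not the code in python/bitblas/wrapper/general.py: the helper name build_launch_call, its parameters, and the exact shape of the emitted string are assumptions made for illustration. Only dynamic_symbolic_set and the "return early when its first element is 0" behavior come from the description.

```python
def build_launch_call(function_name, grid, block, args, dynamic_symbolic_set):
    """Hypothetical helper (not the BitBLAS API): build a CUDA launch statement.

    `grid` and `block` are strings holding the dim3 arguments, `args` is a list
    of argument names, and `dynamic_symbolic_set` lists the names of symbolic
    (runtime-sized) dimensions such as "m".
    """
    call_str = ""
    if dynamic_symbolic_set:
        # Assumed guard: if the first dynamic dimension is 0 at runtime,
        # the generated host code returns before launching the kernel.
        call_str += f"if ({dynamic_symbolic_set[0]} == 0) return;\n"
    call_str += (
        f"{function_name}<<<dim3({grid}), dim3({block})>>>({', '.join(args)});"
    )
    return call_str


if __name__ == "__main__":
    print(
        build_launch_call(
            "matmul_kernel", "m / 16, 1, 1", "128, 1, 1", ["A", "B", "C", "m"], ["m"]
        )
    )
```

With a guard like this, a call whose first dynamic dimension is 0 never reaches the launch statement, so the kernel is not launched with an empty grid. When dynamic_symbolic_set is empty, the sketch emits the launch statement unchanged.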