This pull request introduces a number of changes across the python/bitblas package to improve the functionality of the BitBLAS library. The changes include updates to the Rasterization and TensorCoreExtraConfig classes, modifications to the fast_decode_impl method, and the addition of the MatmulWithSplitK class.
Updates to Rasterization and TensorCoreExtraConfig classes:
python/bitblas/base/roller/__init__.py: Imported new Rasterization classes.
python/bitblas/base/roller/hint.py: Added a new method tensorcore_legalization to the TensorCoreExtraConfig class.
Modifications to fast_decode_impl method:
python/bitblas/gpu/intrin/lop3.py: Reformatted the arguments in the get_fast_decode_intrin calls within the fast_decode_impl method for better readability. [1] [2]
Addition of MatmulWithSplitK class:
python/bitblas/ops/general_matmul_splitk.py: Added a new file implementing the MatmulWithSplitK class, which extends the functionality of the Matmul class with the ability to split the K dimension.
Other important changes:
3rdparty/tvm: Updated the subproject commit.
python/bitblas/base/roller/policy/tensorcore.py: Added a call to tensorcore_legalization in the _score method.
python/bitblas/module/__init__.py: Changed the default value of fast_decoding from True to None in the __init__ method.
python/bitblas/ops/general_matmul.py: Removed the OPExecutorCPU class and added a condition to check if fast decoding is supported in the __initialize_fast_decoding method. [1] [2]
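For reviewers unfamiliar with the idea behind splitting the K dimension, the sketch below illustrates it in plain NumPy. It is not the MatmulWithSplitK implementation and does not use any BitBLAS API. The function name matmul_split_k and its split_k parameter are hypothetical, chosen only for this illustration. The reduction (K) axis is cut into equal chunks, each chunk yields an independent partial product, and the partial products are summed at the end.

```python
import numpy as np


def matmul_split_k(a, b, split_k=4):
    """Compute a @ b by splitting the shared K dimension into chunks.

    Illustrative sketch only; the name and signature are hypothetical
    and are not taken from BitBLAS.
    """
    m, k = a.shape
    k_b, n = b.shape
    if k != k_b:
        raise ValueError("inner dimensions of a and b must match")
    if k % split_k != 0:
        raise ValueError("this sketch requires K to be divisible by split_k")

    chunk = k // split_k
    # Each K-slice gives an independent (M, N) partial product. On a GPU,
    # these could be computed by separate thread blocks in parallel.
    partials = [
        a[:, i * chunk:(i + 1) * chunk] @ b[i * chunk:(i + 1) * chunk, :]
        for i in range(split_k)
    ]
    # Reduce the partial products to obtain the final (M, N) result.
    return np.sum(partials, axis=0)


rng = np.random.default_rng(0)
a = rng.standard_normal((2, 1024))
b = rng.standard_normal((1024, 8))
out = matmul_split_k(a, b, split_k=4)
```

Splitting K tends to help when M and N are small and K is large, since a conventional tiling over M and N alone may leave much of the GPU idle. The cost is the extra reduction over the partial results.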