microsoft / BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
MIT License

[TL] Adapt TL Hardware-aware Search Space with Roller #207

Closed LeiWang1999 closed 1 month ago

LeiWang1999 commented 1 month ago

We currently provide a way for users to define a search space for a given TL kernel, but it is still hard and complex to define a precise and efficient search space that covers dynamic shapes and operators for a given op and backend. For example, a hand-written search space looks like this:

def get_configs_sm80(self):
    # Hand-written candidate tile configurations for sm80 GPUs.
    num_stages = 2
    configs = [
        {
            'block_M': 128,
            'block_N': 256,
            'block_K': 32,
            'threads': 128
        },
        {
            'block_M': 256,
            'block_N': 128,
            'block_K': 32,
            'threads': 128
        },
        {
            'block_M': 128,
            'block_N': 128,
            'block_K': 32,
            'threads': 128
        },
    ]
    # Every candidate shares the same pipeline depth.
    configs = [{**c, 'num_stages': num_stages} for c in configs]
    return configs

This pull request links the TL search space to our Roller search space.
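To make the idea concrete, here is a minimal sketch of how a hardware-aware search space could be derived instead of hand-written. It is not the Roller implementation or the BitBLAS API: the function name `enumerate_gemm_configs`, the candidate tile sizes, the 48 KB shared-memory budget, and the compute-intensity ranking are all assumptions chosen for illustration.

```python
from itertools import product


def enumerate_gemm_configs(smem_limit_bytes=48 * 1024, dtype_bytes=2,
                           num_stages=2, threads=128, top_k=3):
    """Hypothetical helper: enumerate GEMM tile configs that fit in shared memory.

    All defaults are illustrative assumptions, not values taken from BitBLAS.
    """
    tiles_mn = (64, 128, 256)  # assumed candidate sizes for block_M / block_N
    tiles_k = (32, 64)         # assumed candidate sizes for block_K
    candidates = []
    for block_m, block_n, block_k in product(tiles_mn, tiles_mn, tiles_k):
        # Shared memory needed for the A and B tiles across all pipeline stages.
        tile_bytes = (block_m * block_k + block_k * block_n) * dtype_bytes
        if tile_bytes * num_stages > smem_limit_bytes:
            continue  # does not fit the assumed hardware budget
        # FLOPs per byte staged through shared memory; larger means more reuse.
        intensity = (2 * block_m * block_n * block_k) / tile_bytes
        candidates.append((intensity, {
            'block_M': block_m,
            'block_N': block_n,
            'block_K': block_k,
            'threads': threads,
            'num_stages': num_stages,
        }))
    # Python's sort is stable, so ties keep their enumeration order.
    candidates.sort(key=lambda item: item[0], reverse=True)
    return [cfg for _, cfg in candidates[:top_k]]


if __name__ == "__main__":
    for cfg in enumerate_gemm_configs():
        print(cfg)
```

Under these assumed defaults the three highest-ranked tiles are 128x256x32, 256x128x32, and 128x128x32, the same shapes as the hand-written list. That only holds for this toy cost model; a real search would also account for register pressure, alignment, and the actual problem shape.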

However, block-level TL cannot fully utilize the schedule information. For example, TL provides only three dedicated warp scheduling policies:

class GemmWarpPolicy:
    Square = 0
    FullRow = 1
    FullCol = 2
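
The limitation can be sketched as follows. The code assumes that `FullRow` places all warps along the M dimension, `FullCol` places all warps along the N dimension, and `Square` picks the split whose per-warp tile is closest to square. The helper names `partition_warps` and `all_partitions` are hypothetical, and the `Square` heuristic here is an illustration rather than TL's actual rule.

```python
from enum import IntEnum


class GemmWarpPolicy(IntEnum):
    Square = 0
    FullRow = 1
    FullCol = 2


def all_partitions(num_warps):
    """Every (m_warps, n_warps) factor pair a finer-grained schedule could ask for."""
    return [(m, num_warps // m) for m in range(1, num_warps + 1)
            if num_warps % m == 0]


def partition_warps(policy, num_warps, block_M, block_N):
    """Hypothetical mapping from a policy to a warp grid (assumed semantics)."""
    if policy == GemmWarpPolicy.FullRow:
        return num_warps, 1  # assumption: all warps tile the M dimension
    if policy == GemmWarpPolicy.FullCol:
        return 1, num_warps  # assumption: all warps tile the N dimension
    # Square (illustrative heuristic): make each warp's sub-tile as square as possible.
    return min(all_partitions(num_warps),
               key=lambda mn: abs(block_M / mn[0] - block_N / mn[1]))


if __name__ == "__main__":
    num_warps = 256 // 32  # 8 warps
    reachable = {partition_warps(p, num_warps, 128, 128) for p in GemmWarpPolicy}
    print("reachable via policies:", sorted(reachable))
    print("all factor pairs:     ", all_partitions(num_warps))
```

With 8 warps there are four factor pairs, but the three policies can select at most three of them, so a warp layout suggested by a finer-grained schedule may have no matching policy.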

LeiWang1999 commented 1 month ago

Bug fix:

The InjectThreadSync pass must be applied after the MergeSharedMemoryAllocation pass, because MergeSharedMemoryAllocation modifies buffer regions and thereby alters the liveness domain of each buffer.
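
The ordering issue can be illustrated with a toy model. This is not the real pass code: `insert_sync` and `merge_shared` are hypothetical stand-ins that operate on a flat list of `(kind, buffer)` accesses, and the hazard rule is deliberately simplified. The sketch assumes that merging lets two buffers with disjoint lifetimes alias the same shared-memory region.

```python
def insert_sync(ops):
    """Toy sync insertion: emit a barrier before any access that conflicts
    with an access made since the previous barrier."""
    out, reads, writes = [], set(), set()
    for op in ops:
        if op == "sync":
            reads.clear()
            writes.clear()
            out.append(op)
            continue
        kind, buf = op
        # RAW/WAW: the buffer was written; WAR: writing a buffer that was read.
        hazard = buf in writes or (kind == "write" and buf in reads)
        if hazard:
            out.append("sync")
            reads.clear()
            writes.clear()
        (reads if kind == "read" else writes).add(buf)
        out.append(op)
    return out


def merge_shared(ops, alias):
    """Toy merge: rename buffers so that aliased ones share one region."""
    return [op if op == "sync" else (op[0], alias.get(op[1], op[1]))
            for op in ops]


ops = [("write", "A"), ("read", "A"), ("write", "B"), ("read", "B")]
alias = {"A": "smem", "B": "smem"}  # assumed: A and B are merged into one region

sync_then_merge = merge_shared(insert_sync(ops), alias)
merge_then_sync = insert_sync(merge_shared(ops, alias))

if __name__ == "__main__":
    print(sync_then_merge)
    print(merge_then_sync)
```

In `sync_then_merge` the read of `A` is followed directly by the write of `B`; once both names refer to the same region, that pair is a write-after-read hazard with no barrier in between. Running the sync insertion after the merge sees the aliasing and inserts the extra barrier.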