We currently provide a strategy for users to define a search space for a given TL kernel, but it is still hard and complex to define a precise and efficient search space for dynamic shapes and operators for a given OP and backend.
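The InjectThreadSync pass must be applied after the MergeSharedMemoryAllocation pass, because MergeSharedMemoryAllocation modifies the buffer regions and thereby alters the liveness domain of each buffer. Synchronization inserted before the merge would be based on buffer regions that no longer hold.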
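The ordering constraint can be illustrated with a toy model. The sketch below is not the actual pass implementation: `Access`, `needed_barriers`, and the storage maps are hypothetical names made up for illustration. It assumes a barrier is required whenever an access conflicts (at least one side is a write) with an earlier access to the same storage since the last barrier, and it assumes merging lets two buffers with disjoint lifetimes share one storage region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    step: int    # position in program order
    buffer: str  # logical buffer name
    kind: str    # "read" or "write"


def needed_barriers(accesses, storage_of):
    """Return the steps before which a barrier is required.

    storage_of maps a logical buffer name to the storage region it
    occupies. Two accesses conflict when they touch the same storage
    and at least one of them is a write.
    """
    barriers = []
    pending = []  # (storage, kind) pairs seen since the last barrier
    for acc in accesses:
        storage = storage_of[acc.buffer]
        conflict = any(
            s == storage and "write" in (k, acc.kind) for s, k in pending
        )
        if conflict:
            barriers.append(acc.step)
            pending = []  # a barrier orders everything before it
        pending.append((storage, acc.kind))
    return barriers


# Hypothetical kernel body: A is dead before B is first written.
program = [
    Access(0, "A", "write"),
    Access(1, "A", "read"),
    Access(2, "B", "write"),
    Access(3, "B", "read"),
]

# Before merging: each buffer has its own storage.
before_merge = needed_barriers(program, {"A": "A", "B": "B"})

# After merging (assumed): A and B alias one shared region.
after_merge = needed_barriers(program, {"A": "smem", "B": "smem"})

# Barriers that a sync pass run before the merge would not insert.
missed = sorted(set(after_merge) - set(before_merge))
```

In this model the merged layout introduces a write-after-read hazard between the last read of `A` and the first write of `B`, which only becomes visible once both buffers share storage. A synchronization pass that ran before the merge would not see it.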
This pull request links the user-defined search space with our roller search space.
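One possible reading of this linkage, sketched under stated assumptions: the user declares a grid of tunable parameters, and only the grid points consistent with some roller-proposed hint are kept for tuning. The function names, parameter names (`block_M`, `block_N`, `num_stages`), and hint format below are hypothetical and are not taken from the actual API.

```python
from itertools import product


def expand_search_space(space):
    """Enumerate every configuration in a user-declared parameter grid."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


def link_with_hints(user_space, roller_hints):
    """Keep the user configurations that agree with at least one hint.

    A hint only constrains the keys it mentions; keys it does not
    mention stay free, and hint keys absent from the grid are ignored.
    """
    linked = []
    for cfg in expand_search_space(user_space):
        for hint in roller_hints:
            if all(cfg[k] == v for k, v in hint.items() if k in cfg):
                linked.append(cfg)
                break  # one matching hint is enough
    return linked


# Hypothetical user-defined grid: 2 * 2 * 2 = 8 configurations.
user_space = {
    "block_M": [64, 128],
    "block_N": [64, 128],
    "num_stages": [2, 3],
}

# Hypothetical hints, standing in for what a roller-style policy might propose.
roller_hints = [
    {"block_M": 128, "block_N": 128},
    {"block_M": 64, "block_N": 128},
]

all_configs = expand_search_space(user_space)
linked_configs = link_with_hints(user_space, roller_hints)
```

Filtering the grid this way keeps the user in control of which parameters are tunable while letting the hints prune tile shapes that are unlikely to perform well.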
However, block-level TL cannot fully utilize the schedule information. For example, TL provides only three dedicated warp scheduling policies: