microsoft / BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
MIT License

[Issue 192] Tail split support for dynamic matmul #227

Closed tzj-fxz closed 1 month ago

tzj-fxz commented 1 month ago
LeiWang1999 commented 1 month ago

LGTM! Thanks @tzj-fxz, but I'm surprised that we modified only the tl pass instead of the TIR transform LoopVectorize. Shouldn't such an approach also work for a TIR script, not only for tl?

LeiWang1999 commented 1 month ago

LGTM, Merged.

tzj-fxz commented 1 month ago

> LGTM! Thanks @tzj-fxz, but I'm surprised that we modified only the tl pass instead of the TIR transform LoopVectorize. Shouldn't such an approach also work for a TIR script, not only for tl?

In the TIR transform LoopVectorize, vectorizing accesses to a dynamic buffer is not allowed: the loop is replaced by serial reads, even if a TailSplit pass is inserted before it.

So I switched to modifying the tl vectorization pass and the related dynamic calls (such as the condition in T.if_then_else, which cannot be checked in vectorized form). This keeps TIR unaware of the dynamic vectorization, so the loop is not rewritten into serial code.
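For readers unfamiliar with the term, the general idea behind a tail split can be sketched in plain Python. This is only a conceptual illustration, not the code in this PR or any BitBLAS/TVM API: the function name `copy_with_tail_split` and the parameter `vec_width` are hypothetical, and a list slice stands in for a vector load/store.

```python
def copy_with_tail_split(src, vec_width=4):
    """Copy `src` using full vector-width chunks plus a scalar tail.

    Hypothetical helper for illustration only; not part of BitBLAS or TVM.
    """
    n = len(src)  # dynamic extent, known only at run time
    dst = [0] * n

    # Largest multiple of vec_width that does not exceed n.
    main_extent = (n // vec_width) * vec_width

    # Main part: every chunk is full, so no per-element bounds check is needed.
    # Each slice assignment stands in for one vectorized load/store.
    for base in range(0, main_extent, vec_width):
        dst[base:base + vec_width] = src[base:base + vec_width]

    # Tail part: fewer than vec_width elements remain; handle them serially.
    for i in range(main_extent, n):
        dst[i] = src[i]

    return dst, main_extent


if __name__ == "__main__":
    out, main_extent = copy_with_tail_split(list(range(10)), vec_width=4)
    print(out, main_extent)
```

With 10 elements and a width of 4, the main loop covers the first 8 elements and the tail loop handles the remaining 2. The point of the split is that the bounds condition moves out of the vectorized body, which is the kind of per-element check mentioned above for T.if_then_else.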