Open shadowpa0327 opened 1 month ago
This bug is still present in the master branch as of October 17th; it happens with integer-packed weights. One way to fix it is to replace the for loop with:

```python
for k in tl.range(0, total_blocks_k, 1, num_stages=1):
```
Strangely, the error also disappears if you add an if statement inside the for loop:

```python
for k in range(0, total_blocks_k, 1):
    if k < total_blocks_k:
```
Problem Statement
I am trying to dequantize a quantized tensor (packed into `int32`) and multiply it with another tensor in `fp16`. However, I observed a weird error, `LLVM ERROR: mma16816 data type not supported`, when invoking `tl.dot`. When I additionally multiply the dequantized tensor by 1.0 (`x * scales + zeros * 1.0`) and downcast it back to `tl.float16`, the program executes properly.

This phenomenon seems to happen only in `triton==3.0.0`. I have tried downgrading Triton to `2.3.0`, and it works well. Does anyone know the possible reasons behind this phenomenon, or is there a potential bug in my implementation?

Dependency
Error message
Code to reproduce
To check the kernel implementation, please see the `_ab_qx_fwd` function.