ajwaitz opened this issue 1 month ago
Thanks @ajwaitz for reporting the bug! We will try to get to this. @parsifal-47 would this be something you would be interested in taking a quick look at? Thanks!
@nhat-nguyen yes, would be happy to take a look, thank you!
@ajwaitz sorry for the delay. I tried modifying `test_matmul.py` with your block sizes and it passed. After that, I copy-pasted the "Triton python code" from the beginning of this issue and it also works: I see the "✅ Triton and Torch match" message and no zeroes in the printed output. What am I missing?
Triton python code
Triton IR
Crash log
No crash. Output incorrect.
Additional information
When the matmul kernel is run with the CPU driver and the given block parameters, the output does not match the PyTorch output: a significant part of the output matrix is all zeros, while the rest contains the correct matmul result.
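To make "a significant part of the output matrix is all zeros" concrete when comparing against the PyTorch result, a check like the one below could report which output rows and columns are entirely zero even though the reference has nonzero entries there. This is a minimal pure-Python sketch; the function name `zero_regions` and the list-of-lists representation are hypothetical, chosen for illustration (real tensors could be converted with `.tolist()` first).

```python
def zero_regions(out, ref, tol=0.0):
    """Return (rows, cols): indices where `out` is all zero but `ref` is not.

    Hypothetical helper: `out` and `ref` are equally shaped lists of lists.
    """
    n_rows, n_cols = len(ref), len(ref[0])
    # Rows that are entirely zero in the output but not in the reference.
    rows = [
        i for i in range(n_rows)
        if all(abs(out[i][j]) <= tol for j in range(n_cols))
        and any(abs(ref[i][j]) > tol for j in range(n_cols))
    ]
    # Same check column by column.
    cols = [
        j for j in range(n_cols)
        if all(abs(out[i][j]) <= tol for i in range(n_rows))
        and any(abs(ref[i][j]) > tol for i in range(n_rows))
    ]
    return rows, cols


if __name__ == "__main__":
    ref = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    out = [[1.0, 2.0, 0.0], [4.0, 5.0, 0.0], [0.0, 0.0, 0.0]]
    print(zero_regions(out, ref))  # ([2], [2])
```

If the zeroed part of the output lines up with whole block-sized row or column ranges, that would hint at which tiles are affected.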
This kernel is from the `python/examples` directory. With the default block parameters, the output is correct. After adjusting the parameters, however, it is easy to pick values that produce the incorrect behavior. The given IR is `triton-shared` IR.
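For context on what the block parameters control: in the Triton tutorial matmul, which the example kernel appears to follow (an assumption), each program instance computes one BLOCK_SIZE_M × BLOCK_SIZE_N tile of the output, looping over K in BLOCK_SIZE_K steps, and the launch grid contains cdiv(M, BLOCK_SIZE_M) * cdiv(N, BLOCK_SIZE_N) programs. The pure-Python sketch below mimics that tiling as a CPU reference; `cdiv` and `tiled_matmul` are hypothetical names. If some tile were skipped or masked incorrectly, that block of the output would stay zero, which is one way to produce the symptom above, though this is not a confirmed diagnosis.

```python
def cdiv(a, b):
    """Ceiling division, analogous to triton.cdiv."""
    return (a + b - 1) // b


def tiled_matmul(a, b, block_m, block_n, block_k):
    """Compute a @ b tile by tile (a: M x K, b: K x N, lists of lists).

    Each (pid_m, pid_n) pair plays the role of one program instance owning a
    block_m x block_n output tile; the min() bounds stand in for Triton masks.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for pid_m in range(cdiv(m, block_m)):
        for pid_n in range(cdiv(n, block_n)):
            rows = range(pid_m * block_m, min((pid_m + 1) * block_m, m))
            cols = range(pid_n * block_n, min((pid_n + 1) * block_n, n))
            # Accumulate over K in block_k-sized chunks, like the kernel's inner loop.
            for k0 in range(0, k, block_k):
                ks = range(k0, min(k0 + block_k, k))
                for i in rows:
                    for j in cols:
                        c[i][j] += sum(a[i][kk] * b[kk][j] for kk in ks)
    return c


if __name__ == "__main__":
    a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # 2 x 3
    b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 x 2
    print(tiled_matmul(a, b, block_m=1, block_n=1, block_k=2))
    # [[4.0, 5.0], [10.0, 11.0]]
```

Because `cdiv` rounds up, partial tiles at the matrix edges are still visited; a launch that rounded down instead would leave trailing rows or columns of the output at zero.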