Hi @rokada-br, thanks for your attention! It looks like we should make sure the tensor is contiguous if we use third-party kernels:
```python
# .contiguous() materializes the transposed view before the kernel reads it
c = matmul(a.to(torch.int8),
           matmul.transform_weight(w.t().contiguous().to(torch.int8)))
print(c)
```
And this works.
Hi @rokada-br, I've made a pull request to fix it!
Greetings!
I've been trying to multiply small matrices $A \times W$ to learn how to use BitBLAS properly. From my understanding, `layout="nt"` tells me that W should be transposed. So far, initializing the weights with the values already transposed ($W^t$) gives the correct result. However, if I initialize $W$ and then transpose the tensor with `W.t()` or `torch.transpose(W, 0, 1)`, the output is no longer correct. Is my understanding of how I should be using the library correct?
## System Specs
## Code Sample
Matrix multiplication with `int8` values and `int32` accumulation.

$$ \begin{bmatrix} 2 & 3 \end{bmatrix} \begin{bmatrix} 4 & 2 & 3 \\ 2 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 14 & 7 & 12 \end{bmatrix} $$
PyTorch's `matmul` gives the expected answer:
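A minimal sketch of that reference check (the exact tensors here are read off the equation above, not the original snippet):

```python
import torch

a = torch.tensor([[2, 3]], dtype=torch.int32)
w = torch.tensor([[4, 2, 3],
                  [2, 1, 2]], dtype=torch.int32)

# Plain PyTorch matmul as the ground truth
print(torch.matmul(a, w))  # tensor([[14,  7, 12]], dtype=torch.int32)
```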
Likewise, using BitBLAS with `int8` gives the correct answer:
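A sketch of the working call, assuming the `bitblas.MatmulConfig` / `bitblas.Matmul` setup from the BitBLAS README; with `layout="nt"` the weight is created directly in transposed `(N, K)` order:

```python
import torch
import bitblas

# Assumed config for the (1x2) @ (2x3) int8 matmul with int32 accumulation
config = bitblas.MatmulConfig(
    M=1, N=3, K=2,
    A_dtype="int8",
    W_dtype="int8",
    accum_dtype="int32",
    out_dtype="int32",
    layout="nt",  # A non-transposed, W stored transposed
)
matmul = bitblas.Matmul(config=config)

a = torch.tensor([[2, 3]], dtype=torch.int8).cuda()
# Weight initialized already transposed, shape (N, K) = (3, 2)
wt = torch.tensor([[4, 2],
                   [2, 1],
                   [3, 2]], dtype=torch.int8).cuda()

c = matmul(a, matmul.transform_weight(wt))
print(c)  # [[14, 7, 12]]
```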
However, if I initialize `w` with the values in their "natural" order and then transpose afterwards, the output is no longer the same:
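Presumably something like this, reusing the same `matmul` instance as above:

```python
# Weight in its "natural" (K, N) order, transposed afterwards
w = torch.tensor([[4, 2, 3],
                  [2, 1, 2]], dtype=torch.int8).cuda()

# w.t() is a non-contiguous view of the same storage
c = matmul(a, matmul.transform_weight(w.t()))
print(c)  # no longer [[14, 7, 12]]
```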
`w.t()` and `wt` should be the same, unless there are some memory shenanigans I'm not aware of:
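A quick sanity check (the values match; only the memory layout differs):

```python
print(torch.equal(w.t(), wt))  # True  -- element-wise identical
print(wt.is_contiguous())      # True  -- row-major storage
print(w.t().is_contiguous())   # False -- strided view over w's storage
```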
Is it a known issue? Or am I missing something? Thanks in advance!