xzwj1699 closed this issue 6 days ago.
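You have to set the CUDA device manually before launching the Triton kernel. Triton launches on the current CUDA device, which is not necessarily the device `data` lives on. For example: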
```python
import torch
# Temporarily make data.device the current CUDA device for this launch
with torch.cuda.device(data.device):
    _minmax_along_last_dim[grid](data, mn, mx, data.numel(), data.shape[0], num_groups, group_size, BLOCK_SIZE_N=BLOCK_SIZE_N, num_warps=8)
```
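If several Triton launch helpers need this fix, one option is to wrap each helper so that it always runs on its input's device. The following is a minimal sketch under two assumptions: the helper's first argument is the tensor that determines the device, and `on_tensor_device` is a hypothetical name that is not part of KIVI or Triton. The wrapper imports `torch` only when it receives a CUDA tensor, so non-CUDA inputs pass straight through.

```python
import contextlib
import functools


def on_tensor_device(launch):
    """Hypothetical decorator: call `launch` with the current CUDA device set to
    the device of its first argument (assumed to be a torch.Tensor)."""

    @functools.wraps(launch)
    def wrapper(tensor, *args, **kwargs):
        if getattr(tensor, "is_cuda", False):
            import torch  # imported lazily; only needed for CUDA tensors

            # torch.cuda.device switches the current device and restores it on exit
            ctx = torch.cuda.device(tensor.device)
        else:
            # CPU tensors (or other inputs) need no device switch
            ctx = contextlib.nullcontext()
        with ctx:
            return launch(tensor, *args, **kwargs)

    return wrapper
```

Decorating a Python-side launch helper that takes `data` as its first argument with `@on_tensor_device` removes the need to repeat the `with torch.cuda.device(...)` block at every call site. The previous device is restored after each launch.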
I ran into an error when trying KIVI. Here is the code (I modified example.py so that it runs on my server).
And here is the full error log:
I suspect something may be wrong with my modified code, and it would be great if you could help.
Thank you!