peterbell10 closed this 1 day ago
Just to confirm, is the TritonGPU IR generated from valid Triton Python code?
It came from the lowering of a new operator I'm adding, but I'll see if I can reproduce it with an existing operator.
This produces the same error on the current master branch:
import triton.language as tl
import triton
import torch

@triton.jit
def test_fn(out_ptr, a_ptr, workspace, M, N, M_BLOCK: tl.constexpr, N_BLOCK: tl.constexpr):
    # Build a 2D tensormap (TMA) descriptor for a_ptr in the device-side workspace buffer.
    desc_ptr = workspace
    tl.extra.cuda.experimental_device_tensormap_create2d(
        desc_ptr=desc_ptr,
        global_address=a_ptr,
        load_size=[4, N_BLOCK],
        global_size=[M, N],
        element_ty=a_ptr.dtype.element_ty,
    )
    # Make the newly written descriptor visible before it is used.
    tl.extra.cuda.experimental_tensormap_fenceproxy_acquire(desc_ptr)
    # Load a [4, N_BLOCK] tile through the descriptor and write it to out_ptr.
    gather = tl._experimental_descriptor_load(desc_ptr, [0, 0], [4, N_BLOCK], a_ptr.dtype.element_ty)
    tl.store(out_ptr + tl.arange(0, 4)[:, None] * N_BLOCK + tl.arange(0, N_BLOCK)[None, :], gather)

out = torch.empty((4, 128), dtype=torch.float32, device="cuda")
inp = torch.arange(4 * 128, dtype=torch.float32, device="cuda").reshape(4, 128)
# Scratch buffer that holds the on-device tensormap descriptor.
workspace = torch.empty(128, dtype=torch.uint8, device="cuda")
test_fn[(1,)](out, inp, workspace, 4, 128, 4, 128)
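For reference, if the lowering succeeded, the kernel would simply copy the 4x128 input tile into out, so a minimal sanity check (a sketch, assuming the assertion failure is fixed; not part of the original reproducer) could look like this:

# Hypothetical verification: the descriptor load copies the whole 4x128 tile,
# so the output should match the input exactly.
torch.testing.assert_close(out, inp)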
I'll take a look today
I am running into an assertion error in the codegen for local_load, which is coming from the linear layouts code. Here is a minified reproducer. When lowering to LLVM IR it fails with the following error:
cc @Jokeren @jlebar