cccc0der opened 1 year ago
I also reinstalled triton at version 2.0.0, but the errors still occur.
03-matmul
```
Traceback (most recent call last):
  File "<string>", line 21, in matmul_kernel
KeyError: ('2-.-0-.-0-7d1eb0d2fed8ff2032dccb99c2cc311a-d6252949da17ceb5f3a278a70250af13-1af5134066c618146d2cd009138944a0-14de7de5c4da5794c8ca14e7e41a122d-3498c340fd4b6ee7805fd54b882a04f5-e1f133f98d04093da2078dfc51c36b72-b26258bf01f839199e39d64851821f26-b9ae7213d41541f67843018d049e1f90-d7c06e3b46e708006c15224aac7a1378-f585402118c8a136948ce0a49cfe122c', (torch.float16, torch.float16, torch.float16, 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), (128, 256, 64, 8, None), (True, True, True, (True, False), (True, False), (True, False), (True, False), (False, True), (True, False), (False, True), (True, False), (False, True)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "03-matrix-multiplication.py", line 290, in <module>
    triton_output = matmul(a, b, activation=None)
  File "03-matrix-multiplication.py", line 270, in matmul
    matmul_kernel[grid](
  File "/home/tysearch/.local/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 90, in run
    return self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **kwargs, **config.kwargs)
  File "<string>", line 43, in matmul_kernel
  File "/home/tysearch/.local/lib/python3.8/site-packages/triton/compiler.py", line 1678, in __getattribute__
    self._init_handles()
  File "/home/tysearch/.local/lib/python3.8/site-packages/triton/compiler.py", line 1670, in _init_handles
    raise OutOfResources(self.shared, max_shared, "shared memory")
triton.compiler.OutOfResources: out of resource: shared memory, Required: 147456, Hardware limit: 65536. Reducing block sizes or `num_stages` may help.
```
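As a back-of-the-envelope check (my own estimate, not Triton's exact accounting): a software-pipelined fp16 matmul buffers one `BLOCK_M x BLOCK_K` tile of A and one `BLOCK_K x BLOCK_N` tile of B in shared memory per pipeline stage. For the failing `(128, 256, 64)` config this reproduces the 147456 bytes in the message if `num_stages` is 3 (the stage count is inferred from the arithmetic, not shown in the traceback):

```python
def smem_bytes(block_m, block_n, block_k, num_stages, dtype_bytes=2):
    """Rough shared-memory estimate for a pipelined matmul: each stage
    buffers one (block_m x block_k) A tile and one (block_k x block_n)
    B tile; dtype_bytes=2 for fp16."""
    per_stage = (block_m * block_k + block_k * block_n) * dtype_bytes
    return num_stages * per_stage

t4_limit = 65536  # "Hardware limit: 65536" from the error, i.e. 64 KiB

print(smem_bytes(128, 256, 64, num_stages=3))  # 147456 -- matches "Required"
print(smem_bytes(64, 64, 32, num_stages=2))    # 16384  -- fits in 64 KiB
```

So trimming the tutorial's autotune configs to smaller blocks and/or `num_stages=2` should let it compile on a T4.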
06-fused-attention
```
error: 'tt.reduce' op inferred type(s) 'tensor<128xf32, #triton_gpu.slice<{dim = 1, parent = #triton_gpu.mma<{versionMajor = 1, versionMinor = 0, warpsPerCTA = [4, 1]}>}>>' are incompatible with return type(s) of operation 'tensor<128xf32, #triton_gpu.slice<{dim = 1, parent = #triton_gpu.mma<{versionMajor = 1, versionMinor = 2, warpsPerCTA = [2, 2]}>}>>'
Traceback (most recent call last):
  File "<string>", line 21, in _fwd_kernel
KeyError: ('2-.-0-.-0-7d1eb0d2fed8ff2032dccb99c2cc311a-d6252949da17ceb5f3a278a70250af13-1af5134066c618146d2cd009138944a0-14de7de5c4da5794c8ca14e7e41a122d-3498c340fd4b6ee7805fd54b882a04f5-e1f133f98d04093da2078dfc51c36b72-b26258bf01f839199e39d64851821f26-b9ae7213d41541f67843018d049e1f90-d7c06e3b46e708006c15224aac7a1378-f585402118c8a136948ce0a49cfe122c', (torch.float16, torch.float16, torch.float16, 'fp32', torch.float32, torch.float32, torch.float16, 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), (128, 64, 128), (True, True, True, (False,), True, True, True, (True, False), (True, False), (True, False), (False, True), (True, False), (True, False), (True, False), (False, True), (True, False), (True, False), (True, False), (False, True), (True, False), (True, False), (True, False), (False, True), (False, False), (True, False), (True, False)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "06-fused-attention.py", line 358, in <module>
    bench_flash_attention.run(save_path='.', print_data=True)
  File "/home/tysearch/.local/lib/python3.8/site-packages/triton/testing.py", line 317, in run
    self._run(bench, save_path, show_plots, print_data)
  File "/home/tysearch/.local/lib/python3.8/site-packages/triton/testing.py", line 272, in _run
    ret = self.fn(**x_args, **{bench.line_arg: y}, **bench.args)
  File "06-fused-attention.py", line 341, in bench_flash_attention
    ms = triton.testing.do_bench(fn, percentiles=None, warmup=warmup, rep=rep)
  File "/home/tysearch/.local/lib/python3.8/site-packages/triton/testing.py", line 143, in do_bench
    fn()
  File "06-fused-attention.py", line 336, in <lambda>
    fn = lambda: attention(q, k, v, sm_scale)
  File "06-fused-attention.py", line 213, in forward
    _fwd_kernel[grid](
  File "<string>", line 41, in _fwd_kernel
  File "/home/tysearch/.local/lib/python3.8/site-packages/triton/compiler.py", line 1620, in compile
    next_module = compile(module)
  File "/home/tysearch/.local/lib/python3.8/site-packages/triton/compiler.py", line 1551, in <lambda>
    lambda src: ttir_to_ttgir(src, num_warps, num_stages, capability)),
  File "/home/tysearch/.local/lib/python3.8/site-packages/triton/compiler.py", line 992, in ttir_to_ttgir
    pm.run(mod)
RuntimeError: PassManager::run failed
```
I suppose the cause is that the T4 (Turing) doesn't support `num_stages`?
> `num_stages` – the number of stages that the compiler should use when software-pipelining loops. Mostly useful for matrix multiplication workloads on SM80+ GPUs.
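If Turing's smaller shared memory is indeed the problem, one possible workaround (a sketch under my assumption that pipelining depth is the dominant shared-memory consumer; not official Triton guidance) is to pick `num_stages` from the device's compute capability, e.g. the `(major, minor)` tuple returned by `torch.cuda.get_device_capability()`:

```python
def pick_num_stages(capability, default=4):
    """Heuristic (an assumption, not Triton policy): deeper software
    pipelining mostly pays off on SM80+ (Ampere), which also has far
    more shared memory per SM; fall back to 2 stages before that.
    `capability` is a (major, minor) tuple, e.g. (7, 5) for a T4."""
    major, _minor = capability
    return default if major >= 8 else 2

print(pick_num_stages((7, 5)))  # T4 (Turing, SM75)  -> 2
print(pick_num_stages((8, 0)))  # A100 (Ampere, SM80) -> 4
```

The result would then be passed as `num_stages=...` in each `triton.Config` of the tutorial's autotune list.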
It seems I can only pass one meta-parameter: if I declare more than one parameter with the `tl.constexpr` type, a `KeyError` happens.
```python
@triton.jit
def matmul_kernel(
    ...
    # Meta-parameters
    BLOCK_SIZE_M: tl.constexpr, BLOCK_SIZE_N: tl.constexpr, BLOCK_SIZE_K: tl.constexpr,
    GROUP_SIZE_M: tl.constexpr,
    ACTIVATION: tl.constexpr,
):
```
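For what it's worth, multiple `tl.constexpr` meta-parameters are supported; as far as I understand, the `KeyError` at `<string>, line 21` is just the JIT launcher's cache-miss path (the compiled-kernel lookup fails, compilation is attempted, and the real failure is whatever is reported after "During handling of the above exception"). A launch supplying several constexprs looks roughly like this (a sketch with illustrative values, not the tutorial's exact call):

```python
# Sketch: every tl.constexpr parameter is passed by keyword at launch
# (or filled in by @triton.autotune); the values below are illustrative.
matmul_kernel[grid](
    a, b, c,
    M, N, K,
    ...,  # the stride arguments from the tutorial
    BLOCK_SIZE_M=64, BLOCK_SIZE_N=64, BLOCK_SIZE_K=32,
    GROUP_SIZE_M=8,
    ACTIVATION=None,
)
```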
Hi, I'm new to Triton and doing some pretraining work.
I tested the tutorials in Triton: 01-vector-add, 02-fused-softmax, 04-low-memory-dropout, and 05-layer-norm work fine, but errors occur when I try 03-matrix-multiplication and 06-fused-attention.
env list
I have also tested with torch 1.12 / CUDA 11.6, and it still doesn't work. Is this a GPU incompatibility problem?
I also hit the 03-matmul error, and met the same Triton error with `flash_attn_triton` as in the 06-fused-attention case.