Closed: rgiduthuri-intel closed this issue 2 months ago
Hello @rgiduthuri-intel,
I have made a minor modification to your kernel code, and with that the sample code runs successfully. While I try to root-cause the issue, this workaround should unblock you.

========= Original code ===========

```python
@triton.jit
def kernel(out, e_dst, e_src, adj1, adj2, num_rows, WG_SIZE: tl.constexpr):
    zero_indices = tl.zeros((WG_SIZE,), dtype=adj2.dtype.element_ty)
    zero_values = tl.zeros((WG_SIZE,), dtype=e_src.dtype.element_ty)
```
============= Modified code ==========

```python
@triton.jit
def kernel(out_, e_dst_, e_src_, adj1_, adj2_, num_rows_, WG_SIZE: tl.constexpr):
    zero_indices = tl.zeros((WG_SIZE,), dtype=adj2_.dtype.element_ty)
    zero_values = tl.zeros((WG_SIZE,), dtype=e_src_.dtype.element_ty)
```
To me it looks like a name-conflict issue, since changing the kernel parameter names (notice the additional '_' appended to each parameter) resolved the error.
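The suspected failure mode can be pictured with a toy model. This is purely illustrative and is not Triton's actual implementation: the sketch assumes a compiler that keeps argument names in a single flat symbol table, where a second binding of an already-used name collides and renaming the parameter (appending `_`) sidesteps the clash.

```python
# Hypothetical sketch of a flat symbol table where duplicate names collide.
# NOT Triton's real internals -- just an illustration of the rename workaround.

class SymbolTable:
    def __init__(self):
        self.symbols = {}

    def define(self, name, value):
        # A second definition of the same name raises, mimicking the clash.
        if name in self.symbols:
            raise NameError(f"name conflict: {name!r} is already defined")
        self.symbols[name] = value

table = SymbolTable()
table.define("e_src", "kernel parameter")
try:
    table.define("e_src", "another binding of the same name")
except NameError as err:
    print(err)  # the collision the original parameter names would hit

table.define("e_src_", "renamed kernel parameter")  # rename avoids the clash
print(sorted(table.symbols))
```

Under this (assumed) model, the trailing-underscore rename works simply because the new names no longer coincide with whatever the compiler has already bound.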
@rgiduthuri-intel, I did further investigation and found that even with the latest Triton (branch: main) the test fails on an NVIDIA machine. You might have installed via pip, which installs Triton 2.2.1, hence your success. Since intel-xpu-backend-for-triton tracks the latest Triton source, you see the failure here.
Conclusion: it is not an Intel Triton issue but a Triton 3.0 issue. @chengjunlu has already raised the fix. Until the fix gets merged and upstreamed, please use the workaround I have suggested.
Thanks
Great! Thanks for the workaround. Please feel free to close this issue.
@Sarbojit2019 Posted a PR in the OpenAI repo to fix this issue: https://github.com/openai/triton/pull/3383
The use of `tensor.dtype.element_ty` in the Triton kernel below is producing an error (pasted at the end of this message). I'm using the latest llvm-target build as of a few hours ago. [PT 2.1 with CUDA didn't report the error.] Appreciate any quick workaround suggestions. Thanks.

Here's the error message: