intel / intel-xpu-backend-for-triton

OpenAI Triton backend for Intel® GPUs
MIT License

[XPU][TritonGPUToLLVM] Use `llvm.func` attributes to express kernels ND-ranges #2770

Open victor-eds opened 2 days ago

victor-eds commented 2 days ago

Use the `llvm.func` `intel_reqd_sub_group_size` attribute to express the sub-group size, instead of `triton_gen` attributes that are translated in a later step.

Replace the `triton_gen.max_work_group_size` value type with a dense array.
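The intent of the first change can be sketched as follows (the `triton_gen.intel_reqd_sub_group_size` attribute name matches this repository; the exact MLIR syntax and kernel shape here are illustrative, not copied from the patch):

```mlir
// Before: sub-group size carried as a triton_gen attribute and
// translated to LLVM metadata in a later step.
llvm.func @kernel() attributes {triton_gen.intel_reqd_sub_group_size = 16 : i32} {
  llvm.return
}

// After (sketch): the same information expressed directly as an
// llvm.func attribute, so no separate translation step is needed.
llvm.func @kernel() attributes {intel_reqd_sub_group_size = 16 : i32} {
  llvm.return
}
```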

chengjunlu commented 2 days ago

The changes LGTM, but I am not sure about the rationale for the old code. Let @whitneywhtsang approve.

whitneywhtsang commented 2 days ago

There are some test_reduce failures, maybe due to changing from max_work_group_size to reqd_work_group_size?

victor-eds commented 2 days ago

> There are some test_reduce failures, maybe due to changing from max_work_group_size to reqd_work_group_size?

Interesting. I'll look into that tomorrow.

whitneywhtsang commented 1 day ago

> > There are some test_reduce failures, maybe due to changing from max_work_group_size to reqd_work_group_size?
>
> Interesting. I'll look into that tomorrow.

Confirmed it is due to the change from max_work_group_size to reqd_work_group_size. I modified the main branch to use reqd_work_group_size, and I can observe the test_reduce failures.

victor-eds commented 1 day ago

> Confirmed it is due to changing from max_work_group_size to reqd_work_group_size. I modified main branch to use reqd_work_group_size, and I can observe the test_reduce failures.

Interesting. I'll take a look.

victor-eds commented 1 day ago

> Interesting. I'll take a look.

@whitneywhtsang @etiotto Apparently not setting anything at all fixes the crashes. I'd go with this for now: get this merged to keep things moving, and open an investigation ticket to tackle the issue ASAP. The reqd_work_group_size attribute may still be helpful for codegen and improve performance, so I'd say we want to use it eventually.

My guess is we're modifying the number of warps or the warp size at some point during this lowering process, and this mismatch leads to the crashes.
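The mismatch hypothesis can be illustrated abstractly with a toy model (pure Python, no Triton or LLVM involved; the names and numbers are invented for illustration). The key difference is that a maximum work-group size is only an upper bound, while a required work-group size is an exact contract, so only the latter is violated if a pass changes the effective size after the attribute was stamped:

```python
# Toy model of the two attribute semantics. Not actual compiler code:
# "declared" stands for the size stamped on the kernel before lowering,
# "actual" for the size after a hypothetical pass changes warp count/size.

def check_max(declared: int, actual: int) -> bool:
    # max_work_group_size semantics: actual size must stay within the bound.
    return actual <= declared

def check_reqd(declared: int, actual: int) -> bool:
    # reqd_work_group_size semantics: actual size must match exactly.
    return actual == declared

declared = 128  # size recorded on the kernel
actual = 64     # size after a pass hypothetically halves the warp count

print(check_max(declared, actual))   # upper bound still holds
print(check_reqd(declared, actual))  # exact contract is broken
```

Under this model, the old max_work_group_size attribute silently tolerates such a change, while reqd_work_group_size turns it into a hard violation, which would match the observed test_reduce failures.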

Does this course of action sound good?

victor-eds commented 11 hours ago

I'll restore max_work_group_size and file a ticket to switch to reqd_work_group_size once we find out what the matter is here.

victor-eds commented 10 hours ago

I will keep the dense-array specification for max_work_group_size: it has no impact, it is already done, and it will make the eventual switch to llvm.func's reqd_work_group_size easier.
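The retained part of the change amounts to something like the following (a sketch: the dense-array syntax is standard MLIR `DenseI32ArrayAttr` notation, but the kernel shape and sizes are illustrative):

```mlir
// max_work_group_size kept as a triton_gen attribute for now, but
// specified as a dense i32 array rather than the previous value type,
// mirroring the shape llvm.func's reqd_work_group_size would take.
llvm.func @kernel() attributes {triton_gen.max_work_group_size = array<i32: 128, 1, 1>} {
  llvm.return
}
```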