Open victor-eds opened 2 days ago
The changes LGTM, but I am not sure about the reason for the old code. Let's have @whitneywhtsang approve.
There are some `test_reduce` failures, maybe due to changing from `max_work_group_size` to `reqd_work_group_size`?
Interesting. I'll look into that tomorrow.
Confirmed it is due to changing from `max_work_group_size` to `reqd_work_group_size`. I modified the `main` branch to use `reqd_work_group_size`, and I can observe the `test_reduce` failures.
Interesting. I'll take a look.
@whitneywhtsang @etiotto Apparently not setting anything at all fixes the crashes. I'd go with this for now: get this merged to keep going, and open an investigation ticket to tackle ASAP. The `reqd_work_group_size` attribute may be helpful for codegen and improve performance after all, and I'd say we want to use it.

My guess is we're modifying the number of warps or the warp size at some point during this lowering process, and this mismatch leads to crashes.

Does this course of action sound good?
I'll restore `max_work_group_size` and file a ticket to switch to `reqd_work_group_size` once we find out what's the matter here.
I will keep the dense array specification for `max_work_group_size`, as this has no impact, I had already done it, and it will make the later change to `llvm.func`'s `reqd_work_group_size` easier.
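For illustration, the dense-array encoding discussed here might look roughly like this in MLIR. This is a sketch only; the exact `triton_gen` attribute syntax and the "before" encoding are assumptions, not taken from the repository:

```mlir
// Hypothetical before/after of the work-group size attribute encoding.
// Before (assumed): a vector-typed dense elements attribute.
llvm.func @kernel_before() attributes {
  triton_gen.max_work_group_size = dense<[128, 1, 1]> : vector<3xi32>
} {
  llvm.return
}

// After: the more compact dense array (DenseI32ArrayAttr) form.
llvm.func @kernel_after() attributes {
  triton_gen.max_work_group_size = array<i32: 128, 1, 1>
} {
  llvm.return
}
```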
Use `llvm.func`'s `intel_reqd_sub_group_size` to express sub-group size instead of `triton_gen` attributes that are later translated.
Replace `triton_gen.max_work_group_size` value type with a dense array.
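For context on why swapping the attributes can cause failures: in OpenCL C, `reqd_work_group_size` is a hard constraint (enqueuing the kernel with any other local size fails with `CL_INVALID_WORK_GROUP_SIZE`), whereas a maximum merely bounds the size. A minimal sketch with hypothetical kernel names:

```c
// Hard requirement: the local size at enqueue MUST be exactly (64, 1, 1);
// anything else makes clEnqueueNDRangeKernel fail with
// CL_INVALID_WORK_GROUP_SIZE -- a plausible source of the test_reduce crashes.
__kernel __attribute__((reqd_work_group_size(64, 1, 1)))
void reduce_exact(__global const float *in, __global float *out) {
    /* ... reduction body elided ... */
}

// Intel extension: pin the sub-group size. Helpful for codegen, but also a
// constraint the launch configuration has to satisfy.
__kernel __attribute__((intel_reqd_sub_group_size(16)))
void reduce_fixed_sg(__global const float *in, __global float *out) {
    /* ... reduction body elided ... */
}
```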