Open PiotrFusik opened 4 years ago
Marking subgroups operations that are to be called in non-uniform control flow with convergent attribute doesn't help because convergent attribute was added for functions called within uniform CF (by all work items): https://clang.llvm.org/docs/AttributeReference.html#convergent
Extended Description
The following OpenCL code:
kernel void test(global int *data) { uint id = (uint) get_global_id(0); if (id < 4) data[id] = sub_group_elect(); else data[id] = sub_group_elect(); }
when compiled with the current master (f2c97656644e783622a6e60fe452b41ffe0f1d18) as follows: clang -cl-std=CL2.0 -include opencl-c.h -S -emit-llvm sub_group_elect_opt.cl
has an invalid optimization of combining the branches:
; Function Attrs: convergent norecurse nounwind uwtable define dso_local spir_kernel void @test(i32 nocapture %data) local_unnamed_addr #0 !kernel_arg_addr_space !4 !kernel_arg_access_qual !5 !kernel_arg_type !6 !kernel_arg_base_type !6 !kernel_arg_type_qual !7 { entry: %call = tail call i64 @"?get_global_id@@$$J0YAKI@Z"(i32 0) #3 %call2 = tail call i32 @"?sub_group_elect@@$$J0YAHXZ"() #4 %idxprom = and i64 %call, 4294967295 %arrayidx = getelementptr inbounds i32, i32 %data, i64 %idxprom store i32 %call2, i32* %arrayidx, align 4, !tbaa !8 ret void }
sub_group_elect is one of the many functions added in https://reviews.llvm.org/D79781 that perform implicit communication within a subgroup, i.e. threads implemented as SIMD on GPU. All the added functions are affected.
https://reviews.llvm.org/D68994 is a proposal of addressing this problem with the
convergent
attribute.