Open pbchekin opened 3 weeks ago
IGC has a pass that scalarizes the vector addition. Before that pass the LLVM IR is:
%42 = fadd <4 x float> %bc3, %bc9, !dbg !373
%43 = fadd <4 x float> %bc3, %bc9, !dbg !373
%44 = fadd <4 x float> %bc3, %bc9, !dbg !373
%45 = shufflevector <4 x float> %43, <4 x float> %44, <4 x i32> <i32 0, i32 5, i32 undef, i32 undef>, !dbg !375
%46 = fadd <4 x float> %bc3, %bc9, !dbg !373
%47 = shufflevector <4 x float> %45, <4 x float> %46, <4 x i32> <i32 0, i32 1, i32 6, i32 undef>, !dbg !375
%48 = shufflevector <4 x float> %47, <4 x float> %42, <4 x i32> <i32 0, i32 1, i32 2, i32 7>, !dbg !375
%49 = sext i32 %9 to i64, !dbg !374
%50 = getelementptr float, float addrspace(1)* %2, i64 %49, !dbg !374
%51 = bitcast float addrspace(1)* %50 to <4 x float> addrspace(1)*, !dbg !375
store <4 x float> %48, <4 x float> addrspace(1)* %51, align 16, !dbg !375
and after that pass the vector add is scalarized:
59: ; preds = %52, %51
%bc1226 = phi float [ %55, %52 ], [ 0.000000e+00, %51 ], !dbg !371
%bc1227 = phi float [ %56, %52 ], [ 0.000000e+00, %51 ], !dbg !371
%bc1228 = phi float [ %57, %52 ], [ 0.000000e+00, %51 ], !dbg !371
%bc1229 = phi float [ %58, %52 ], [ 0.000000e+00, %51 ], !dbg !371
%60 = fadd float %bc618, %bc1226, !dbg !372
%61 = fadd float %bc619, %bc1227, !dbg !372
%62 = fadd float %bc620, %bc1228, !dbg !372
%63 = fadd float %bc621, %bc1229, !dbg !372
%64 = fadd float %bc618, %bc1226, !dbg !372
%65 = fadd float %bc619, %bc1227, !dbg !372
%66 = fadd float %bc620, %bc1228, !dbg !372
%67 = fadd float %bc621, %bc1229, !dbg !372
%68 = fadd float %bc618, %bc1226, !dbg !372
%69 = fadd float %bc619, %bc1227, !dbg !372
%70 = fadd float %bc620, %bc1228, !dbg !372
%71 = fadd float %bc621, %bc1229, !dbg !372
%72 = fadd float %bc618, %bc1226, !dbg !372
%73 = fadd float %bc619, %bc1227, !dbg !372
%74 = fadd float %bc620, %bc1228, !dbg !372
%75 = fadd float %bc621, %bc1229, !dbg !372
%76 = getelementptr float, float addrspace(1)* %2, i64 %21, !dbg !373
br i1 %19, label %77, label %101, !dbg !374
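To make the shuffle chain in the vector IR easier to follow, here is a minimal Python sketch that emulates `shufflevector` lane by lane. It only models the instruction semantics and says nothing about how IGC implements the pass. The input values for `%bc3` and `%bc9` are made up for illustration, and the helper names `shufflevector` and `fadd` are hypothetical Python stand-ins for the LLVM instructions.

```python
UNDEF = None  # stands in for an `undef` mask element

def shufflevector(a, b, mask):
    """Emulate LLVM shufflevector: mask index i picks from the concatenation a ++ b."""
    both = list(a) + list(b)
    return [UNDEF if m is UNDEF else both[m] for m in mask]

def fadd(a, b):
    """Element-wise add of two equally sized vectors."""
    return [x + y for x, y in zip(a, b)]

# Hypothetical inputs; the real %bc3 and %bc9 come from the kernel.
bc3 = [1.0, 2.0, 3.0, 4.0]
bc9 = [10.0, 20.0, 30.0, 40.0]

# Mirrors %42 .. %48 from the IR before the scalarization pass.
v42 = fadd(bc3, bc9)
v43 = fadd(bc3, bc9)
v44 = fadd(bc3, bc9)
v45 = shufflevector(v43, v44, [0, 5, UNDEF, UNDEF])  # lane 0 of %43, lane 1 of %44
v46 = fadd(bc3, bc9)
v47 = shufflevector(v45, v46, [0, 1, 6, UNDEF])      # adds lane 2 of %46
v48 = shufflevector(v47, v42, [0, 1, 2, 7])          # adds lane 3 of %42

# The stored vector equals a single vector add of the same operands.
print(v48 == fadd(bc3, bc9))
```

Under these assumptions, each shuffle takes one lane from a separate but identical `fadd`, so the stored `%48` matches one vector add. That is consistent with the scalarized output above, where the 16 scalar `fadd` instructions repeat the same four operand pairs.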
So this is a transformation performed by IGC. Triton generates the vector code. It is unclear at this point why the SYCL program is not scalarized. @pbchekin who is the contact, and can we get the SYCL reproducer along with the compilation command?
@etiotto Can you give open-linux-driver-ci-dev_igc-17737 a try? It contains a recent change which makes that IGC pass more restrictive.
Are we confusing vector types and vectorization? SYCL has a vec4 type which is syntactic sugar for unpacking a struct. https://developer.codeplay.com/products/computecpp/ce/2.11.0/api-reference/vec__types__defines_8h.html
I don't have the SYCL program. However, from the original question I am guessing that the LLVM IR generated by SYCL contains vector adds and that, for some reason, IGC doesn't scalarize them. When we get the SYCL program we can check the LLVM IR it generates.
@pbchekin do you have the contact info for the person who asked the original question?
Received this:
OCL_asm67166d3621db5283_beforeUnification.zip OCL_asm67166d3621db5283_optimized.zip