Status: Open (chengjunlu opened 1 year ago)
Disassembly of the SPIR-V IR from the DPC++ example:
; SPIR-V
; Version: 1.4
; Generator: Khronos LLVM/SPIR-V Translator; 14
; Bound: 229
; Schema: 0
OpCapability Addresses ; 0x00000014
OpCapability Linkage ; 0x0000001c
OpCapability Kernel ; 0x00000024
OpCapability Vector16 ; 0x0000002c
OpCapability Int64 ; 0x00000034
OpCapability GenericPointer ; 0x0000003c
OpCapability Int8 ; 0x00000044
OpCapability SubgroupDispatch ; 0x0000004c
OpCapability IndirectReferencesINTEL ; 0x00000054
OpCapability VectorComputeINTEL ; 0x0000005c
OpCapability ExpectAssumeKHR ; 0x00000064
OpCapability MemoryAccessAliasingINTEL ; 0x0000006c
OpCapability OptNoneINTEL ; 0x00000074
OpExtension "SPV_INTEL_function_pointers" ; 0x0000007c
OpExtension "SPV_INTEL_memory_access_aliasing" ; 0x0000009c
OpExtension "SPV_INTEL_optnone" ; 0x000000c4
OpExtension "SPV_INTEL_vector_compute" ; 0x000000dc
OpExtension "SPV_KHR_expect_assume" ; 0x000000fc
%1 = OpExtInstImport "OpenCL.std" ; 0x00000118
OpMemoryModel Physical64 OpenCL ; 0x0000012c
OpEntryPoint Kernel %43 "_ZTSZZ4testILb1EEbvENKUlRN4sycl3_V17handlerEE_clES3_EUlNS1_7nd_itemILi1EEEE_" ; 0x00000138
OpExecutionMode %43 ContractionOff ; 0x00000194
OpExecutionMode %43 SubgroupSize 16 ; 0x000001a0
OpSource Unknown 100000 ; 0x000001b0
OpName %__spirv_BuiltInSubgroupId "__spirv_BuiltInSubgroupId" ; 0x000001bc
OpName %__spirv_BuiltInSubgroupLocalInvocationId "__spirv_BuiltInSubgroupLocalInvocationId" ; 0x000001e0
OpName %__spirv_BuiltInWorkgroupId "__spirv_BuiltInWorkgroupId" ; 0x00000214
OpName %__spirv_BuiltInGlobalLinearId "__spirv_BuiltInGlobalLinearId" ; 0x00000238
OpName %__spirv_BuiltInWorkgroupSize "__spirv_BuiltInWorkgroupSize" ; 0x00000260
OpName %_Z33__regcall3____builtin_invoke_simdILb1EfPFNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEERFS5_PfS5_iES6_S5_jEJPS7_S6_fjEvET0_T1_DpT2__2 "_Z33__regcall3____builtin_invoke_simdILb1EfPFNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEERFS5_PfS5_iES6_S5_jEJPS7_S6_fjEvET0_T1_DpT2__2" ; 0x00000288
OpName %_Z33__regcall3____builtin_invoke_simdILb1EfPFNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEERFS5_PfS5_iES6_S5_jEJPS7_S6_fjEvET0_T1_DpT2__4 "_Z33__regcall3____builtin_invoke_simdILb1EfPFNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEERFS5_PfS5_iES6_S5_jEJPS7_S6_fjEvET0_T1_DpT2__4" ; 0x00000334
OpName %_Z33__regcall3____builtin_invoke_simdILb1EvPFvRFvPfNSt12experimental4simdIfNS1_10__simd_abiILNS1_12_StorageKindE2ELi16EEEEEiES0_S6_jEJPS7_S0_fjEvET0_T1_DpT2__6 "_Z33__regcall3____builtin_invoke_simdILb1EvPFvRFvPfNSt12experimental4simdIfNS1_10__simd_abiILNS1_12_StorageKindE2ELi16EEEEEiES0_S6_jEJPS7_S0_fjEvET0_T1_DpT2__6" ; 0x000003e0
OpName %llvm_genx_svm_block_ld_unaligned_v16f32_i64 "llvm.genx.svm.block.ld.unaligned.v16f32.i64" ; 0x00000488
OpName %__itt_offload_wi_start_wrapper "__itt_offload_wi_start_wrapper" ; 0x000004bc
OpName %_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFNSt12experimental4simdIfNS6_10__simd_abiILNS6_12_StorageKindE2ELi16EEEEEPfSB_iEJNS3_7uniformISC_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__3 "_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFNSt12experimental4simdIfNS6_10__simd_abiILNS6_12_StorageKindE2ELi16EEEEEPfSB_iEJNS3_7uniformISC_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__3" ; 0x000004e4
OpName %_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFNSt12experimental4simdIfNS6_10__simd_abiILNS6_12_StorageKindE2ELi16EEEEEPfSB_iEJNS3_7uniformISC_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__1 "_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFNSt12experimental4simdIfNS6_10__simd_abiILNS6_12_StorageKindE2ELi16EEEEEPfSB_iEJNS3_7uniformISC_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__1" ; 0x000005fc
OpName %_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFvPfNSt12experimental4simdIfNS7_10__simd_abiILNS7_12_StorageKindE2ELi16EEEEEiEJNS3_7uniformIS6_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__5 "_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFvPfNSt12experimental4simdIfNS7_10__simd_abiILNS7_12_StorageKindE2ELi16EEEEEiEJNS3_7uniformIS6_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__5" ; 0x00000714
OpName %__itt_offload_wi_finish_wrapper "__itt_offload_wi_finish_wrapper" ; 0x00000828
OpName %__itt_offload_wi_start_stub "__itt_offload_wi_start_stub" ; 0x00000850
OpName %__itt_offload_wi_finish_stub "__itt_offload_wi_finish_stub" ; 0x00000874
OpName %_Z24__regcall3__SIMD_CALLEE1PfNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEEi "_Z24__regcall3__SIMD_CALLEE1PfNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEEi" ; 0x0000089c
OpName %_esimd ".esimd" ; 0x0000090c
OpName %_Z24__regcall3__SIMD_CALLEE2PfNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEEi "_Z24__regcall3__SIMD_CALLEE2PfNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEEi" ; 0x0000091c
OpName %_esimd_0 ".esimd" ; 0x0000098c
OpName %_Z28__regcall3__SIMD_CALLEE_VOIDPfNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEEi "_Z28__regcall3__SIMD_CALLEE_VOIDPfNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEEi" ; 0x0000099c
OpName %_esimd_1 ".esimd" ; 0x00000a10
OpName %_esimd_i ".esimd.i" ; 0x00000a20
OpName %_esimd_i_0 ".esimd.i" ; 0x00000a34
OpName %_esimd_i_1 ".esimd.i" ; 0x00000a48
%52 = OpAliasDomainDeclINTEL ; 0x00000a5c
%53 = OpAliasScopeDeclINTEL %52 ; 0x00000a64
%54 = OpAliasDomainDeclINTEL ; 0x00000a70
%55 = OpAliasScopeDeclINTEL %54 ; 0x00000a78
%56 = OpAliasDomainDeclINTEL ; 0x00000a84
%57 = OpAliasScopeDeclINTEL %56 ; 0x00000a8c
%58 = OpAliasScopeListDeclINTEL %53 %55 %57 ; 0x00000a98
%61 = OpAliasDomainDeclINTEL ; 0x00000aac
%62 = OpAliasScopeDeclINTEL %61 ; 0x00000ab4
%63 = OpAliasScopeListDeclINTEL %62 ; 0x00000ac0
%71 = OpAliasDomainDeclINTEL ; 0x00000acc
%72 = OpAliasScopeDeclINTEL %71 ; 0x00000ad4
%73 = OpAliasScopeListDeclINTEL %72 ; 0x00000ae0
OpDecorate %__spirv_BuiltInSubgroupId LinkageAttributes "__spirv_BuiltInSubgroupId" Import ; 0x00000aec
OpDecorate %__spirv_BuiltInSubgroupId Constant ; 0x00000b18
OpDecorate %__spirv_BuiltInSubgroupId BuiltIn SubgroupId ; 0x00000b24
OpDecorate %__spirv_BuiltInSubgroupId Alignment 4 ; 0x00000b34
OpDecorate %__spirv_BuiltInSubgroupLocalInvocationId LinkageAttributes "__spirv_BuiltInSubgroupLocalInvocationId" Import ; 0x00000b44
OpDecorate %__spirv_BuiltInSubgroupLocalInvocationId Constant ; 0x00000b80
OpDecorate %__spirv_BuiltInSubgroupLocalInvocationId BuiltIn SubgroupLocalInvocationId ; 0x00000b8c
OpDecorate %__spirv_BuiltInSubgroupLocalInvocationId Alignment 4 ; 0x00000b9c
OpDecorate %__spirv_BuiltInWorkgroupId LinkageAttributes "__spirv_BuiltInWorkgroupId" Import ; 0x00000bac
OpDecorate %__spirv_BuiltInWorkgroupId Constant ; 0x00000bd8
OpDecorate %__spirv_BuiltInWorkgroupId BuiltIn WorkgroupId ; 0x00000be4
OpDecorate %__spirv_BuiltInWorkgroupId Alignment 32 ; 0x00000bf4
OpDecorate %__spirv_BuiltInGlobalLinearId LinkageAttributes "__spirv_BuiltInGlobalLinearId" Import ; 0x00000c04
OpDecorate %__spirv_BuiltInGlobalLinearId Constant ; 0x00000c34
OpDecorate %__spirv_BuiltInGlobalLinearId BuiltIn GlobalLinearId ; 0x00000c40
OpDecorate %__spirv_BuiltInGlobalLinearId Alignment 8 ; 0x00000c50
OpDecorate %__spirv_BuiltInWorkgroupSize LinkageAttributes "__spirv_BuiltInWorkgroupSize" Import ; 0x00000c60
OpDecorate %__spirv_BuiltInWorkgroupSize Constant ; 0x00000c90
OpDecorate %__spirv_BuiltInWorkgroupSize BuiltIn WorkgroupSize ; 0x00000c9c
OpDecorate %__spirv_BuiltInWorkgroupSize Alignment 32 ; 0x00000cac
OpDecorate %_Z33__regcall3____builtin_invoke_simdILb1EfPFNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEERFS5_PfS5_iES6_S5_jEJPS7_S6_fjEvET0_T1_DpT2__2 LinkageAttributes "_Z33__regcall3____builtin_invoke_simdILb1EfPFNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEERFS5_PfS5_iES6_S5_jEJPS7_S6_fjEvET0_T1_DpT2__2" Import ; 0x00000cbc
OpDecorate %_Z33__regcall3____builtin_invoke_simdILb1EfPFNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEERFS5_PfS5_iES6_S5_jEJPS7_S6_fjEvET0_T1_DpT2__4 LinkageAttributes "_Z33__regcall3____builtin_invoke_simdILb1EfPFNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEERFS5_PfS5_iES6_S5_jEJPS7_S6_fjEvET0_T1_DpT2__4" Import ; 0x00000d70
OpDecorate %_Z33__regcall3____builtin_invoke_simdILb1EvPFvRFvPfNSt12experimental4simdIfNS1_10__simd_abiILNS1_12_StorageKindE2ELi16EEEEEiES0_S6_jEJPS7_S0_fjEvET0_T1_DpT2__6 LinkageAttributes "_Z33__regcall3____builtin_invoke_simdILb1EvPFvRFvPfNSt12experimental4simdIfNS1_10__simd_abiILNS1_12_StorageKindE2ELi16EEEEEiES0_S6_jEJPS7_S0_fjEvET0_T1_DpT2__6" Import ; 0x00000e24
OpDecorate %llvm_genx_svm_block_ld_unaligned_v16f32_i64 LinkageAttributes "llvm.genx.svm.block.ld.unaligned.v16f32.i64" Import ; 0x00000ed4
OpDecorate %llvm_genx_svm_block_ld_unaligned_v16f32_i64 VectorComputeFunctionINTEL ; 0x00000f10
OpDecorate %44 Alignment 4 ; 0x00000f1c
OpDecorate %45 Alignment 4 ; 0x00000f2c
OpDecorate %46 Alignment 4 ; 0x00000f3c
OpDecorate %__itt_offload_wi_start_wrapper LinkageAttributes "__itt_offload_wi_start_wrapper" Export ; 0x00000f4c
OpDecorate %77 NoSignedWrap ; 0x00000f7c
OpDecorate %77 NoUnsignedWrap ; 0x00000f88
OpDecorate %_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFNSt12experimental4simdIfNS6_10__simd_abiILNS6_12_StorageKindE2ELi16EEEEEPfSB_iEJNS3_7uniformISC_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__3 LinkageAttributes "_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFNSt12experimental4simdIfNS6_10__simd_abiILNS6_12_StorageKindE2ELi16EEEEEPfSB_iEJNS3_7uniformISC_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__3" Export ; 0x00000f94
OpDecorate %_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFNSt12experimental4simdIfNS6_10__simd_abiILNS6_12_StorageKindE2ELi16EEEEEPfSB_iEJNS3_7uniformISC_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__3 ReferencedIndirectlyINTEL ; 0x000010b4
OpDecorate %_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFNSt12experimental4simdIfNS6_10__simd_abiILNS6_12_StorageKindE2ELi16EEEEEPfSB_iEJNS3_7uniformISC_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__3 StackCallINTEL ; 0x000010c0
OpDecorate %_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFNSt12experimental4simdIfNS6_10__simd_abiILNS6_12_StorageKindE2ELi16EEEEEPfSB_iEJNS3_7uniformISC_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__3 VectorComputeFunctionINTEL ; 0x000010cc
OpDecorate %_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFNSt12experimental4simdIfNS6_10__simd_abiILNS6_12_StorageKindE2ELi16EEEEEPfSB_iEJNS3_7uniformISC_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__1 LinkageAttributes "_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFNSt12experimental4simdIfNS6_10__simd_abiILNS6_12_StorageKindE2ELi16EEEEEPfSB_iEJNS3_7uniformISC_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__1" Export ; 0x000010d8
OpDecorate %_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFNSt12experimental4simdIfNS6_10__simd_abiILNS6_12_StorageKindE2ELi16EEEEEPfSB_iEJNS3_7uniformISC_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__1 ReferencedIndirectlyINTEL ; 0x000011f8
OpDecorate %_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFNSt12experimental4simdIfNS6_10__simd_abiILNS6_12_StorageKindE2ELi16EEEEEPfSB_iEJNS3_7uniformISC_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__1 StackCallINTEL ; 0x00001204
OpDecorate %_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFNSt12experimental4simdIfNS6_10__simd_abiILNS6_12_StorageKindE2ELi16EEEEEPfSB_iEJNS3_7uniformISC_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__1 VectorComputeFunctionINTEL ; 0x00001210
OpDecorate %94 FPFastMathMode Fast ; 0x0000121c
OpDecorate %_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFvPfNSt12experimental4simdIfNS7_10__simd_abiILNS7_12_StorageKindE2ELi16EEEEEiEJNS3_7uniformIS6_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__5 LinkageAttributes "_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFvPfNSt12experimental4simdIfNS7_10__simd_abiILNS7_12_StorageKindE2ELi16EEEEEiEJNS3_7uniformIS6_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__5" Export ; 0x0000122c
OpDecorate %_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFvPfNSt12experimental4simdIfNS7_10__simd_abiILNS7_12_StorageKindE2ELi16EEEEEiEJNS3_7uniformIS6_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__5 ReferencedIndirectlyINTEL ; 0x00001348
OpDecorate %_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFvPfNSt12experimental4simdIfNS7_10__simd_abiILNS7_12_StorageKindE2ELi16EEEEEiEJNS3_7uniformIS6_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__5 StackCallINTEL ; 0x00001354
OpDecorate %_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFvPfNSt12experimental4simdIfNS7_10__simd_abiILNS7_12_StorageKindE2ELi16EEEEEiEJNS3_7uniformIS6_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__5 VectorComputeFunctionINTEL ; 0x00001360
OpDecorate %__itt_offload_wi_finish_wrapper LinkageAttributes "__itt_offload_wi_finish_wrapper" Export ; 0x0000136c
OpDecorate %110 Alignment 8 ; 0x0000139c
OpDecorate %112 SpecId 4285822057 ; 0x000013ac
OpDecorate %__itt_offload_wi_start_stub LinkageAttributes "__itt_offload_wi_start_stub" Export ; 0x000013bc
OpDecorate %147 Alignment 8 ; 0x000013e8
OpDecorate %148 SpecId 4285822057 ; 0x000013f8
OpDecorate %__itt_offload_wi_finish_stub LinkageAttributes "__itt_offload_wi_finish_stub" Export ; 0x00001408
OpDecorate %167 Alignment 8 ; 0x00001438
OpDecorate %168 Alignment 8 ; 0x00001448
OpDecorate %170 Alignment 4 ; 0x00001458
OpDecorate %177 Alignment 8 ; 0x00001468
OpDecorate %178 Alignment 8 ; 0x00001478
OpDecorate %_Z24__regcall3__SIMD_CALLEE1PfNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEEi LinkageAttributes "_Z24__regcall3__SIMD_CALLEE1PfNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEEi" Export ; 0x00001488
OpDecorate %_Z24__regcall3__SIMD_CALLEE1PfNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEEi ReferencedIndirectlyINTEL ; 0x00001500
OpDecorate %_Z24__regcall3__SIMD_CALLEE1PfNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEEi VectorComputeFunctionINTEL ; 0x0000150c
OpDecorate %190 FPFastMathMode Fast ; 0x00001518
OpDecorate %_Z24__regcall3__SIMD_CALLEE2PfNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEEi LinkageAttributes "_Z24__regcall3__SIMD_CALLEE2PfNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEEi" Export ; 0x00001528
OpDecorate %_Z24__regcall3__SIMD_CALLEE2PfNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEEi ReferencedIndirectlyINTEL ; 0x000015a0
OpDecorate %_Z24__regcall3__SIMD_CALLEE2PfNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEEi VectorComputeFunctionINTEL ; 0x000015ac
OpDecorate %200 FPFastMathMode Fast ; 0x000015b8
OpDecorate %201 FPFastMathMode Fast ; 0x000015c8
OpDecorate %_Z28__regcall3__SIMD_CALLEE_VOIDPfNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEEi LinkageAttributes "_Z28__regcall3__SIMD_CALLEE_VOIDPfNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEEi" Export ; 0x000015d8
OpDecorate %_Z28__regcall3__SIMD_CALLEE_VOIDPfNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEEi ReferencedIndirectlyINTEL ; 0x00001654
OpDecorate %_Z28__regcall3__SIMD_CALLEE_VOIDPfNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEEi VectorComputeFunctionINTEL ; 0x00001660
OpDecorate %216 FPFastMathMode Fast ; 0x0000166c
OpDecorate %217 FPFastMathMode Fast ; 0x0000167c
OpDecorate %223 FPFastMathMode Fast ; 0x0000168c
%uint = OpTypeInt 32 0 ; 0x0000169c
%ulong = OpTypeInt 64 0 ; 0x000016ac
%uchar = OpTypeInt 8 0 ; 0x000016bc
%uint_4 = OpConstant %uint 4 ; 0x000016cc
%uint_6 = OpConstant %uint 6 ; 0x000016dc
%ulong_2147483648 = OpConstant %ulong 2147483648 ; 0x000016ec
%ulong_3 = OpConstant %ulong 3 ; 0x00001700
%112 = OpSpecConstant %uchar 0 ; 0x00001714
%uchar_0 = OpConstant %uchar 0 ; 0x00001724
%ulong_0 = OpConstant %ulong 0 ; 0x00001734
%ulong_1 = OpConstant %ulong 1 ; 0x00001748
%ulong_2 = OpConstant %ulong 2 ; 0x0000175c
%148 = OpSpecConstant %uchar 0 ; 0x00001770
%_ptr_CrossWorkgroup_uint = OpTypePointer CrossWorkgroup %uint ; 0x00001780
%v3ulong = OpTypeVector %ulong 3 ; 0x00001790
%_ptr_CrossWorkgroup_v3ulong = OpTypePointer CrossWorkgroup %v3ulong ; 0x000017a0
%_ptr_CrossWorkgroup_ulong = OpTypePointer CrossWorkgroup %ulong ; 0x000017b0
%float = OpTypeFloat 32 ; 0x000017c0
%v16float = OpTypeVector %float 16 ; 0x000017cc
%_ptr_Generic_float = OpTypePointer Generic %float ; 0x000017dc
%16 = OpTypeFunction %v16float %_ptr_Generic_float %v16float %uint ; 0x000017ec
%_ptr_Function_16 = OpTypePointer Function %16 ; 0x00001804
%18 = OpTypeFunction %float %_ptr_Function_16 %_ptr_Generic_float %float %uint ; 0x00001814
%void = OpTypeVoid ; 0x00001830
%30 = OpTypeFunction %void %_ptr_Generic_float %v16float %uint ; 0x00001838
%_ptr_Function_30 = OpTypePointer Function %30 ; 0x00001850
%32 = OpTypeFunction %void %_ptr_Function_30 %_ptr_Generic_float %float %uint ; 0x00001860
%38 = OpTypeFunction %v16float %ulong ; 0x0000187c
%_ptr_CrossWorkgroup_float = OpTypePointer CrossWorkgroup %float ; 0x0000188c
%42 = OpTypeFunction %void %_ptr_CrossWorkgroup_float %_ptr_CrossWorkgroup_float %_ptr_CrossWorkgroup_float ; 0x0000189c
%48 = OpTypeFunction %void ; 0x000018b4
%bool = OpTypeBool ; 0x000018c0
%_arr_ulong_ulong_3 = OpTypeArray %ulong %ulong_3 ; 0x000018c8
%_ptr_Function__arr_ulong_ulong_3 = OpTypePointer Function %_arr_ulong_ulong_3 ; 0x000018d8
%_ptr_Function_uchar = OpTypePointer Function %uchar ; 0x000018e8
%_ptr_Function_ulong = OpTypePointer Function %ulong ; 0x000018f8
%_ptr_Generic_ulong = OpTypePointer Generic %ulong ; 0x00001908
%138 = OpTypeFunction %void %_ptr_Generic_ulong %ulong %uint ; 0x00001918
%160 = OpTypeFunction %void %_ptr_Generic_ulong %ulong ; 0x00001930
%_ptr_Function__ptr_Generic_ulong = OpTypePointer Function %_ptr_Generic_ulong ; 0x00001944
%_ptr_Function_uint = OpTypePointer Function %uint ; 0x00001954
%_ptr_Generic__ptr_Generic_ulong = OpTypePointer Generic %_ptr_Generic_ulong ; 0x00001964
%_ptr_Generic_uint = OpTypePointer Generic %uint ; 0x00001974
%__spirv_BuiltInSubgroupId = OpVariable %_ptr_CrossWorkgroup_uint CrossWorkgroup ; 0x00001984
%__spirv_BuiltInSubgroupLocalInvocationId = OpVariable %_ptr_CrossWorkgroup_uint CrossWorkgroup ; 0x00001994
%__spirv_BuiltInWorkgroupId = OpVariable %_ptr_CrossWorkgroup_v3ulong CrossWorkgroup ; 0x000019a4
%__spirv_BuiltInGlobalLinearId = OpVariable %_ptr_CrossWorkgroup_ulong CrossWorkgroup ; 0x000019b4
%__spirv_BuiltInWorkgroupSize = OpVariable %_ptr_CrossWorkgroup_v3ulong CrossWorkgroup ; 0x000019c4
%_Z33__regcall3____builtin_invoke_simdILb1EfPFNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEERFS5_PfS5_iES6_S5_jEJPS7_S6_fjEvET0_T1_DpT2__2 = OpFunction %float None %18 ; 0x000019d4
%20 = OpFunctionParameter %_ptr_Function_16 ; 0x000019e8
%21 = OpFunctionParameter %_ptr_Generic_float ; 0x000019f4
%22 = OpFunctionParameter %float ; 0x00001a00
%23 = OpFunctionParameter %uint ; 0x00001a0c
OpFunctionEnd ; 0x00001a18
%_Z33__regcall3____builtin_invoke_simdILb1EfPFNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEERFS5_PfS5_iES6_S5_jEJPS7_S6_fjEvET0_T1_DpT2__4 = OpFunction %float None %18 ; 0x00001a1c
%25 = OpFunctionParameter %_ptr_Function_16 ; 0x00001a30
%26 = OpFunctionParameter %_ptr_Generic_float ; 0x00001a3c
%27 = OpFunctionParameter %float ; 0x00001a48
%28 = OpFunctionParameter %uint ; 0x00001a54
OpFunctionEnd ; 0x00001a60
%_Z33__regcall3____builtin_invoke_simdILb1EvPFvRFvPfNSt12experimental4simdIfNS1_10__simd_abiILNS1_12_StorageKindE2ELi16EEEEEiES0_S6_jEJPS7_S0_fjEvET0_T1_DpT2__6 = OpFunction %void None %32 ; 0x00001a64
%34 = OpFunctionParameter %_ptr_Function_30 ; 0x00001a78
%35 = OpFunctionParameter %_ptr_Generic_float ; 0x00001a84
%36 = OpFunctionParameter %float ; 0x00001a90
%37 = OpFunctionParameter %uint ; 0x00001a9c
OpFunctionEnd ; 0x00001aa8
%llvm_genx_svm_block_ld_unaligned_v16f32_i64 = OpFunction %v16float Inline %38 ; 0x00001aac
%40 = OpFunctionParameter %ulong ; 0x00001ac0
OpFunctionEnd ; 0x00001acc
%43 = OpFunction %void None %42 ; 0x00001ad0
%44 = OpFunctionParameter %_ptr_CrossWorkgroup_float ; 0x00001ae4
%45 = OpFunctionParameter %_ptr_CrossWorkgroup_float ; 0x00001af0
%46 = OpFunctionParameter %_ptr_CrossWorkgroup_float ; 0x00001afc
%47 = OpLabel ; 0x00001b08
%50 = OpFunctionCall %void %__itt_offload_wi_start_wrapper ; 0x00001b10
%51 = OpPtrCastToGeneric %_ptr_Generic_float %44 ; 0x00001b20
%59 = OpLoad %v3ulong %__spirv_BuiltInWorkgroupId Aligned|NoAliasINTELMask 32 %58 ; 0x00001b30
%60 = OpCompositeExtract %ulong %59 0 ; 0x00001b4c
%64 = OpLoad %uint %__spirv_BuiltInSubgroupId Aligned|NoAliasINTELMask 4 %63 ; 0x00001b60
%66 = OpShiftLeftLogical %uint %64 %uint_4 ; 0x00001b7c
%67 = OpUConvert %uint %60 ; 0x00001b90
%69 = OpShiftLeftLogical %uint %67 %uint_6 ; 0x00001ba0
%70 = OpIAdd %uint %69 %66 ; 0x00001bb4
%74 = OpLoad %uint %__spirv_BuiltInSubgroupLocalInvocationId Aligned|NoAliasINTELMask 4 %73 ; 0x00001bc8
%75 = OpUConvert %ulong %74 ; 0x00001be4
%76 = OpUConvert %ulong %70 ; 0x00001bf4
%77 = OpIAdd %ulong %75 %76 ; 0x00001c04
%80 = OpULessThan %bool %77 %ulong_2147483648 ; 0x00001c18
OpAssumeTrueKHR %80 ; 0x00001c2c
%81 = OpInBoundsPtrAccessChain %_ptr_CrossWorkgroup_float %45 %77 ; 0x00001c34
%82 = OpLoad %float %81 Aligned 4 ; 0x00001c48
%87 = OpFunctionCall %float %_Z33__regcall3____builtin_invoke_simdILb1EfPFNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEERFS5_PfS5_iES6_S5_jEJPS7_S6_fjEvET0_T1_DpT2__4 %_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFNSt12experimental4simdIfNS6_10__simd_abiILNS6_12_StorageKindE2ELi16EEEEEPfSB_iEJNS3_7uniformISC_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__3 %51 %82 %70 ; 0x00001c60
%88 = OpLoad %float %81 Aligned 4 ; 0x00001c80
%93 = OpFunctionCall %float %_Z33__regcall3____builtin_invoke_simdILb1EfPFNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEERFS5_PfS5_iES6_S5_jEJPS7_S6_fjEvET0_T1_DpT2__2 %_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFNSt12experimental4simdIfNS6_10__simd_abiILNS6_12_StorageKindE2ELi16EEEEEPfSB_iEJNS3_7uniformISC_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__1 %51 %88 %70 ; 0x00001c98
%94 = OpFAdd %float %87 %93 ; 0x00001cb8
%95 = OpLoad %float %81 Aligned 4 ; 0x00001ccc
%100 = OpFunctionCall %void %_Z33__regcall3____builtin_invoke_simdILb1EvPFvRFvPfNSt12experimental4simdIfNS1_10__simd_abiILNS1_12_StorageKindE2ELi16EEEEEiES0_S6_jEJPS7_S0_fjEvET0_T1_DpT2__6 %_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFvPfNSt12experimental4simdIfNS7_10__simd_abiILNS7_12_StorageKindE2ELi16EEEEEiEJNS3_7uniformIS6_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__5 %51 %95 %70 ; 0x00001ce4
%101 = OpInBoundsPtrAccessChain %_ptr_CrossWorkgroup_float %46 %77 ; 0x00001d04
OpStore %101 %94 Aligned 4 ; 0x00001d18
%103 = OpFunctionCall %void %__itt_offload_wi_finish_wrapper ; 0x00001d2c
OpReturn ; 0x00001d3c
OpFunctionEnd ; 0x00001d40
%__itt_offload_wi_start_wrapper = OpFunction %void Inline %48 ; 0x00001d44
%104 = OpLabel ; 0x00001d58
%110 = OpVariable %_ptr_Function__arr_ulong_ulong_3 Function ; 0x00001d60
%114 = OpIEqual %bool %112 %uchar_0 ; 0x00001d70
OpBranchConditional %114 %106 %105 ; 0x00001d84
%105 = OpLabel ; 0x00001d94
%116 = OpBitcast %_ptr_Function_uchar %110 ; 0x00001d9c
OpLifetimeStart %116 24 ; 0x00001dac
%119 = OpInBoundsPtrAccessChain %_ptr_Function_ulong %110 %ulong_0 %ulong_0 ; 0x00001db8
%121 = OpPtrCastToGeneric %_ptr_Generic_ulong %119 ; 0x00001dd0
%122 = OpLoad %v3ulong %__spirv_BuiltInWorkgroupId Aligned 32 ; 0x00001de0
%123 = OpCompositeExtract %ulong %122 0 ; 0x00001df8
OpStore %119 %123 Aligned 8 ; 0x00001e0c
%125 = OpInBoundsPtrAccessChain %_ptr_Function_ulong %110 %ulong_0 %ulong_1 ; 0x00001e20
%126 = OpCompositeExtract %ulong %122 1 ; 0x00001e38
OpStore %125 %126 Aligned 8 ; 0x00001e4c
%128 = OpInBoundsPtrAccessChain %_ptr_Function_ulong %110 %ulong_0 %ulong_2 ; 0x00001e60
%129 = OpCompositeExtract %ulong %122 2 ; 0x00001e78
OpStore %128 %129 Aligned 8 ; 0x00001e8c
%130 = OpLoad %ulong %__spirv_BuiltInGlobalLinearId Aligned 8 ; 0x00001ea0
%131 = OpLoad %v3ulong %__spirv_BuiltInWorkgroupSize Aligned 32 ; 0x00001eb8
%132 = OpCompositeExtract %ulong %131 0 ; 0x00001ed0
%133 = OpCompositeExtract %ulong %131 1 ; 0x00001ee4
%134 = OpIMul %ulong %132 %133 ; 0x00001ef8
%135 = OpCompositeExtract %ulong %131 2 ; 0x00001f0c
%136 = OpIMul %ulong %134 %135 ; 0x00001f20
%137 = OpUConvert %uint %136 ; 0x00001f34
%143 = OpFunctionCall %void %__itt_offload_wi_start_stub %121 %130 %137 ; 0x00001f44
OpLifetimeStop %116 24 ; 0x00001f60
OpBranch %106 ; 0x00001f6c
%106 = OpLabel ; 0x00001f74
OpReturn ; 0x00001f7c
OpFunctionEnd ; 0x00001f80
%_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFNSt12experimental4simdIfNS6_10__simd_abiILNS6_12_StorageKindE2ELi16EEEEEPfSB_iEJNS3_7uniformISC_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__3 = OpFunction %v16float None %16 ; 0x00001f84
%84 = OpFunctionParameter %_ptr_Generic_float ; 0x00001f98
%85 = OpFunctionParameter %v16float ; 0x00001fa4
%86 = OpFunctionParameter %uint ; 0x00001fb0
%218 = OpLabel ; 0x00001fbc
%219 = OpSConvert %ulong %86 ; 0x00001fc4
%220 = OpInBoundsPtrAccessChain %_ptr_Generic_float %84 %219 ; 0x00001fd4
%221 = OpConvertPtrToU %ulong %220 ; 0x00001fe8
%_esimd_i_0 = OpFunctionCall %v16float %llvm_genx_svm_block_ld_unaligned_v16f32_i64 %221 ; 0x00001ff8
%223 = OpFAdd %v16float %_esimd_i_0 %85 ; 0x0000200c
OpReturnValue %223 ; 0x00002020
OpFunctionEnd ; 0x00002028
%_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFNSt12experimental4simdIfNS6_10__simd_abiILNS6_12_StorageKindE2ELi16EEEEEPfSB_iEJNS3_7uniformISC_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__1 = OpFunction %v16float None %16 ; 0x0000202c
%90 = OpFunctionParameter %_ptr_Generic_float ; 0x00002040
%91 = OpFunctionParameter %v16float ; 0x0000204c
%92 = OpFunctionParameter %uint ; 0x00002058
%211 = OpLabel ; 0x00002064
%212 = OpSConvert %ulong %92 ; 0x0000206c
%213 = OpInBoundsPtrAccessChain %_ptr_Generic_float %90 %212 ; 0x0000207c
%214 = OpConvertPtrToU %ulong %213 ; 0x00002090
%_esimd_i = OpFunctionCall %v16float %llvm_genx_svm_block_ld_unaligned_v16f32_i64 %214 ; 0x000020a0
%216 = OpFAdd %v16float %_esimd_i %91 ; 0x000020b4
%217 = OpFAdd %v16float %216 %216 ; 0x000020c8
OpReturnValue %217 ; 0x000020dc
OpFunctionEnd ; 0x000020e4
%_ZN4sycl3_V13ext6oneapi12experimental6detail33__regcall3__simd_func_call_helperILi16ERFvPfNSt12experimental4simdIfNS7_10__simd_abiILNS7_12_StorageKindE2ELi16EEEEEiEJNS3_7uniformIS6_EEfNSF_IjEEEEENSt13invoke_resultIT0_JDpNS4_9spmd2simdIT1_XT_EvE4typeEEE4typeESJ_SO__5 = OpFunction %void None %30 ; 0x000020e8
%97 = OpFunctionParameter %_ptr_Generic_float ; 0x000020fc
%98 = OpFunctionParameter %v16float ; 0x00002108
%99 = OpFunctionParameter %uint ; 0x00002114
%224 = OpLabel ; 0x00002120
%225 = OpSConvert %ulong %99 ; 0x00002128
%226 = OpInBoundsPtrAccessChain %_ptr_Generic_float %97 %225 ; 0x00002138
%227 = OpConvertPtrToU %ulong %226 ; 0x0000214c
%_esimd_i_1 = OpFunctionCall %v16float %llvm_genx_svm_block_ld_unaligned_v16f32_i64 %227 ; 0x0000215c
OpReturn ; 0x00002170
OpFunctionEnd ; 0x00002174
%__itt_offload_wi_finish_wrapper = OpFunction %void Inline %48 ; 0x00002178
%144 = OpLabel ; 0x0000218c
%147 = OpVariable %_ptr_Function__arr_ulong_ulong_3 Function ; 0x00002194
%149 = OpIEqual %bool %148 %uchar_0 ; 0x000021a4
OpBranchConditional %149 %146 %145 ; 0x000021b8
%145 = OpLabel ; 0x000021c8
%150 = OpBitcast %_ptr_Function_uchar %147 ; 0x000021d0
OpLifetimeStart %150 24 ; 0x000021e0
%151 = OpInBoundsPtrAccessChain %_ptr_Function_ulong %147 %ulong_0 %ulong_0 ; 0x000021ec
%152 = OpPtrCastToGeneric %_ptr_Generic_ulong %151 ; 0x00002204
%153 = OpLoad %v3ulong %__spirv_BuiltInWorkgroupId Aligned 32 ; 0x00002214
%154 = OpCompositeExtract %ulong %153 0 ; 0x0000222c
OpStore %151 %154 Aligned 8 ; 0x00002240
%155 = OpInBoundsPtrAccessChain %_ptr_Function_ulong %147 %ulong_0 %ulong_1 ; 0x00002254
%156 = OpCompositeExtract %ulong %153 1 ; 0x0000226c
OpStore %155 %156 Aligned 8 ; 0x00002280
%157 = OpInBoundsPtrAccessChain %_ptr_Function_ulong %147 %ulong_0 %ulong_2 ; 0x00002294
%158 = OpCompositeExtract %ulong %153 2 ; 0x000022ac
OpStore %157 %158 Aligned 8 ; 0x000022c0
%159 = OpLoad %ulong %__spirv_BuiltInGlobalLinearId Aligned 8 ; 0x000022d4
%164 = OpFunctionCall %void %__itt_offload_wi_finish_stub %152 %159 ; 0x000022ec
OpLifetimeStop %150 24 ; 0x00002304
OpBranch %146 ; 0x00002310
%146 = OpLabel ; 0x00002318
OpReturn ; 0x00002320
OpFunctionEnd ; 0x00002324
%__itt_offload_wi_start_stub = OpFunction %void DontInline|OptNoneINTEL %138 ; 0x00002328
%140 = OpFunctionParameter %_ptr_Generic_ulong ; 0x0000233c
%141 = OpFunctionParameter %ulong ; 0x00002348
%142 = OpFunctionParameter %uint ; 0x00002354
%165 = OpLabel ; 0x00002360
%167 = OpVariable %_ptr_Function__ptr_Generic_ulong Function ; 0x00002368
%168 = OpVariable %_ptr_Function_ulong Function ; 0x00002378
%170 = OpVariable %_ptr_Function_uint Function ; 0x00002388
%172 = OpPtrCastToGeneric %_ptr_Generic__ptr_Generic_ulong %167 ; 0x00002398
%173 = OpPtrCastToGeneric %_ptr_Generic_ulong %168 ; 0x000023a8
%175 = OpPtrCastToGeneric %_ptr_Generic_uint %170 ; 0x000023b8
OpStore %172 %140 Aligned 8 ; 0x000023c8
OpStore %173 %141 Aligned 8 ; 0x000023dc
OpStore %175 %142 Aligned 4 ; 0x000023f0
OpReturn ; 0x00002404
OpFunctionEnd ; 0x00002408
%__itt_offload_wi_finish_stub = OpFunction %void DontInline|OptNoneINTEL %160 ; 0x0000240c
%162 = OpFunctionParameter %_ptr_Generic_ulong ; 0x00002420
%163 = OpFunctionParameter %ulong ; 0x0000242c
%176 = OpLabel ; 0x00002438
%177 = OpVariable %_ptr_Function__ptr_Generic_ulong Function ; 0x00002440
%178 = OpVariable %_ptr_Function_ulong Function ; 0x00002450
%179 = OpPtrCastToGeneric %_ptr_Generic__ptr_Generic_ulong %177 ; 0x00002460
%180 = OpPtrCastToGeneric %_ptr_Generic_ulong %178 ; 0x00002470
OpStore %179 %162 Aligned 8 ; 0x00002480
OpStore %180 %163 Aligned 8 ; 0x00002494
OpReturn ; 0x000024a8
OpFunctionEnd ; 0x000024ac
%_Z24__regcall3__SIMD_CALLEE1PfNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEEi = OpFunction %v16float Inline %16 ; 0x000024b0
%182 = OpFunctionParameter %_ptr_Generic_float ; 0x000024c4
%183 = OpFunctionParameter %v16float ; 0x000024d0
%184 = OpFunctionParameter %uint ; 0x000024dc
%185 = OpLabel ; 0x000024e8
%186 = OpSConvert %ulong %184 ; 0x000024f0
%187 = OpInBoundsPtrAccessChain %_ptr_Generic_float %182 %186 ; 0x00002500
%188 = OpConvertPtrToU %ulong %187 ; 0x00002514
%_esimd = OpFunctionCall %v16float %llvm_genx_svm_block_ld_unaligned_v16f32_i64 %188 ; 0x00002524
%190 = OpFAdd %v16float %_esimd %183 ; 0x00002538
OpReturnValue %190 ; 0x0000254c
OpFunctionEnd ; 0x00002554
%_Z24__regcall3__SIMD_CALLEE2PfNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEEi = OpFunction %v16float Inline %16 ; 0x00002558
%192 = OpFunctionParameter %_ptr_Generic_float ; 0x0000256c
%193 = OpFunctionParameter %v16float ; 0x00002578
%194 = OpFunctionParameter %uint ; 0x00002584
%195 = OpLabel ; 0x00002590
%196 = OpSConvert %ulong %194 ; 0x00002598
%197 = OpInBoundsPtrAccessChain %_ptr_Generic_float %192 %196 ; 0x000025a8
%198 = OpConvertPtrToU %ulong %197 ; 0x000025bc
%_esimd_0 = OpFunctionCall %v16float %llvm_genx_svm_block_ld_unaligned_v16f32_i64 %198 ; 0x000025cc
%200 = OpFAdd %v16float %_esimd_0 %193 ; 0x000025e0
%201 = OpFAdd %v16float %200 %200 ; 0x000025f4
OpReturnValue %201 ; 0x00002608
OpFunctionEnd ; 0x00002610
%_Z28__regcall3__SIMD_CALLEE_VOIDPfNSt12experimental4simdIfNS0_10__simd_abiILNS0_12_StorageKindE2ELi16EEEEEi = OpFunction %void Inline %30 ; 0x00002614
%203 = OpFunctionParameter %_ptr_Generic_float ; 0x00002628
%204 = OpFunctionParameter %v16float ; 0x00002634
%205 = OpFunctionParameter %uint ; 0x00002640
%206 = OpLabel ; 0x0000264c
%207 = OpSConvert %ulong %205 ; 0x00002654
%208 = OpInBoundsPtrAccessChain %_ptr_Generic_float %203 %207 ; 0x00002664
%209 = OpConvertPtrToU %ulong %208 ; 0x00002678
%_esimd_1 = OpFunctionCall %v16float %llvm_genx_svm_block_ld_unaligned_v16f32_i64 %209 ; 0x00002688
OpReturn ; 0x0000269c
OpFunctionEnd ; 0x000026a0
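For anyone comparing the two dumps, the three `SIMD_CALLEE` functions in the disassembly above can be modelled roughly as below. This is only a sketch in Python, not code from the DPCPP example. The names `block_load`, `simd_callee1`, `simd_callee2` and `simd_callee_void` are hypothetical, memory is modelled as a flat list of floats, and `block_load` stands in for `llvm_genx_svm_block_ld_unaligned_v16f32_i64`, which is assumed here to read 16 consecutive floats starting at the computed address.

```python
from typing import List

SIMD_WIDTH = 16  # matches %v16float in the disassembly


def block_load(mem: List[float], index: int) -> List[float]:
    """Hypothetical stand-in for the genx block load (assumed: 16 consecutive floats)."""
    return mem[index:index + SIMD_WIDTH]


def simd_callee1(mem: List[float], va: List[float], i: int) -> List[float]:
    # OpInBoundsPtrAccessChain offsets the pointer by i elements,
    # then the block load result is added lane-wise to the vector argument.
    loaded = block_load(mem, i)
    return [x + y for x, y in zip(loaded, va)]


def simd_callee2(mem: List[float], va: List[float], i: int) -> List[float]:
    # Same as callee 1, followed by an OpFAdd of the sum with itself.
    summed = simd_callee1(mem, va, i)
    return [x + x for x in summed]


def simd_callee_void(mem: List[float], va: List[float], i: int) -> None:
    # Performs the load and returns nothing; the vector argument is unused.
    block_load(mem, i)
```

With `mem = [float(k) for k in range(32)]` and `va = [1.0] * 16`, calling `simd_callee1(mem, va, 2)` adds 1.0 to elements 2 through 17 of `mem`, and `simd_callee2` returns twice that result.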
The SPIR-V dialect IR we are working on:
// -----// IR Dump After CSE (cse) //----- //
module attributes {spirv.target_env = #spirv.target_env<#spirv.vce<v1.4, [Addresses, Float16Buffer, Int64, Int16, Int8, Kernel, Linkage, Vector16, GenericPointer, Groups, Float16, Float64, AtomicFloat32AddEXT, ExpectAssumeKHR], [SPV_EXT_shader_atomic_float_add, SPV_KHR_expect_assume]>, api=OpenCL, #spirv.resource_limits<>>, "triton_gpu.num-warps" = 1 : i32, triton_gpu.shared = 2048 : i32, "triton_gpu.threads-per-warp" = 32 : i32} {
spirv.GlobalVariable @__builtin_var_LocalInvocationId__ built_in("LocalInvocationId") : !spirv.ptr<vector<3xi64>, Input>
spirv.GlobalVariable @__builtin_var_WorkgroupId__ built_in("WorkgroupId") : !spirv.ptr<vector<3xi64>, Input>
spirv.func @SIMDwrapper(%arg0: vector<16xi32>) -> vector<16xi32> "None" attributes {referenced_indirectly_i_n_t_e_l, stack_call_i_n_t_e_l, vector_compute_function_i_n_t_e_l} {
%0 = spirv.Undef : vector<16xi32>
%cst0_i32 = spirv.Constant 0 : i32
%1 = spirv.VectorInsertDynamic %cst0_i32, %0[%cst0_i32] : vector<16xi32>, i32
%cst1_i32 = spirv.Constant 1 : i32
%2 = spirv.VectorInsertDynamic %cst0_i32, %1[%cst1_i32] : vector<16xi32>, i32
%cst2_i32 = spirv.Constant 2 : i32
%3 = spirv.VectorInsertDynamic %cst0_i32, %2[%cst2_i32] : vector<16xi32>, i32
%cst3_i32 = spirv.Constant 3 : i32
%4 = spirv.VectorInsertDynamic %cst0_i32, %3[%cst3_i32] : vector<16xi32>, i32
%cst4_i32 = spirv.Constant 4 : i32
%5 = spirv.VectorInsertDynamic %cst0_i32, %4[%cst4_i32] : vector<16xi32>, i32
%cst5_i32 = spirv.Constant 5 : i32
%6 = spirv.VectorInsertDynamic %cst0_i32, %5[%cst5_i32] : vector<16xi32>, i32
%cst6_i32 = spirv.Constant 6 : i32
%7 = spirv.VectorInsertDynamic %cst0_i32, %6[%cst6_i32] : vector<16xi32>, i32
%cst7_i32 = spirv.Constant 7 : i32
%8 = spirv.VectorInsertDynamic %cst0_i32, %7[%cst7_i32] : vector<16xi32>, i32
%cst8_i32 = spirv.Constant 8 : i32
%9 = spirv.VectorInsertDynamic %cst0_i32, %8[%cst8_i32] : vector<16xi32>, i32
%cst9_i32 = spirv.Constant 9 : i32
%10 = spirv.VectorInsertDynamic %cst0_i32, %9[%cst9_i32] : vector<16xi32>, i32
%cst10_i32 = spirv.Constant 10 : i32
%11 = spirv.VectorInsertDynamic %cst0_i32, %10[%cst10_i32] : vector<16xi32>, i32
%cst11_i32 = spirv.Constant 11 : i32
%12 = spirv.VectorInsertDynamic %cst0_i32, %11[%cst11_i32] : vector<16xi32>, i32
%cst12_i32 = spirv.Constant 12 : i32
%13 = spirv.VectorInsertDynamic %cst0_i32, %12[%cst12_i32] : vector<16xi32>, i32
%cst13_i32 = spirv.Constant 13 : i32
%14 = spirv.VectorInsertDynamic %cst0_i32, %13[%cst13_i32] : vector<16xi32>, i32
%cst14_i32 = spirv.Constant 14 : i32
%15 = spirv.VectorInsertDynamic %cst0_i32, %14[%cst14_i32] : vector<16xi32>, i32
%cst15_i32 = spirv.Constant 15 : i32
%16 = spirv.VectorInsertDynamic %cst0_i32, %15[%cst15_i32] : vector<16xi32>, i32
spirv.ReturnValue %16 : vector<16xi32>
}
spirv.func @_Z33__regcall3____builtin_invoke_simdSIMDwrapper(!spirv.ptr<(vector<16xi32>) -> vector<16xi32>, CodeSectionINTEL>, i32) -> i32 "Inline" attributes {libname = "libdevice", libpath = "", linkage_attributes = ["_Z33__regcall3____builtin_invoke_simdSIMDwrapper", "Import"]}
spirv.func @_kernel_0d1d2d3d4d5d6d7c8d9c10d11c(%arg0: !spirv.ptr<f16, CrossWorkgroup> {tt.divisibility = 16 : i32}, %arg1: !spirv.ptr<f16, CrossWorkgroup> {tt.divisibility = 16 : i32}, %arg2: !spirv.ptr<f16, CrossWorkgroup> {tt.divisibility = 16 : i32}, %arg3: i32 {tt.divisibility = 16 : i32}, %arg4: i32 {tt.divisibility = 16 : i32}, %arg5: i32 {tt.divisibility = 16 : i32}, %arg6: i32 {tt.divisibility = 16 : i32}, %arg7: i32 {tt.divisibility = 16 : i32}, %arg8: i32 {tt.divisibility = 16 : i32}, %arg9: !spirv.ptr<i8, Workgroup>) "None" attributes {noinline = false, spirv.entry_point_abi = #spirv.entry_point_abi<>, sym_visibility = "public"} {
%__builtin_var_LocalInvocationId___addr = spirv.mlir.addressof @__builtin_var_LocalInvocationId__ : !spirv.ptr<vector<3xi64>, Input>
%0 = spirv.Load "Input" %__builtin_var_LocalInvocationId___addr : vector<3xi64>
%1 = spirv.CompositeExtract %0[0 : i32] : vector<3xi64>
%2 = spirv.SConvert %1 : i64 to i32
%cst32_i32 = spirv.Constant 32 : i32
%3 = spirv.UMod %2, %cst32_i32 : i32
%4 = spirv.UDiv %2, %cst32_i32 : i32
%cst1_i32 = spirv.Constant 1 : i32
%5 = spirv.UMod %4, %cst1_i32 : i32
%6 = spirv.UDiv %4, %cst1_i32 : i32
%7 = spirv.UMod %6, %cst1_i32 : i32
%cst2_i32 = spirv.Constant 2 : i32
%8 = spirv.UMod %3, %cst2_i32 : i32
%9 = spirv.UDiv %3, %cst2_i32 : i32
%cst16_i32 = spirv.Constant 16 : i32
%10 = spirv.UMod %9, %cst16_i32 : i32
%11 = spirv.UMod %7, %cst1_i32 : i32
%12 = spirv.UMod %10, %cst1_i32 : i32
%13 = spirv.IMul %11, %cst16_i32 : i32
%14 = spirv.IAdd %12, %13 : i32
%15 = spirv.UMod %5, %cst1_i32 : i32
%16 = spirv.UMod %8, %cst2_i32 : i32
%cst8_i32 = spirv.Constant 8 : i32
%17 = spirv.IMul %15, %cst2_i32 : i32
%18 = spirv.IAdd %16, %17 : i32
%19 = spirv.IMul %cst8_i32, %18 : i32
%20 = spirv.IAdd %19, %cst1_i32 : i32
%21 = spirv.IAdd %19, %cst2_i32 : i32
%cst3_i32 = spirv.Constant 3 : i32
%22 = spirv.IAdd %19, %cst3_i32 : i32
%cst4_i32 = spirv.Constant 4 : i32
%23 = spirv.IAdd %19, %cst4_i32 : i32
%cst5_i32 = spirv.Constant 5 : i32
%24 = spirv.IAdd %19, %cst5_i32 : i32
%cst6_i32 = spirv.Constant 6 : i32
%25 = spirv.IAdd %19, %cst6_i32 : i32
%cst7_i32 = spirv.Constant 7 : i32
%26 = spirv.IAdd %19, %cst7_i32 : i32
%27 = spirv.Load "Input" %__builtin_var_LocalInvocationId___addr : vector<3xi64>
%28 = spirv.CompositeExtract %27[0 : i32] : vector<3xi64>
%29 = spirv.SConvert %28 : i64 to i32
%30 = spirv.UMod %29, %cst32_i32 : i32
%31 = spirv.UDiv %29, %cst32_i32 : i32
%32 = spirv.UMod %31, %cst1_i32 : i32
%33 = spirv.UDiv %31, %cst1_i32 : i32
%34 = spirv.UMod %33, %cst1_i32 : i32
%35 = spirv.UMod %30, %cst2_i32 : i32
%36 = spirv.UDiv %30, %cst2_i32 : i32
%37 = spirv.UMod %36, %cst16_i32 : i32
%38 = spirv.UMod %34, %cst1_i32 : i32
%39 = spirv.UMod %37, %cst16_i32 : i32
%40 = spirv.IMul %38, %cst16_i32 : i32
%41 = spirv.IAdd %39, %40 : i32
%42 = spirv.IMul %cst1_i32, %41 : i32
%43 = spirv.UMod %32, %cst1_i32 : i32
%44 = spirv.UMod %35, %cst1_i32 : i32
%45 = spirv.IMul %43, %cst2_i32 : i32
%46 = spirv.IAdd %44, %45 : i32
%47 = spirv.Load "Input" %__builtin_var_LocalInvocationId___addr : vector<3xi64>
%48 = spirv.CompositeExtract %47[0 : i32] : vector<3xi64>
%49 = spirv.SConvert %48 : i64 to i32
%50 = spirv.UMod %49, %cst32_i32 : i32
%51 = spirv.UDiv %49, %cst32_i32 : i32
%52 = spirv.UMod %51, %cst1_i32 : i32
%53 = spirv.UDiv %51, %cst1_i32 : i32
%54 = spirv.UMod %53, %cst1_i32 : i32
%55 = spirv.UMod %50, %cst2_i32 : i32
%56 = spirv.UDiv %50, %cst2_i32 : i32
%57 = spirv.UMod %56, %cst16_i32 : i32
%58 = spirv.UMod %54, %cst1_i32 : i32
%59 = spirv.UMod %57, %cst16_i32 : i32
%60 = spirv.IMul %58, %cst16_i32 : i32
%61 = spirv.IAdd %59, %60 : i32
%62 = spirv.IMul %cst1_i32, %61 : i32
%63 = spirv.UMod %52, %cst1_i32 : i32
%64 = spirv.UMod %55, %cst2_i32 : i32
%65 = spirv.IMul %63, %cst2_i32 : i32
%66 = spirv.IAdd %64, %65 : i32
%67 = spirv.IMul %cst8_i32, %66 : i32
%68 = spirv.Load "Input" %__builtin_var_LocalInvocationId___addr : vector<3xi64>
%69 = spirv.CompositeExtract %68[0 : i32] : vector<3xi64>
%70 = spirv.SConvert %69 : i64 to i32
%71 = spirv.UMod %70, %cst32_i32 : i32
%72 = spirv.UDiv %70, %cst32_i32 : i32
%73 = spirv.UMod %72, %cst1_i32 : i32
%74 = spirv.UDiv %72, %cst1_i32 : i32
%75 = spirv.UMod %74, %cst1_i32 : i32
%76 = spirv.UMod %71, %cst2_i32 : i32
%77 = spirv.UDiv %71, %cst2_i32 : i32
%78 = spirv.UMod %77, %cst16_i32 : i32
%79 = spirv.UMod %75, %cst1_i32 : i32
%80 = spirv.UMod %78, %cst16_i32 : i32
%81 = spirv.IMul %79, %cst16_i32 : i32
%82 = spirv.IAdd %80, %81 : i32
%83 = spirv.IMul %cst1_i32, %82 : i32
%84 = spirv.UMod %73, %cst1_i32 : i32
%85 = spirv.UMod %76, %cst2_i32 : i32
%86 = spirv.IMul %84, %cst2_i32 : i32
%87 = spirv.IAdd %85, %86 : i32
%88 = spirv.IMul %cst8_i32, %87 : i32
%cst_f32 = spirv.Constant 0.000000e+00 : f32
%89 = spirv.Undef : !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%90 = spirv.CompositeInsert %cst_f32, %89[0 : i32] : f32 into !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%91 = spirv.CompositeInsert %cst_f32, %90[1 : i32] : f32 into !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%92 = spirv.CompositeInsert %cst_f32, %91[2 : i32] : f32 into !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%93 = spirv.CompositeInsert %cst_f32, %92[3 : i32] : f32 into !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%94 = spirv.CompositeInsert %cst_f32, %93[4 : i32] : f32 into !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%95 = spirv.CompositeInsert %cst_f32, %94[5 : i32] : f32 into !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%96 = spirv.CompositeInsert %cst_f32, %95[6 : i32] : f32 into !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%97 = spirv.CompositeInsert %cst_f32, %96[7 : i32] : f32 into !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%98 = spirv.Undef : !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%99 = spirv.CompositeInsert %cst16_i32, %98[0 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%100 = spirv.CompositeInsert %cst16_i32, %99[1 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%101 = spirv.CompositeInsert %cst16_i32, %100[2 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%102 = spirv.CompositeInsert %cst16_i32, %101[3 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%103 = spirv.CompositeInsert %cst16_i32, %102[4 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%104 = spirv.CompositeInsert %cst16_i32, %103[5 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%105 = spirv.CompositeInsert %cst16_i32, %104[6 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%cst15_i32 = spirv.Constant 15 : i32
%cst0_i32 = spirv.Constant 0 : i32
%__builtin_var_WorkgroupId___addr = spirv.mlir.addressof @__builtin_var_WorkgroupId__ : !spirv.ptr<vector<3xi64>, Input>
%106 = spirv.Load "Input" %__builtin_var_WorkgroupId___addr : vector<3xi64>
%107 = spirv.CompositeExtract %106[0 : i32] : vector<3xi64>
%108 = spirv.SConvert %107 : i64 to i32
%109 = spirv.Load "Input" %__builtin_var_WorkgroupId___addr : vector<3xi64>
%110 = spirv.CompositeExtract %109[1 : i32] : vector<3xi64>
%111 = spirv.SConvert %110 : i64 to i32
%112 = spirv.IAdd %arg3, %cst15_i32 : i32
%113 = spirv.SDiv %112, %cst16_i32 : i32
%114 = spirv.IAdd %arg4, %cst15_i32 : i32
%115 = spirv.SDiv %114, %cst16_i32 : i32
%116 = spirv.IMul %115, %cst8_i32 : i32
%117 = spirv.SDiv %108, %116 : i32
%118 = spirv.IMul %117, %cst8_i32 : i32
%119 = spirv.ISub %113, %118 : i32
%120 = spirv.SLessThan %119, %cst8_i32 : i32
%121 = spirv.Select %120, %119, %cst8_i32 : i1, i32
%122 = spirv.SRem %108, %121 : i32
%123 = spirv.IAdd %118, %122 : i32
%124 = spirv.SRem %108, %116 : i32
%125 = spirv.SDiv %124, %121 : i32
%126 = spirv.IMul %123, %cst16_i32 : i32
%127 = spirv.CompositeInsert %19, %98[0 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%128 = spirv.CompositeInsert %20, %127[1 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%129 = spirv.CompositeInsert %21, %128[2 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%130 = spirv.CompositeInsert %22, %129[3 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%131 = spirv.CompositeInsert %23, %130[4 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%132 = spirv.CompositeInsert %24, %131[5 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%133 = spirv.CompositeInsert %25, %132[6 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%134 = spirv.Undef : !spirv.struct<(i32)>
%135 = spirv.IAdd %126, %42 : i32
%136 = spirv.IMul %125, %cst16_i32 : i32
%137 = spirv.CompositeInsert %136, %98[0 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%138 = spirv.CompositeInsert %136, %137[1 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%139 = spirv.CompositeInsert %136, %138[2 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%140 = spirv.CompositeInsert %136, %139[3 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%141 = spirv.CompositeInsert %136, %140[4 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%142 = spirv.CompositeInsert %136, %141[5 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%143 = spirv.CompositeInsert %136, %142[6 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%144 = spirv.IAdd %136, %19 : i32
%145 = spirv.IAdd %136, %20 : i32
%146 = spirv.IAdd %136, %21 : i32
%147 = spirv.IAdd %136, %22 : i32
%148 = spirv.IAdd %136, %23 : i32
%149 = spirv.IAdd %136, %24 : i32
%150 = spirv.IAdd %136, %25 : i32
%151 = spirv.IAdd %136, %26 : i32
%152 = spirv.CompositeInsert %144, %98[0 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%153 = spirv.CompositeInsert %145, %152[1 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%154 = spirv.CompositeInsert %146, %153[2 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%155 = spirv.CompositeInsert %147, %154[3 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%156 = spirv.CompositeInsert %148, %155[4 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%157 = spirv.CompositeInsert %149, %156[5 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%158 = spirv.CompositeInsert %150, %157[6 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%159 = spirv.SRem %135, %arg3 {tt.contiguity = dense<16> : tensor<1xi32>, tt.divisibility = dense<16> : tensor<1xi32>} : i32
%160 = spirv.CompositeInsert %arg4, %98[0 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%161 = spirv.CompositeInsert %arg4, %160[1 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%162 = spirv.CompositeInsert %arg4, %161[2 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%163 = spirv.CompositeInsert %arg4, %162[3 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%164 = spirv.CompositeInsert %arg4, %163[4 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%165 = spirv.CompositeInsert %arg4, %164[5 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%166 = spirv.CompositeInsert %arg4, %165[6 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%167 = spirv.SRem %144, %arg4 {tt.contiguity = dense<16> : tensor<1xi32>, tt.divisibility = dense<16> : tensor<1xi32>} : i32
%168 = spirv.SRem %145, %arg4 {tt.contiguity = dense<16> : tensor<1xi32>, tt.divisibility = dense<16> : tensor<1xi32>} : i32
%169 = spirv.SRem %146, %arg4 {tt.contiguity = dense<16> : tensor<1xi32>, tt.divisibility = dense<16> : tensor<1xi32>} : i32
%170 = spirv.SRem %147, %arg4 {tt.contiguity = dense<16> : tensor<1xi32>, tt.divisibility = dense<16> : tensor<1xi32>} : i32
%171 = spirv.SRem %148, %arg4 {tt.contiguity = dense<16> : tensor<1xi32>, tt.divisibility = dense<16> : tensor<1xi32>} : i32
%172 = spirv.SRem %149, %arg4 {tt.contiguity = dense<16> : tensor<1xi32>, tt.divisibility = dense<16> : tensor<1xi32>} : i32
%173 = spirv.SRem %150, %arg4 {tt.contiguity = dense<16> : tensor<1xi32>, tt.divisibility = dense<16> : tensor<1xi32>} : i32
%174 = spirv.SRem %151, %arg4 {tt.contiguity = dense<16> : tensor<1xi32>, tt.divisibility = dense<16> : tensor<1xi32>} : i32
%175 = spirv.CompositeInsert %167, %98[0 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%176 = spirv.CompositeInsert %168, %175[1 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%177 = spirv.CompositeInsert %169, %176[2 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%178 = spirv.CompositeInsert %170, %177[3 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%179 = spirv.CompositeInsert %171, %178[4 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%180 = spirv.CompositeInsert %172, %179[5 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%181 = spirv.CompositeInsert %173, %180[6 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%182 = spirv.IMul %111, %cst16_i32 : i32
%183 = spirv.CompositeInsert %182, %98[0 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%184 = spirv.CompositeInsert %182, %183[1 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%185 = spirv.CompositeInsert %182, %184[2 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%186 = spirv.CompositeInsert %182, %185[3 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%187 = spirv.CompositeInsert %182, %186[4 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%188 = spirv.CompositeInsert %182, %187[5 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%189 = spirv.CompositeInsert %182, %188[6 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%190 = spirv.IAdd %182, %19 : i32
%191 = spirv.IAdd %182, %20 : i32
%192 = spirv.IAdd %182, %21 : i32
%193 = spirv.IAdd %182, %22 : i32
%194 = spirv.IAdd %182, %23 : i32
%195 = spirv.IAdd %182, %24 : i32
%196 = spirv.IAdd %182, %25 : i32
%197 = spirv.IAdd %182, %26 : i32
%198 = spirv.CompositeInsert %190, %98[0 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%199 = spirv.CompositeInsert %191, %198[1 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%200 = spirv.CompositeInsert %192, %199[2 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%201 = spirv.CompositeInsert %193, %200[3 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%202 = spirv.CompositeInsert %194, %201[4 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%203 = spirv.CompositeInsert %195, %202[5 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%204 = spirv.CompositeInsert %196, %203[6 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%205 = spirv.IAdd %182, %42 : i32
%206 = spirv.CompositeInsert %159, %98[0 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%207 = spirv.CompositeInsert %159, %206[1 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%208 = spirv.CompositeInsert %159, %207[2 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%209 = spirv.CompositeInsert %159, %208[3 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%210 = spirv.CompositeInsert %159, %209[4 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%211 = spirv.CompositeInsert %159, %210[5 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%212 = spirv.CompositeInsert %159, %211[6 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%213 = spirv.CompositeInsert %arg6, %98[0 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%214 = spirv.CompositeInsert %arg6, %213[1 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%215 = spirv.CompositeInsert %arg6, %214[2 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%216 = spirv.CompositeInsert %arg6, %215[3 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%217 = spirv.CompositeInsert %arg6, %216[4 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%218 = spirv.CompositeInsert %arg6, %217[5 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%219 = spirv.CompositeInsert %arg6, %218[6 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%220 = spirv.IMul %159, %arg6 : i32
%221 = spirv.CompositeInsert %220, %98[0 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%222 = spirv.CompositeInsert %220, %221[1 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%223 = spirv.CompositeInsert %220, %222[2 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%224 = spirv.CompositeInsert %220, %223[3 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%225 = spirv.CompositeInsert %220, %224[4 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%226 = spirv.CompositeInsert %220, %225[5 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%227 = spirv.CompositeInsert %220, %226[6 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%228 = spirv.IAdd %220, %190 : i32
%229 = spirv.IAdd %220, %191 : i32
%230 = spirv.IAdd %220, %192 : i32
%231 = spirv.IAdd %220, %193 : i32
%232 = spirv.IAdd %220, %194 : i32
%233 = spirv.IAdd %220, %195 : i32
%234 = spirv.IAdd %220, %196 : i32
%235 = spirv.IAdd %220, %197 : i32
%236 = spirv.CompositeInsert %228, %98[0 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%237 = spirv.CompositeInsert %229, %236[1 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%238 = spirv.CompositeInsert %230, %237[2 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%239 = spirv.CompositeInsert %231, %238[3 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%240 = spirv.CompositeInsert %232, %239[4 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%241 = spirv.CompositeInsert %233, %240[5 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%242 = spirv.CompositeInsert %234, %241[6 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%243 = spirv.Undef : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%244 = spirv.CompositeInsert %arg0, %243[0 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%245 = spirv.CompositeInsert %arg0, %244[1 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%246 = spirv.CompositeInsert %arg0, %245[2 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%247 = spirv.CompositeInsert %arg0, %246[3 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%248 = spirv.CompositeInsert %arg0, %247[4 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%249 = spirv.CompositeInsert %arg0, %248[5 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%250 = spirv.CompositeInsert %arg0, %249[6 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%251 = spirv.PtrAccessChain %arg0[%228] : !spirv.ptr<f16, CrossWorkgroup>, i32
%252 = spirv.PtrAccessChain %arg0[%229] : !spirv.ptr<f16, CrossWorkgroup>, i32
%253 = spirv.PtrAccessChain %arg0[%230] : !spirv.ptr<f16, CrossWorkgroup>, i32
%254 = spirv.PtrAccessChain %arg0[%231] : !spirv.ptr<f16, CrossWorkgroup>, i32
%255 = spirv.PtrAccessChain %arg0[%232] : !spirv.ptr<f16, CrossWorkgroup>, i32
%256 = spirv.PtrAccessChain %arg0[%233] : !spirv.ptr<f16, CrossWorkgroup>, i32
%257 = spirv.PtrAccessChain %arg0[%234] : !spirv.ptr<f16, CrossWorkgroup>, i32
%258 = spirv.PtrAccessChain %arg0[%235] : !spirv.ptr<f16, CrossWorkgroup>, i32
%259 = spirv.CompositeInsert %251, %243[0 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%260 = spirv.CompositeInsert %252, %259[1 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%261 = spirv.CompositeInsert %253, %260[2 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%262 = spirv.CompositeInsert %254, %261[3 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%263 = spirv.CompositeInsert %255, %262[4 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%264 = spirv.CompositeInsert %256, %263[5 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%265 = spirv.CompositeInsert %257, %264[6 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%266 = spirv.CompositeInsert %258, %265[7 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%267 = spirv.CompositeInsert %205, %98[0 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%268 = spirv.CompositeInsert %205, %267[1 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%269 = spirv.CompositeInsert %205, %268[2 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%270 = spirv.CompositeInsert %205, %269[3 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%271 = spirv.CompositeInsert %205, %270[4 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%272 = spirv.CompositeInsert %205, %271[5 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%273 = spirv.CompositeInsert %205, %272[6 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%274 = spirv.CompositeInsert %arg7, %98[0 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%275 = spirv.CompositeInsert %arg7, %274[1 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%276 = spirv.CompositeInsert %arg7, %275[2 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%277 = spirv.CompositeInsert %arg7, %276[3 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%278 = spirv.CompositeInsert %arg7, %277[4 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%279 = spirv.CompositeInsert %arg7, %278[5 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%280 = spirv.CompositeInsert %arg7, %279[6 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%281 = spirv.IMul %205, %arg7 : i32
%282 = spirv.CompositeInsert %281, %98[0 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%283 = spirv.CompositeInsert %281, %282[1 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%284 = spirv.CompositeInsert %281, %283[2 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%285 = spirv.CompositeInsert %281, %284[3 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%286 = spirv.CompositeInsert %281, %285[4 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%287 = spirv.CompositeInsert %281, %286[5 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%288 = spirv.CompositeInsert %281, %287[6 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%289 = spirv.IAdd %281, %167 : i32
%290 = spirv.IAdd %281, %168 : i32
%291 = spirv.IAdd %281, %169 : i32
%292 = spirv.IAdd %281, %170 : i32
%293 = spirv.IAdd %281, %171 : i32
%294 = spirv.IAdd %281, %172 : i32
%295 = spirv.IAdd %281, %173 : i32
%296 = spirv.IAdd %281, %174 : i32
%297 = spirv.CompositeInsert %289, %98[0 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%298 = spirv.CompositeInsert %290, %297[1 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%299 = spirv.CompositeInsert %291, %298[2 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%300 = spirv.CompositeInsert %292, %299[3 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%301 = spirv.CompositeInsert %293, %300[4 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%302 = spirv.CompositeInsert %294, %301[5 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%303 = spirv.CompositeInsert %295, %302[6 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%304 = spirv.CompositeInsert %arg1, %243[0 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%305 = spirv.CompositeInsert %arg1, %304[1 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%306 = spirv.CompositeInsert %arg1, %305[2 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%307 = spirv.CompositeInsert %arg1, %306[3 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%308 = spirv.CompositeInsert %arg1, %307[4 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%309 = spirv.CompositeInsert %arg1, %308[5 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%310 = spirv.CompositeInsert %arg1, %309[6 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%311 = spirv.PtrAccessChain %arg1[%289] : !spirv.ptr<f16, CrossWorkgroup>, i32
%312 = spirv.PtrAccessChain %arg1[%290] : !spirv.ptr<f16, CrossWorkgroup>, i32
%313 = spirv.PtrAccessChain %arg1[%291] : !spirv.ptr<f16, CrossWorkgroup>, i32
%314 = spirv.PtrAccessChain %arg1[%292] : !spirv.ptr<f16, CrossWorkgroup>, i32
%315 = spirv.PtrAccessChain %arg1[%293] : !spirv.ptr<f16, CrossWorkgroup>, i32
%316 = spirv.PtrAccessChain %arg1[%294] : !spirv.ptr<f16, CrossWorkgroup>, i32
%317 = spirv.PtrAccessChain %arg1[%295] : !spirv.ptr<f16, CrossWorkgroup>, i32
%318 = spirv.PtrAccessChain %arg1[%296] : !spirv.ptr<f16, CrossWorkgroup>, i32
%319 = spirv.CompositeInsert %311, %243[0 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%320 = spirv.CompositeInsert %312, %319[1 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%321 = spirv.CompositeInsert %313, %320[2 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%322 = spirv.CompositeInsert %314, %321[3 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%323 = spirv.CompositeInsert %315, %322[4 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%324 = spirv.CompositeInsert %316, %323[5 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%325 = spirv.CompositeInsert %317, %324[6 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%326 = spirv.CompositeInsert %318, %325[7 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%327 = spirv.IAdd %arg5, %cst15_i32 : i32
%328 = spirv.SDiv %327, %cst16_i32 : i32
%329 = spirv.IMul %arg7, %cst16_i32 : i32
%330 = spirv.CompositeInsert %329, %98[0 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%331 = spirv.CompositeInsert %329, %330[1 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%332 = spirv.CompositeInsert %329, %331[2 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%333 = spirv.CompositeInsert %329, %332[3 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%334 = spirv.CompositeInsert %329, %333[4 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%335 = spirv.CompositeInsert %329, %334[5 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%336 = spirv.CompositeInsert %329, %335[6 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%337 = spirv.SGreaterThan %328, %cst0_i32 : i32
%338 = spirv.PtrAccessChain %arg9[%cst0_i32] : !spirv.ptr<i8, Workgroup>, i32
%339 = spirv.Bitcast %338 : !spirv.ptr<i8, Workgroup> to !spirv.ptr<f16, Workgroup>
%cst256_i32 = spirv.Constant 256 : i32
%340 = spirv.Undef : !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%341 = spirv.CompositeInsert %339, %340[0 : i32] : !spirv.ptr<f16, Workgroup> into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%342 = spirv.CompositeInsert %cst256_i32, %341[1 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%343 = spirv.CompositeInsert %cst16_i32, %342[2 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%344 = spirv.CompositeInsert %cst1_i32, %343[3 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%345 = spirv.CompositeInsert %cst0_i32, %344[4 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%346 = spirv.CompositeInsert %cst0_i32, %345[5 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%347 = spirv.CompositeInsert %cst0_i32, %346[6 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%348 = spirv.Undef : !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%349 = spirv.CompositeInsert %337, %348[0 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%350 = spirv.CompositeInsert %337, %349[1 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%351 = spirv.CompositeInsert %337, %350[2 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%352 = spirv.CompositeInsert %337, %351[3 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%353 = spirv.CompositeInsert %337, %352[4 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%354 = spirv.CompositeInsert %337, %353[5 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%355 = spirv.CompositeInsert %337, %354[6 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%356 = spirv.IMul %cst0_i32, %cst256_i32 : i32
%357 = spirv.IAdd %cst0_i32, %356 : i32
%358 = spirv.IMul %cst0_i32, %cst16_i32 : i32
%359 = spirv.IAdd %357, %358 : i32
%360 = spirv.IMul %cst0_i32, %cst1_i32 : i32
%361 = spirv.IAdd %359, %360 : i32
%362 = spirv.PtrAccessChain %339[%361] : !spirv.ptr<f16, Workgroup>, i32
%363 = spirv.UDiv %62, %cst4_i32 : i32
%364 = spirv.UMod %363, %cst2_i32 : i32
%365 = spirv.IMul %62, %cst16_i32 : i32
%366 = spirv.UDiv %67, %cst8_i32 : i32
%367 = spirv.BitwiseXor %366, %364 : i32
%368 = spirv.IMul %367, %cst8_i32 : i32
%369 = spirv.UMod %67, %cst8_i32 : i32
%370 = spirv.UDiv %369, %cst8_i32 : i32
%371 = spirv.IMul %370, %cst8_i32 : i32
%372 = spirv.IAdd %368, %371 : i32
%373 = spirv.IMul %372, %cst1_i32 : i32
%374 = spirv.IAdd %365, %373 : i32
%375 = spirv.PtrAccessChain %362[%374] : !spirv.ptr<f16, Workgroup>, i32
%376 = spirv.PtrAccessChain %375[%358] : !spirv.ptr<f16, Workgroup>, i32
%377 = spirv.Bitcast %376 : !spirv.ptr<f16, Workgroup> to !spirv.ptr<vector<4xi32>, Workgroup>
%378 = spirv.PtrAccessChain %377[%cst0_i32] : !spirv.ptr<vector<4xi32>, Workgroup>, i32
%379 = spirv.Bitcast %251 : !spirv.ptr<f16, CrossWorkgroup> to !spirv.ptr<vector<4xi32>, CrossWorkgroup>
%380 = spirv.Undef : vector<4xi32>
%381 = spirv.CompositeInsert %cst0_i32, %380[0 : i32] : i32 into vector<4xi32>
%382 = spirv.CompositeInsert %cst0_i32, %381[1 : i32] : i32 into vector<4xi32>
%383 = spirv.CompositeInsert %cst0_i32, %382[2 : i32] : i32 into vector<4xi32>
%384 = spirv.CompositeInsert %cst0_i32, %383[3 : i32] : i32 into vector<4xi32>
spirv.BranchConditional %337, ^bb1, ^bb2(%384 : vector<4xi32>)
^bb1: // pred: ^bb0
%385 = spirv.Load "CrossWorkgroup" %379 : vector<4xi32>
spirv.Branch ^bb2(%385 : vector<4xi32>)
^bb2(%386: vector<4xi32>): // 2 preds: ^bb0, ^bb1
spirv.Store "Workgroup" %378, %386 : vector<4xi32>
%cst1024_i32 = spirv.Constant 1024 : i32
%387 = spirv.PtrAccessChain %arg9[%cst1024_i32] : !spirv.ptr<i8, Workgroup>, i32
%388 = spirv.Bitcast %387 : !spirv.ptr<i8, Workgroup> to !spirv.ptr<f16, Workgroup>
%389 = spirv.CompositeInsert %388, %340[0 : i32] : !spirv.ptr<f16, Workgroup> into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%390 = spirv.CompositeInsert %cst256_i32, %389[1 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%391 = spirv.CompositeInsert %cst16_i32, %390[2 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%392 = spirv.CompositeInsert %cst1_i32, %391[3 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%393 = spirv.CompositeInsert %cst0_i32, %392[4 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%394 = spirv.CompositeInsert %cst0_i32, %393[5 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%395 = spirv.CompositeInsert %cst0_i32, %394[6 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%396 = spirv.PtrAccessChain %388[%361] : !spirv.ptr<f16, Workgroup>, i32
%397 = spirv.PtrAccessChain %396[%374] : !spirv.ptr<f16, Workgroup>, i32
%398 = spirv.PtrAccessChain %397[%358] : !spirv.ptr<f16, Workgroup>, i32
%399 = spirv.Bitcast %398 : !spirv.ptr<f16, Workgroup> to !spirv.ptr<vector<4xi32>, Workgroup>
%400 = spirv.PtrAccessChain %399[%cst0_i32] : !spirv.ptr<vector<4xi32>, Workgroup>, i32
%401 = spirv.Bitcast %311 : !spirv.ptr<f16, CrossWorkgroup> to !spirv.ptr<vector<4xi32>, CrossWorkgroup>
spirv.BranchConditional %337, ^bb3, ^bb4(%384 : vector<4xi32>)
^bb3: // pred: ^bb2
%402 = spirv.Load "CrossWorkgroup" %401 : vector<4xi32>
spirv.Branch ^bb4(%402 : vector<4xi32>)
^bb4(%403: vector<4xi32>): // 2 preds: ^bb2, ^bb3
spirv.Store "Workgroup" %400, %403 : vector<4xi32>
spirv.ControlBarrier <Workgroup>, <Workgroup>, <AcquireRelease|WorkgroupMemory>
%404 = spirv.Undef : !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%405 = spirv.CompositeInsert %362, %404[0 : i32] : !spirv.ptr<f16, Workgroup> into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%406 = spirv.CompositeInsert %cst16_i32, %405[1 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%407 = spirv.CompositeInsert %cst1_i32, %406[2 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%408 = spirv.CompositeInsert %cst0_i32, %407[3 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%409 = spirv.CompositeInsert %cst0_i32, %408[4 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%410 = spirv.CompositeInsert %396, %404[0 : i32] : !spirv.ptr<f16, Workgroup> into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%411 = spirv.CompositeInsert %cst16_i32, %410[1 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%412 = spirv.CompositeInsert %cst1_i32, %411[2 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%413 = spirv.CompositeInsert %cst0_i32, %412[3 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%414 = spirv.CompositeInsert %cst0_i32, %413[4 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
spirv.Branch ^bb5(%cst0_i32, %97, %266, %326, %347, %395, %409, %414, %266, %326, %cst0_i32, %cst1_i32, %cst1_i32 : i32, !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>, !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>, !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>, !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>, !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>, !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>, !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>, !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>, !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>, i32, i32, i32)
^bb5(%415: i32, %416: !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>, %417: !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>, %418: !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>, %419: !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>, %420: !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>, %421: !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>, %422: !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>, %423: !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>, %424: !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>, %425: i32, %426: i32, %427: i32): // 2 preds: ^bb4, ^bb10
%428 = spirv.SLessThan %415, %328 : i32
spirv.BranchConditional %428, ^bb6, ^bb11
^bb6: // pred: ^bb5
%429 = spirv.CompositeExtract %421[0 : i32] : !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%430 = spirv.CompositeExtract %421[1 : i32] : !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%431 = spirv.CompositeExtract %421[4 : i32] : !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%432 = spirv.Load "Input" %__builtin_var_LocalInvocationId___addr : vector<3xi64>
%433 = spirv.CompositeExtract %432[0 : i32] : vector<3xi64>
%434 = spirv.SConvert %433 : i64 to i32
%435 = spirv.UDiv %434, %cst32_i32 : i32
%436 = spirv.UMod %434, %cst32_i32 : i32
%437 = spirv.UMod %435, %cst1_i32 : i32
%438 = spirv.UDiv %435, %cst1_i32 : i32
%439 = spirv.UMod %438, %cst1_i32 : i32
%440 = spirv.UMod %439, %cst1_i32 : i32
%441 = spirv.UMod %436, %cst8_i32 : i32
%442 = spirv.UDiv %436, %cst8_i32 : i32
%443 = spirv.UMod %442, %cst2_i32 : i32
%444 = spirv.UDiv %442, %cst2_i32 : i32
%445 = spirv.IMul %440, %cst2_i32 : i32
%446 = spirv.IAdd %445, %443 : i32
%447 = spirv.UDiv %431, %cst8_i32 : i32
%448 = spirv.UDiv %441, %cst4_i32 : i32
%449 = spirv.UMod %448, %cst2_i32 : i32
%450 = spirv.IMul %446, %cst8_i32 : i32
%451 = spirv.IAdd %441, %450 : i32
%452 = spirv.UMod %451, %cst16_i32 : i32
%453 = spirv.IAdd %444, %447 : i32
%454 = spirv.BitwiseXor %453, %449 : i32
%455 = spirv.IMul %452, %430 : i32
%456 = spirv.IMul %454, %cst8_i32 : i32
%457 = spirv.IAdd %456, %455 : i32
%458 = spirv.IAdd %444, %cst2_i32 : i32
%459 = spirv.IAdd %458, %447 : i32
%460 = spirv.BitwiseXor %459, %449 : i32
%461 = spirv.IMul %460, %cst8_i32 : i32
%462 = spirv.IAdd %461, %455 : i32
%463 = spirv.ISub %cst0_i32, %431 : i32
%464 = spirv.PtrAccessChain %429[%463] : !spirv.ptr<f16, Workgroup>, i32
%465 = spirv.PtrAccessChain %464[%457] : !spirv.ptr<f16, Workgroup>, i32
%466 = spirv.IMul %cst0_i32, %430 : i32
%467 = spirv.PtrAccessChain %465[%466] : !spirv.ptr<f16, Workgroup>, i32
%468 = spirv.Bitcast %467 : !spirv.ptr<f16, Workgroup> to !spirv.ptr<vector<4xi32>, Workgroup>
%469 = spirv.Load "Workgroup" %468 : vector<4xi32>
%470 = spirv.CompositeExtract %469[0 : i32] : vector<4xi32>
%471 = spirv.CompositeExtract %469[1 : i32] : vector<4xi32>
%472 = spirv.CompositeExtract %469[2 : i32] : vector<4xi32>
%473 = spirv.CompositeExtract %469[3 : i32] : vector<4xi32>
%474 = spirv.Undef : !spirv.struct<(i32, i32, i32, i32)>
%475 = spirv.CompositeInsert %470, %474[0 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32)>
%476 = spirv.CompositeInsert %472, %475[1 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32)>
%477 = spirv.CompositeInsert %471, %476[2 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32)>
%478 = spirv.CompositeExtract %422[0 : i32] : !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%479 = spirv.CompositeExtract %422[1 : i32] : !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%480 = spirv.CompositeExtract %422[4 : i32] : !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%481 = spirv.Load "Input" %__builtin_var_LocalInvocationId___addr : vector<3xi64>
%482 = spirv.CompositeExtract %481[0 : i32] : vector<3xi64>
%483 = spirv.SConvert %482 : i64 to i32
%484 = spirv.UDiv %483, %cst32_i32 : i32
%485 = spirv.UMod %483, %cst32_i32 : i32
%486 = spirv.UMod %484, %cst1_i32 : i32
%487 = spirv.UDiv %484, %cst1_i32 : i32
%488 = spirv.UMod %487, %cst1_i32 : i32
%489 = spirv.UMod %486, %cst2_i32 : i32
%490 = spirv.UMod %485, %cst8_i32 : i32
%491 = spirv.UDiv %485, %cst8_i32 : i32
%492 = spirv.UMod %491, %cst2_i32 : i32
%493 = spirv.UDiv %491, %cst2_i32 : i32
%494 = spirv.IAdd %489, %493 : i32
%495 = spirv.UDiv %480, %cst8_i32 : i32
%496 = spirv.UDiv %490, %cst4_i32 : i32
%497 = spirv.UMod %496, %cst2_i32 : i32
%498 = spirv.IMul %492, %cst8_i32 : i32
%499 = spirv.IAdd %490, %498 : i32
%500 = spirv.UMod %499, %cst16_i32 : i32
%501 = spirv.IAdd %494, %495 : i32
%502 = spirv.BitwiseXor %501, %497 : i32
%503 = spirv.IMul %500, %479 : i32
%504 = spirv.IMul %502, %cst8_i32 : i32
%505 = spirv.IAdd %504, %503 : i32
%506 = spirv.IAdd %494, %cst1_i32 : i32
%507 = spirv.IAdd %506, %495 : i32
%508 = spirv.BitwiseXor %507, %497 : i32
%509 = spirv.IMul %508, %cst8_i32 : i32
%510 = spirv.IAdd %509, %503 : i32
%511 = spirv.ISub %cst0_i32, %480 : i32
%512 = spirv.PtrAccessChain %478[%511] : !spirv.ptr<f16, Workgroup>, i32
%513 = spirv.PtrAccessChain %512[%505] : !spirv.ptr<f16, Workgroup>, i32
%514 = spirv.IMul %cst0_i32, %479 : i32
%515 = spirv.PtrAccessChain %513[%514] : !spirv.ptr<f16, Workgroup>, i32
%516 = spirv.Bitcast %515 : !spirv.ptr<f16, Workgroup> to !spirv.ptr<vector<4xi32>, Workgroup>
%517 = spirv.Load "Workgroup" %516 : vector<4xi32>
%518 = spirv.CompositeExtract %517[0 : i32] : vector<4xi32>
%519 = spirv.CompositeExtract %517[1 : i32] : vector<4xi32>
%520 = spirv.CompositeExtract %517[2 : i32] : vector<4xi32>
%521 = spirv.CompositeExtract %517[3 : i32] : vector<4xi32>
%522 = spirv.CompositeInsert %518, %474[0 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32)>
%523 = spirv.CompositeInsert %519, %522[1 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32)>
%524 = spirv.CompositeInsert %520, %523[2 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32)>
%SIMDwrapper_addr = spirv.mlir.addressof @SIMDwrapper : !spirv.ptr<(vector<16xi32>) -> vector<16xi32>, CodeSectionINTEL>
%525 = spirv.FunctionCall @_Z33__regcall3____builtin_invoke_simdSIMDwrapper(%SIMDwrapper_addr, %cst0_i32) : (!spirv.ptr<(vector<16xi32>) -> vector<16xi32>, CodeSectionINTEL>, i32) -> i32
%526 = spirv.Undef : f32
%527 = spirv.FunctionCall @_Z33__regcall3____builtin_invoke_simdSIMDwrapper(%SIMDwrapper_addr, %cst0_i32) : (!spirv.ptr<(vector<16xi32>) -> vector<16xi32>, CodeSectionINTEL>, i32) -> i32
%528 = spirv.CompositeInsert %526, %89[0 : i32] : f32 into !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%529 = spirv.CompositeInsert %526, %528[1 : i32] : f32 into !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%530 = spirv.CompositeInsert %526, %529[2 : i32] : f32 into !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%531 = spirv.CompositeInsert %526, %530[3 : i32] : f32 into !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%532 = spirv.CompositeInsert %526, %531[4 : i32] : f32 into !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%533 = spirv.CompositeInsert %526, %532[5 : i32] : f32 into !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%534 = spirv.CompositeInsert %526, %533[6 : i32] : f32 into !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%535 = spirv.CompositeInsert %526, %534[7 : i32] : f32 into !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%536 = spirv.CompositeExtract %417[0 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%537 = spirv.CompositeExtract %417[1 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%538 = spirv.CompositeExtract %417[2 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%539 = spirv.CompositeExtract %417[3 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%540 = spirv.CompositeExtract %417[4 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%541 = spirv.CompositeExtract %417[5 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%542 = spirv.CompositeExtract %417[6 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%543 = spirv.CompositeExtract %417[7 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%544 = spirv.PtrAccessChain %536[%cst16_i32] : !spirv.ptr<f16, CrossWorkgroup>, i32
%545 = spirv.PtrAccessChain %537[%cst16_i32] : !spirv.ptr<f16, CrossWorkgroup>, i32
%546 = spirv.PtrAccessChain %538[%cst16_i32] : !spirv.ptr<f16, CrossWorkgroup>, i32
%547 = spirv.PtrAccessChain %539[%cst16_i32] : !spirv.ptr<f16, CrossWorkgroup>, i32
%548 = spirv.PtrAccessChain %540[%cst16_i32] : !spirv.ptr<f16, CrossWorkgroup>, i32
%549 = spirv.PtrAccessChain %541[%cst16_i32] : !spirv.ptr<f16, CrossWorkgroup>, i32
%550 = spirv.PtrAccessChain %542[%cst16_i32] : !spirv.ptr<f16, CrossWorkgroup>, i32
%551 = spirv.PtrAccessChain %543[%cst16_i32] : !spirv.ptr<f16, CrossWorkgroup>, i32
%552 = spirv.CompositeInsert %544, %243[0 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%553 = spirv.CompositeInsert %545, %552[1 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%554 = spirv.CompositeInsert %546, %553[2 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%555 = spirv.CompositeInsert %547, %554[3 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%556 = spirv.CompositeInsert %548, %555[4 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%557 = spirv.CompositeInsert %549, %556[5 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%558 = spirv.CompositeInsert %550, %557[6 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%559 = spirv.CompositeInsert %551, %558[7 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%560 = spirv.CompositeExtract %418[0 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%561 = spirv.CompositeExtract %418[1 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%562 = spirv.CompositeExtract %418[2 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%563 = spirv.CompositeExtract %418[3 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%564 = spirv.CompositeExtract %418[4 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%565 = spirv.CompositeExtract %418[5 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%566 = spirv.CompositeExtract %418[6 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%567 = spirv.CompositeExtract %418[7 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%568 = spirv.PtrAccessChain %560[%329] : !spirv.ptr<f16, CrossWorkgroup>, i32
%569 = spirv.PtrAccessChain %561[%329] : !spirv.ptr<f16, CrossWorkgroup>, i32
%570 = spirv.PtrAccessChain %562[%329] : !spirv.ptr<f16, CrossWorkgroup>, i32
%571 = spirv.PtrAccessChain %563[%329] : !spirv.ptr<f16, CrossWorkgroup>, i32
%572 = spirv.PtrAccessChain %564[%329] : !spirv.ptr<f16, CrossWorkgroup>, i32
%573 = spirv.PtrAccessChain %565[%329] : !spirv.ptr<f16, CrossWorkgroup>, i32
%574 = spirv.PtrAccessChain %566[%329] : !spirv.ptr<f16, CrossWorkgroup>, i32
%575 = spirv.PtrAccessChain %567[%329] : !spirv.ptr<f16, CrossWorkgroup>, i32
%576 = spirv.CompositeInsert %568, %243[0 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%577 = spirv.CompositeInsert %569, %576[1 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%578 = spirv.CompositeInsert %570, %577[2 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%579 = spirv.CompositeInsert %571, %578[3 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%580 = spirv.CompositeInsert %572, %579[4 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%581 = spirv.CompositeInsert %573, %580[5 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%582 = spirv.CompositeInsert %574, %581[6 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%583 = spirv.CompositeInsert %575, %582[7 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%584 = spirv.IAdd %425, %cst1_i32 : i32
%585 = spirv.SLessThan %584, %328 : i32
%586 = spirv.SRem %426, %cst2_i32 : i32
%587 = spirv.SRem %427, %cst2_i32 : i32
%588 = spirv.CompositeExtract %423[0 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%589 = spirv.CompositeExtract %423[1 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%590 = spirv.CompositeExtract %423[2 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%591 = spirv.CompositeExtract %423[3 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%592 = spirv.CompositeExtract %423[4 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%593 = spirv.CompositeExtract %423[5 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%594 = spirv.CompositeExtract %423[6 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%595 = spirv.CompositeExtract %423[7 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%596 = spirv.PtrAccessChain %588[%cst16_i32] : !spirv.ptr<f16, CrossWorkgroup>, i32
%597 = spirv.PtrAccessChain %589[%cst16_i32] : !spirv.ptr<f16, CrossWorkgroup>, i32
%598 = spirv.PtrAccessChain %590[%cst16_i32] : !spirv.ptr<f16, CrossWorkgroup>, i32
%599 = spirv.PtrAccessChain %591[%cst16_i32] : !spirv.ptr<f16, CrossWorkgroup>, i32
%600 = spirv.PtrAccessChain %592[%cst16_i32] : !spirv.ptr<f16, CrossWorkgroup>, i32
%601 = spirv.PtrAccessChain %593[%cst16_i32] : !spirv.ptr<f16, CrossWorkgroup>, i32
%602 = spirv.PtrAccessChain %594[%cst16_i32] : !spirv.ptr<f16, CrossWorkgroup>, i32
%603 = spirv.PtrAccessChain %595[%cst16_i32] : !spirv.ptr<f16, CrossWorkgroup>, i32
%604 = spirv.CompositeInsert %596, %243[0 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%605 = spirv.CompositeInsert %597, %604[1 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%606 = spirv.CompositeInsert %598, %605[2 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%607 = spirv.CompositeInsert %599, %606[3 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%608 = spirv.CompositeInsert %600, %607[4 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%609 = spirv.CompositeInsert %601, %608[5 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%610 = spirv.CompositeInsert %602, %609[6 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%611 = spirv.CompositeInsert %603, %610[7 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%612 = spirv.CompositeExtract %424[0 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%613 = spirv.CompositeExtract %424[1 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%614 = spirv.CompositeExtract %424[2 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%615 = spirv.CompositeExtract %424[3 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%616 = spirv.CompositeExtract %424[4 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%617 = spirv.CompositeExtract %424[5 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%618 = spirv.CompositeExtract %424[6 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%619 = spirv.CompositeExtract %424[7 : i32] : !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%620 = spirv.PtrAccessChain %612[%329] : !spirv.ptr<f16, CrossWorkgroup>, i32
%621 = spirv.PtrAccessChain %613[%329] : !spirv.ptr<f16, CrossWorkgroup>, i32
%622 = spirv.PtrAccessChain %614[%329] : !spirv.ptr<f16, CrossWorkgroup>, i32
%623 = spirv.PtrAccessChain %615[%329] : !spirv.ptr<f16, CrossWorkgroup>, i32
%624 = spirv.PtrAccessChain %616[%329] : !spirv.ptr<f16, CrossWorkgroup>, i32
%625 = spirv.PtrAccessChain %617[%329] : !spirv.ptr<f16, CrossWorkgroup>, i32
%626 = spirv.PtrAccessChain %618[%329] : !spirv.ptr<f16, CrossWorkgroup>, i32
%627 = spirv.PtrAccessChain %619[%329] : !spirv.ptr<f16, CrossWorkgroup>, i32
%628 = spirv.CompositeInsert %620, %243[0 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%629 = spirv.CompositeInsert %621, %628[1 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%630 = spirv.CompositeInsert %622, %629[2 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%631 = spirv.CompositeInsert %623, %630[3 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%632 = spirv.CompositeInsert %624, %631[4 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%633 = spirv.CompositeInsert %625, %632[5 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%634 = spirv.CompositeInsert %626, %633[6 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%635 = spirv.CompositeInsert %627, %634[7 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%636 = spirv.CompositeInsert %585, %348[0 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%637 = spirv.CompositeInsert %585, %636[1 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%638 = spirv.CompositeInsert %585, %637[2 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%639 = spirv.CompositeInsert %585, %638[3 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%640 = spirv.CompositeInsert %585, %639[4 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%641 = spirv.CompositeInsert %585, %640[5 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%642 = spirv.CompositeInsert %585, %641[6 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
spirv.ControlBarrier <Workgroup>, <Workgroup>, <AcquireRelease|WorkgroupMemory>
%643 = spirv.CompositeExtract %419[0 : i32] : !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%644 = spirv.CompositeExtract %419[1 : i32] : !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%645 = spirv.CompositeExtract %419[2 : i32] : !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%646 = spirv.CompositeExtract %419[3 : i32] : !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%647 = spirv.IMul %586, %644 : i32
%648 = spirv.IAdd %cst0_i32, %647 : i32
%649 = spirv.IMul %cst0_i32, %645 : i32
%650 = spirv.IAdd %648, %649 : i32
%651 = spirv.IMul %cst0_i32, %646 : i32
%652 = spirv.IAdd %650, %651 : i32
%653 = spirv.PtrAccessChain %643[%652] : !spirv.ptr<f16, Workgroup>, i32
%654 = spirv.IMul %62, %645 : i32
%655 = spirv.IMul %372, %646 : i32
%656 = spirv.IAdd %654, %655 : i32
%657 = spirv.PtrAccessChain %653[%656] : !spirv.ptr<f16, Workgroup>, i32
%658 = spirv.PtrAccessChain %657[%649] : !spirv.ptr<f16, Workgroup>, i32
%659 = spirv.Bitcast %658 : !spirv.ptr<f16, Workgroup> to !spirv.ptr<vector<4xi32>, Workgroup>
%660 = spirv.PtrAccessChain %659[%cst0_i32] : !spirv.ptr<vector<4xi32>, Workgroup>, i32
%661 = spirv.Bitcast %596 : !spirv.ptr<f16, CrossWorkgroup> to !spirv.ptr<vector<4xi32>, CrossWorkgroup>
spirv.BranchConditional %585, ^bb7, ^bb8(%384 : vector<4xi32>)
^bb7: // pred: ^bb6
%662 = spirv.Load "CrossWorkgroup" %661 : vector<4xi32>
spirv.Branch ^bb8(%662 : vector<4xi32>)
^bb8(%663: vector<4xi32>): // 2 preds: ^bb6, ^bb7
spirv.Store "Workgroup" %660, %663 : vector<4xi32>
%664 = spirv.CompositeExtract %420[0 : i32] : !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%665 = spirv.CompositeExtract %420[1 : i32] : !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%666 = spirv.CompositeExtract %420[2 : i32] : !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%667 = spirv.CompositeExtract %420[3 : i32] : !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%668 = spirv.IMul %586, %665 : i32
%669 = spirv.IAdd %cst0_i32, %668 : i32
%670 = spirv.IMul %cst0_i32, %666 : i32
%671 = spirv.IAdd %669, %670 : i32
%672 = spirv.IMul %cst0_i32, %667 : i32
%673 = spirv.IAdd %671, %672 : i32
%674 = spirv.PtrAccessChain %664[%673] : !spirv.ptr<f16, Workgroup>, i32
%675 = spirv.IMul %62, %666 : i32
%676 = spirv.IMul %372, %667 : i32
%677 = spirv.IAdd %675, %676 : i32
%678 = spirv.PtrAccessChain %674[%677] : !spirv.ptr<f16, Workgroup>, i32
%679 = spirv.PtrAccessChain %678[%670] : !spirv.ptr<f16, Workgroup>, i32
%680 = spirv.Bitcast %679 : !spirv.ptr<f16, Workgroup> to !spirv.ptr<vector<4xi32>, Workgroup>
%681 = spirv.PtrAccessChain %680[%cst0_i32] : !spirv.ptr<vector<4xi32>, Workgroup>, i32
%682 = spirv.Bitcast %620 : !spirv.ptr<f16, CrossWorkgroup> to !spirv.ptr<vector<4xi32>, CrossWorkgroup>
spirv.BranchConditional %585, ^bb9, ^bb10(%384 : vector<4xi32>)
^bb9: // pred: ^bb8
%683 = spirv.Load "CrossWorkgroup" %682 : vector<4xi32>
spirv.Branch ^bb10(%683 : vector<4xi32>)
^bb10(%684: vector<4xi32>): // 2 preds: ^bb8, ^bb9
spirv.Store "Workgroup" %681, %684 : vector<4xi32>
spirv.ControlBarrier <Workgroup>, <Workgroup>, <AcquireRelease|WorkgroupMemory>
%685 = spirv.CompositeExtract %419[4 : i32] : !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%686 = spirv.CompositeExtract %419[5 : i32] : !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%687 = spirv.CompositeExtract %419[6 : i32] : !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%688 = spirv.IMul %587, %644 : i32
%689 = spirv.IAdd %cst0_i32, %688 : i32
%690 = spirv.IAdd %689, %649 : i32
%691 = spirv.IAdd %690, %651 : i32
%692 = spirv.PtrAccessChain %643[%691] : !spirv.ptr<f16, Workgroup>, i32
%693 = spirv.CompositeInsert %692, %404[0 : i32] : !spirv.ptr<f16, Workgroup> into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%694 = spirv.CompositeInsert %645, %693[1 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%695 = spirv.CompositeInsert %646, %694[2 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%696 = spirv.CompositeInsert %686, %695[3 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%697 = spirv.CompositeInsert %687, %696[4 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%698 = spirv.CompositeExtract %420[4 : i32] : !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%699 = spirv.CompositeExtract %420[5 : i32] : !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%700 = spirv.CompositeExtract %420[6 : i32] : !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>
%701 = spirv.IMul %587, %665 : i32
%702 = spirv.IAdd %cst0_i32, %701 : i32
%703 = spirv.IAdd %702, %670 : i32
%704 = spirv.IAdd %703, %672 : i32
%705 = spirv.PtrAccessChain %664[%704] : !spirv.ptr<f16, Workgroup>, i32
%706 = spirv.CompositeInsert %705, %404[0 : i32] : !spirv.ptr<f16, Workgroup> into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%707 = spirv.CompositeInsert %666, %706[1 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%708 = spirv.CompositeInsert %667, %707[2 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%709 = spirv.CompositeInsert %699, %708[3 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%710 = spirv.CompositeInsert %700, %709[4 : i32] : i32 into !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>
%711 = spirv.IAdd %426, %cst1_i32 : i32
%712 = spirv.IAdd %427, %cst1_i32 : i32
%713 = spirv.IAdd %415, %cst1_i32 : i32
spirv.Branch ^bb5(%713, %535, %559, %583, %419, %420, %697, %710, %611, %635, %584, %711, %712 : i32, !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>, !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>, !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>, !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>, !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32, i32, i32)>, !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>, !spirv.struct<(!spirv.ptr<f16, Workgroup>, i32, i32, i32, i32)>, !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>, !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>, i32, i32, i32)
^bb11: // pred: ^bb5
spirv.ControlBarrier <Workgroup>, <Workgroup>, <AcquireRelease|WorkgroupMemory>
%714 = spirv.CompositeExtract %416[0 : i32] : !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%715 = spirv.CompositeExtract %416[1 : i32] : !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%716 = spirv.CompositeExtract %416[2 : i32] : !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%717 = spirv.CompositeExtract %416[3 : i32] : !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%718 = spirv.CompositeExtract %416[4 : i32] : !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%719 = spirv.CompositeExtract %416[5 : i32] : !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%720 = spirv.CompositeExtract %416[6 : i32] : !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%721 = spirv.CompositeExtract %416[7 : i32] : !spirv.struct<(f32, f32, f32, f32, f32, f32, f32, f32)>
%722 = spirv.FConvert %714 : f32 to f16
%723 = spirv.FConvert %715 : f32 to f16
%724 = spirv.FConvert %716 : f32 to f16
%725 = spirv.FConvert %717 : f32 to f16
%726 = spirv.FConvert %718 : f32 to f16
%727 = spirv.FConvert %719 : f32 to f16
%728 = spirv.FConvert %720 : f32 to f16
%729 = spirv.FConvert %721 : f32 to f16
%730 = spirv.Undef : !spirv.struct<(f16, f16, f16, f16, f16, f16, f16, f16)>
%731 = spirv.CompositeInsert %722, %730[0 : i32] : f16 into !spirv.struct<(f16, f16, f16, f16, f16, f16, f16, f16)>
%732 = spirv.CompositeInsert %723, %731[1 : i32] : f16 into !spirv.struct<(f16, f16, f16, f16, f16, f16, f16, f16)>
%733 = spirv.CompositeInsert %724, %732[2 : i32] : f16 into !spirv.struct<(f16, f16, f16, f16, f16, f16, f16, f16)>
%734 = spirv.CompositeInsert %725, %733[3 : i32] : f16 into !spirv.struct<(f16, f16, f16, f16, f16, f16, f16, f16)>
%735 = spirv.CompositeInsert %726, %734[4 : i32] : f16 into !spirv.struct<(f16, f16, f16, f16, f16, f16, f16, f16)>
%736 = spirv.CompositeInsert %727, %735[5 : i32] : f16 into !spirv.struct<(f16, f16, f16, f16, f16, f16, f16, f16)>
%737 = spirv.CompositeInsert %728, %736[6 : i32] : f16 into !spirv.struct<(f16, f16, f16, f16, f16, f16, f16, f16)>
%738 = spirv.CompositeInsert %135, %98[0 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%739 = spirv.CompositeInsert %135, %738[1 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%740 = spirv.CompositeInsert %135, %739[2 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%741 = spirv.CompositeInsert %135, %740[3 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%742 = spirv.CompositeInsert %135, %741[4 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%743 = spirv.CompositeInsert %135, %742[5 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%744 = spirv.CompositeInsert %135, %743[6 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%745 = spirv.CompositeInsert %arg8, %98[0 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%746 = spirv.CompositeInsert %arg8, %745[1 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%747 = spirv.CompositeInsert %arg8, %746[2 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%748 = spirv.CompositeInsert %arg8, %747[3 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%749 = spirv.CompositeInsert %arg8, %748[4 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%750 = spirv.CompositeInsert %arg8, %749[5 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%751 = spirv.CompositeInsert %arg8, %750[6 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%752 = spirv.IMul %135, %arg8 : i32
%753 = spirv.CompositeInsert %752, %98[0 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%754 = spirv.CompositeInsert %752, %753[1 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%755 = spirv.CompositeInsert %752, %754[2 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%756 = spirv.CompositeInsert %752, %755[3 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%757 = spirv.CompositeInsert %752, %756[4 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%758 = spirv.CompositeInsert %752, %757[5 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%759 = spirv.CompositeInsert %752, %758[6 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%760 = spirv.IAdd %752, %144 : i32
%761 = spirv.IAdd %752, %145 : i32
%762 = spirv.IAdd %752, %146 : i32
%763 = spirv.IAdd %752, %147 : i32
%764 = spirv.IAdd %752, %148 : i32
%765 = spirv.IAdd %752, %149 : i32
%766 = spirv.IAdd %752, %150 : i32
%767 = spirv.IAdd %752, %151 : i32
%768 = spirv.CompositeInsert %760, %98[0 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%769 = spirv.CompositeInsert %761, %768[1 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%770 = spirv.CompositeInsert %762, %769[2 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%771 = spirv.CompositeInsert %763, %770[3 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%772 = spirv.CompositeInsert %764, %771[4 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%773 = spirv.CompositeInsert %765, %772[5 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%774 = spirv.CompositeInsert %766, %773[6 : i32] : i32 into !spirv.struct<(i32, i32, i32, i32, i32, i32, i32, i32)>
%775 = spirv.CompositeInsert %arg2, %243[0 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%776 = spirv.CompositeInsert %arg2, %775[1 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%777 = spirv.CompositeInsert %arg2, %776[2 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%778 = spirv.CompositeInsert %arg2, %777[3 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%779 = spirv.CompositeInsert %arg2, %778[4 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%780 = spirv.CompositeInsert %arg2, %779[5 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%781 = spirv.CompositeInsert %arg2, %780[6 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%782 = spirv.PtrAccessChain %arg2[%760] : !spirv.ptr<f16, CrossWorkgroup>, i32
%783 = spirv.PtrAccessChain %arg2[%761] : !spirv.ptr<f16, CrossWorkgroup>, i32
%784 = spirv.PtrAccessChain %arg2[%762] : !spirv.ptr<f16, CrossWorkgroup>, i32
%785 = spirv.PtrAccessChain %arg2[%763] : !spirv.ptr<f16, CrossWorkgroup>, i32
%786 = spirv.PtrAccessChain %arg2[%764] : !spirv.ptr<f16, CrossWorkgroup>, i32
%787 = spirv.PtrAccessChain %arg2[%765] : !spirv.ptr<f16, CrossWorkgroup>, i32
%788 = spirv.PtrAccessChain %arg2[%766] : !spirv.ptr<f16, CrossWorkgroup>, i32
%789 = spirv.PtrAccessChain %arg2[%767] : !spirv.ptr<f16, CrossWorkgroup>, i32
%790 = spirv.CompositeInsert %782, %243[0 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%791 = spirv.CompositeInsert %783, %790[1 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%792 = spirv.CompositeInsert %784, %791[2 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%793 = spirv.CompositeInsert %785, %792[3 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%794 = spirv.CompositeInsert %786, %793[4 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%795 = spirv.CompositeInsert %787, %794[5 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%796 = spirv.CompositeInsert %788, %795[6 : i32] : !spirv.ptr<f16, CrossWorkgroup> into !spirv.struct<(!spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>, !spirv.ptr<f16, CrossWorkgroup>)>
%797 = spirv.SLessThan %135, %arg3 : i32
%798 = spirv.Undef : !spirv.struct<(i1)>
%799 = spirv.CompositeInsert %797, %348[0 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%800 = spirv.CompositeInsert %797, %799[1 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%801 = spirv.CompositeInsert %797, %800[2 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%802 = spirv.CompositeInsert %797, %801[3 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%803 = spirv.CompositeInsert %797, %802[4 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%804 = spirv.CompositeInsert %797, %803[5 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%805 = spirv.CompositeInsert %797, %804[6 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%806 = spirv.SLessThan %144, %arg4 : i32
%807 = spirv.SLessThan %145, %arg4 : i32
%808 = spirv.SLessThan %146, %arg4 : i32
%809 = spirv.SLessThan %147, %arg4 : i32
%810 = spirv.SLessThan %148, %arg4 : i32
%811 = spirv.SLessThan %149, %arg4 : i32
%812 = spirv.SLessThan %150, %arg4 : i32
%813 = spirv.SLessThan %151, %arg4 : i32
%814 = spirv.CompositeInsert %806, %348[0 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%815 = spirv.CompositeInsert %807, %814[1 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%816 = spirv.CompositeInsert %808, %815[2 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%817 = spirv.CompositeInsert %809, %816[3 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%818 = spirv.CompositeInsert %810, %817[4 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%819 = spirv.CompositeInsert %811, %818[5 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%820 = spirv.CompositeInsert %812, %819[6 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%821 = spirv.LogicalAnd %797, %806 : i1
%822 = spirv.LogicalAnd %797, %807 : i1
%823 = spirv.LogicalAnd %797, %808 : i1
%824 = spirv.LogicalAnd %797, %809 : i1
%825 = spirv.LogicalAnd %797, %810 : i1
%826 = spirv.LogicalAnd %797, %811 : i1
%827 = spirv.LogicalAnd %797, %812 : i1
%828 = spirv.LogicalAnd %797, %813 : i1
%829 = spirv.CompositeInsert %821, %348[0 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%830 = spirv.CompositeInsert %822, %829[1 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%831 = spirv.CompositeInsert %823, %830[2 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%832 = spirv.CompositeInsert %824, %831[3 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%833 = spirv.CompositeInsert %825, %832[4 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%834 = spirv.CompositeInsert %826, %833[5 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%835 = spirv.CompositeInsert %827, %834[6 : i32] : i1 into !spirv.struct<(i1, i1, i1, i1, i1, i1, i1, i1)>
%836 = spirv.Load "Input" %__builtin_var_LocalInvocationId___addr : vector<3xi64>
%837 = spirv.CompositeExtract %836[0 : i32] : vector<3xi64>
%838 = spirv.SConvert %837 : i64 to i32
%839 = spirv.UMod %838, %cst32_i32 : i32
%840 = spirv.UDiv %838, %cst32_i32 : i32
%841 = spirv.UMod %840, %cst1_i32 : i32
%842 = spirv.UDiv %840, %cst1_i32 : i32
%843 = spirv.UMod %842, %cst1_i32 : i32
%844 = spirv.UMod %843, %cst1_i32 : i32
%845 = spirv.UMod %841, %cst2_i32 : i32
%846 = spirv.UDiv %839, %cst4_i32 : i32
%847 = spirv.IAdd %846, %cst8_i32 : i32
%848 = spirv.UMod %839, %cst4_i32 : i32
%849 = spirv.IMul %848, %cst2_i32 : i32
%850 = spirv.IAdd %849, %cst1_i32 : i32
%851 = spirv.IMul %844, %cst16_i32 : i32
%852 = spirv.IAdd %846, %851 : i32
%853 = spirv.IMul %845, %cst8_i32 : i32
%854 = spirv.IAdd %849, %853 : i32
%cst24_i32 = spirv.Constant 24 : i32
%855 = spirv.IMul %852, %cst24_i32 : i32
%856 = spirv.IAdd %855, %854 : i32
%857 = spirv.PtrAccessChain %339[%856] : !spirv.ptr<f16, Workgroup>, i32
%858 = spirv.Bitcast %857 : !spirv.ptr<f16, Workgroup> to !spirv.ptr<vector<2xf16>, Workgroup>
%859 = spirv.Undef : vector<2xf16>
%cst0_i64 = spirv.Constant 0 : i64
%860 = spirv.VectorInsertDynamic %722, %859[%cst0_i64] : vector<2xf16>, i64
%cst1_i64 = spirv.Constant 1 : i64
%861 = spirv.VectorInsertDynamic %723, %860[%cst1_i64] : vector<2xf16>, i64
spirv.Store "Workgroup" %858, %861 : vector<2xf16>
%862 = spirv.Load "Input" %__builtin_var_LocalInvocationId___addr : vector<3xi64>
%863 = spirv.CompositeExtract %862[0 : i32] : vector<3xi64>
%864 = spirv.SConvert %863 : i64 to i32
%865 = spirv.UMod %864, %cst32_i32 : i32
%866 = spirv.UDiv %864, %cst32_i32 : i32
%867 = spirv.UMod %866, %cst1_i32 : i32
%868 = spirv.UDiv %866, %cst1_i32 : i32
%869 = spirv.UMod %868, %cst1_i32 : i32
%870 = spirv.UMod %869, %cst1_i32 : i32
%871 = spirv.UMod %867, %cst2_i32 : i32
%872 = spirv.UDiv %865, %cst4_i32 : i32
%873 = spirv.IAdd %872, %cst8_i32 : i32
%874 = spirv.UMod %865, %cst4_i32 : i32
%875 = spirv.IMul %874, %cst2_i32 : i32
%876 = spirv.IAdd %875, %cst1_i32 : i32
%877 = spirv.IMul %870, %cst16_i32 : i32
%878 = spirv.IAdd %873, %877 : i32
%879 = spirv.IMul %871, %cst8_i32 : i32
%880 = spirv.IAdd %875, %879 : i32
%881 = spirv.IMul %878, %cst24_i32 : i32
%882 = spirv.IAdd %881, %880 : i32
%883 = spirv.PtrAccessChain %339[%882] : !spirv.ptr<f16, Workgroup>, i32
%884 = spirv.Bitcast %883 : !spirv.ptr<f16, Workgroup> to !spirv.ptr<vector<2xf16>, Workgroup>
%885 = spirv.VectorInsertDynamic %724, %859[%cst0_i64] : vector<2xf16>, i64
%886 = spirv.VectorInsertDynamic %725, %885[%cst1_i64] : vector<2xf16>, i64
spirv.Store "Workgroup" %884, %886 : vector<2xf16>
%887 = spirv.Load "Input" %__builtin_var_LocalInvocationId___addr : vector<3xi64>
%888 = spirv.CompositeExtract %887[0 : i32] : vector<3xi64>
%889 = spirv.SConvert %888 : i64 to i32
%890 = spirv.UMod %889, %cst32_i32 : i32
%891 = spirv.UDiv %889, %cst32_i32 : i32
%892 = spirv.UMod %891, %cst1_i32 : i32
%893 = spirv.UDiv %891, %cst1_i32 : i32
%894 = spirv.UMod %893, %cst1_i32 : i32
%895 = spirv.UMod %894, %cst1_i32 : i32
%896 = spirv.UMod %892, %cst2_i32 : i32
%897 = spirv.UDiv %890, %cst4_i32 : i32
%898 = spirv.IAdd %897, %cst8_i32 : i32
%899 = spirv.UMod %890, %cst4_i32 : i32
%900 = spirv.IMul %899, %cst2_i32 : i32
%901 = spirv.IAdd %900, %cst1_i32 : i32
%902 = spirv.IMul %895, %cst16_i32 : i32
%903 = spirv.IAdd %897, %902 : i32
%904 = spirv.IMul %896, %cst8_i32 : i32
%905 = spirv.IAdd %900, %904 : i32
%906 = spirv.IAdd %905, %cst8_i32 : i32
%907 = spirv.IMul %903, %cst24_i32 : i32
%908 = spirv.IAdd %907, %906 : i32
%909 = spirv.PtrAccessChain %339[%908] : !spirv.ptr<f16, Workgroup>, i32
%910 = spirv.Bitcast %909 : !spirv.ptr<f16, Workgroup> to !spirv.ptr<vector<2xf16>, Workgroup>
%911 = spirv.VectorInsertDynamic %726, %859[%cst0_i64] : vector<2xf16>, i64
%912 = spirv.VectorInsertDynamic %727, %911[%cst1_i64] : vector<2xf16>, i64
spirv.Store "Workgroup" %910, %912 : vector<2xf16>
%913 = spirv.Load "Input" %__builtin_var_LocalInvocationId___addr : vector<3xi64>
%914 = spirv.CompositeExtract %913[0 : i32] : vector<3xi64>
%915 = spirv.SConvert %914 : i64 to i32
%916 = spirv.UMod %915, %cst32_i32 : i32
%917 = spirv.UDiv %915, %cst32_i32 : i32
%918 = spirv.UMod %917, %cst1_i32 : i32
%919 = spirv.UDiv %917, %cst1_i32 : i32
%920 = spirv.UMod %919, %cst1_i32 : i32
%921 = spirv.UMod %920, %cst1_i32 : i32
%922 = spirv.UMod %918, %cst2_i32 : i32
%923 = spirv.UDiv %916, %cst4_i32 : i32
%924 = spirv.IAdd %923, %cst8_i32 : i32
%925 = spirv.UMod %916, %cst4_i32 : i32
%926 = spirv.IMul %925, %cst2_i32 : i32
%927 = spirv.IAdd %926, %cst1_i32 : i32
%928 = spirv.IMul %921, %cst16_i32 : i32
%929 = spirv.IAdd %924, %928 : i32
%930 = spirv.IMul %922, %cst8_i32 : i32
%931 = spirv.IAdd %926, %930 : i32
%932 = spirv.IAdd %931, %cst8_i32 : i32
%933 = spirv.IMul %929, %cst24_i32 : i32
%934 = spirv.IAdd %933, %932 : i32
%935 = spirv.PtrAccessChain %339[%934] : !spirv.ptr<f16, Workgroup>, i32
%936 = spirv.Bitcast %935 : !spirv.ptr<f16, Workgroup> to !spirv.ptr<vector<2xf16>, Workgroup>
%937 = spirv.VectorInsertDynamic %728, %859[%cst0_i64] : vector<2xf16>, i64
%938 = spirv.VectorInsertDynamic %729, %937[%cst1_i64] : vector<2xf16>, i64
spirv.Store "Workgroup" %936, %938 : vector<2xf16>
spirv.ControlBarrier <Workgroup>, <Workgroup>, <AcquireRelease|WorkgroupMemory>
%939 = spirv.IMul %83, %cst24_i32 : i32
%940 = spirv.IAdd %939, %88 : i32
%941 = spirv.PtrAccessChain %339[%940] : !spirv.ptr<f16, Workgroup>, i32
%942 = spirv.Bitcast %941 : !spirv.ptr<f16, Workgroup> to !spirv.ptr<vector<8xf16>, Workgroup>
%943 = spirv.Load "Workgroup" %942 : vector<8xf16>
%944 = spirv.VectorExtractDynamic %943[%cst0_i64] : vector<8xf16>, i64
%945 = spirv.VectorExtractDynamic %943[%cst1_i64] : vector<8xf16>, i64
%cst2_i64 = spirv.Constant 2 : i64
%946 = spirv.VectorExtractDynamic %943[%cst2_i64] : vector<8xf16>, i64
%cst3_i64 = spirv.Constant 3 : i64
%947 = spirv.VectorExtractDynamic %943[%cst3_i64] : vector<8xf16>, i64
%cst4_i64 = spirv.Constant 4 : i64
%948 = spirv.VectorExtractDynamic %943[%cst4_i64] : vector<8xf16>, i64
%cst5_i64 = spirv.Constant 5 : i64
%949 = spirv.VectorExtractDynamic %943[%cst5_i64] : vector<8xf16>, i64
%cst6_i64 = spirv.Constant 6 : i64
%950 = spirv.VectorExtractDynamic %943[%cst6_i64] : vector<8xf16>, i64
%cst7_i64 = spirv.Constant 7 : i64
%951 = spirv.VectorExtractDynamic %943[%cst7_i64] : vector<8xf16>, i64
%952 = spirv.CompositeInsert %944, %730[0 : i32] : f16 into !spirv.struct<(f16, f16, f16, f16, f16, f16, f16, f16)>
%953 = spirv.CompositeInsert %945, %952[1 : i32] : f16 into !spirv.struct<(f16, f16, f16, f16, f16, f16, f16, f16)>
%954 = spirv.CompositeInsert %946, %953[2 : i32] : f16 into !spirv.struct<(f16, f16, f16, f16, f16, f16, f16, f16)>
%955 = spirv.CompositeInsert %947, %954[3 : i32] : f16 into !spirv.struct<(f16, f16, f16, f16, f16, f16, f16, f16)>
%956 = spirv.CompositeInsert %948, %955[4 : i32] : f16 into !spirv.struct<(f16, f16, f16, f16, f16, f16, f16, f16)>
%957 = spirv.CompositeInsert %949, %956[5 : i32] : f16 into !spirv.struct<(f16, f16, f16, f16, f16, f16, f16, f16)>
%958 = spirv.CompositeInsert %950, %957[6 : i32] : f16 into !spirv.struct<(f16, f16, f16, f16, f16, f16, f16, f16)>
%true = spirv.Constant true
%959 = spirv.Load "Input" %__builtin_var_LocalInvocationId___addr : vector<3xi64>
%960 = spirv.CompositeExtract %959[0 : i32] : vector<3xi64>
%961 = spirv.SConvert %960 : i64 to i32
%962 = spirv.UMod %961, %cst32_i32 : i32
%963 = spirv.UDiv %961, %cst32_i32 : i32
%964 = spirv.UDiv %963, %cst1_i32 : i32
%965 = spirv.UDiv %962, %cst2_i32 : i32
%966 = spirv.LogicalAnd %true, %821 : i1
spirv.BranchConditional %966, ^bb12, ^bb13
^bb12: // pred: ^bb11
%967 = spirv.VectorInsertDynamic %944, %859[%cst0_i32] : vector<2xf16>, i32
%968 = spirv.VectorInsertDynamic %945, %967[%cst1_i32] : vector<2xf16>, i32
%969 = spirv.Bitcast %968 : vector<2xf16> to i32
%970 = spirv.VectorInsertDynamic %969, %380[%cst0_i32] : vector<4xi32>, i32
%971 = spirv.VectorInsertDynamic %946, %859[%cst0_i32] : vector<2xf16>, i32
%972 = spirv.VectorInsertDynamic %947, %971[%cst1_i32] : vector<2xf16>, i32
%973 = spirv.Bitcast %972 : vector<2xf16> to i32
%974 = spirv.VectorInsertDynamic %973, %970[%cst1_i32] : vector<4xi32>, i32
%975 = spirv.VectorInsertDynamic %948, %859[%cst0_i32] : vector<2xf16>, i32
%976 = spirv.VectorInsertDynamic %949, %975[%cst1_i32] : vector<2xf16>, i32
%977 = spirv.Bitcast %976 : vector<2xf16> to i32
%978 = spirv.VectorInsertDynamic %977, %974[%cst2_i32] : vector<4xi32>, i32
%979 = spirv.VectorInsertDynamic %950, %859[%cst0_i32] : vector<2xf16>, i32
%980 = spirv.VectorInsertDynamic %951, %979[%cst1_i32] : vector<2xf16>, i32
%981 = spirv.Bitcast %980 : vector<2xf16> to i32
%982 = spirv.VectorInsertDynamic %981, %978[%cst3_i32] : vector<4xi32>, i32
%983 = spirv.Bitcast %782 : !spirv.ptr<f16, CrossWorkgroup> to !spirv.ptr<vector<4xi32>, CrossWorkgroup>
spirv.Store "CrossWorkgroup" %983, %982 : vector<4xi32>
spirv.Branch ^bb13
^bb13: // 2 preds: ^bb11, ^bb12
spirv.Return
}
}
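For readers following the dump, the index arithmetic that precedes each of the four Workgroup stores above (the UMod/UDiv chain on LocalInvocationId, then the multiply by %cst24_i32) can be restated in plain C++. This is only a reading aid, not compiler output. It assumes every %cstN_i32 constant holds the value in its name, and the names lane, warp, row, col, and workgroupStoreOffsets are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Row pitch of the Workgroup buffer in f16 elements (the %cst24_i32 multiplier).
constexpr std::uint32_t kRowPitch = 24;

// Returns the four f16 element offsets one work-item stores to, in the order
// the dump emits them. Each store writes a vector<2xf16> at that offset.
inline std::array<std::uint32_t, 4> workgroupStoreOffsets(std::uint32_t localId) {
  const std::uint32_t lane = localId % 32;             // UMod tid, 32
  const std::uint32_t warp = localId / 32;             // UDiv tid, 32
  const std::uint32_t warpCol = (warp % 1) % 2;        // always 0 in this dump
  const std::uint32_t warpRow = ((warp / 1) % 1) % 1;  // always 0 in this dump
  const std::uint32_t row = lane / 4 + warpRow * 16;
  const std::uint32_t col = (lane % 4) * 2 + warpCol * 8;

  const std::array<std::uint32_t, 4> offsets = {
      row * kRowPitch + col,            // first store
      (row + 8) * kRowPitch + col,      // second store: row + 8
      row * kRowPitch + col + 8,        // third store: col + 8
      (row + 8) * kRowPitch + col + 8,  // fourth store: row + 8, col + 8
  };
  for (std::uint32_t offset : offsets) {
    // Every offset is even, so the two f16 values of a store sit in one pair.
    assert(offset % 2 == 0);
  }
  return offsets;
}
```

Under those assumptions each work-item writes a 2x2 pattern of vector<2xf16> stores at rows r and r+8 and columns c and c+8, with a row pitch of 24 f16 elements.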
But for some minor portion of the kernel, we might have to use the SIMD paradigm for performance or functionality. For example, we may want to use the VC intrinsics in the Triton kernel.
Could you please explain the exact use case for the "might have to use" scenario?
A benchmark would also be convincing, for example a SYCL example that emulates the real use case and shows the benefit of using invoke_SIMD from SIMT code.
I am not sure how large the invoke_SIMD overhead is, or whether it will make this type of mixing unappealing.
It is hard to reach the best performance with the SPIRV JointMatrixMatmul. We may need to use DPAS explicitly in the IR to fuse pre-ops and post-ops in GEMM.
I have concerns about the mix overhead.
Yeah. Based on the SYCL example, the SIMT-SIMD calling convention is not as good as expected. IGC uses a register call to mix the SIMT and SIMD functions.
call (16|M0) r127.0 L_f0__BB_0_0 {A@1} // $75
.....
L_f0__BB_0_0:
(W) mov (2|M0) r2.2<1>:ud r26.0<1;1,0>:ud // $1
(W) asr (1|M0) r2.1<1>:d r29.0<0;1,0>:d 31:w {Compacted} // $4
(W) shl (1|M0) r2.0<1>:d r29.0<0;1,0>:d 2:w {Compacted} // $7
(W) shr (1|M0) r2.4<1>:ud r29.0<0;1,0>:ud 0x1E:uw // $6
(W) shl (1|M0) r2.5<1>:d r2.1<0;1,0>:d 2:w {I@3} // $5
(W) addc (1|M0) r3.0<1>:ud r2.2<0;1,0>:ud r2.0<0;1,0>:ud {AccWrEn,I@3} // $9
(W) or (1|M0) r2.1<1>:d r2.5<0;1,0>:d r2.4<0;1,0>:d {I@2} // $8
(W) mov (1|M0) r5.0<1>:ud acc0.0<0;1,0>:ud {Compacted} // $9
(W) mov (1|M0) r4.0<1>:f r3.0<0;1,0>:f {Compacted,I@3} // $10
(W) add3 (1|M0) r4.1<1>:d r5.0<0;0>:d r2.3<0;0>:d r2.1<0>:d {I@1} // $11
(W) shr (1|M0) a0.2<1>:ud r126.7<0;1,0>:ud 0x4:ud {F@1} // $1
(W) send.dc1 (16|M0) r2 r4 null:0 0x0 0x022D0BFF {A@1,$0} // wr:1h+0, rd:2; a64 aligned oword block read x4 // $12
(W) add (1|M0) r126.0<1>:ud r127.2<0;1,0>:ud 0x0:ud {Compacted} // $1
(W) mov (4|M0) r59.4<1>:ud r127.0<1;1,0>:ud // save vISA SP/FP to temp; $1
(W) store.ugm.d32x8t.a32 (1|M0) ss[a0.2][r126:1] r127:1 {ExBSO,A@2,$1} // ex_desc:a0.2; desc:0x4200C504 // spill to FP[0*32] of ?; $1
(W) mov (1|M0) r127.3<1>:ud r127.2<0;1,0>:ud {$1.src} // vISA_FP = vISA_SP; $1
(W) add (1|M0) r127.2<1>:ud r127.2<0;1,0>:ud 0x40:ud // vISA_SP += vISA_frameSize; $1
(W) mov (4|M0) r127.0<1>:ud r59.4<1;1,0>:ud {I@3} // restore vISA SP/FP from temp; $15
(W) add (16|M0) r4.0<1>:f r2.0<1;1,0>:f r27.0<1;1,0>:f {Compacted,$0.dst} // $13
(W) add (16|M0) r26.0<1>:f r2.0<1;1,0>:f r27.0<1;1,0>:f {Compacted} // $14
ret (16|M0) r127.0 {A@1} // $15
This is not optimized for now. But I think it could be optimized at the link phase by replacing the register function call with an inlined function call and then applying some link-phase optimizations.
The SIMT-SIMD convention is a good mechanism for us to align our SIMT paradigm and SIMD paradigm (for example, calling a XeTLA micro kernel inside the Triton kernel).
Yes. We will enable this one. We would like to hide this within the XeTile dialect as a first step. Then we may need an additional pass in the integration code (say, on the Triton side) to merge multiple invoke_SIMD calls into one.
In the future, when you say "performance not as expected", please report exactly how much you expect and how much it currently is. The benchmark should be as close to the real case as possible. This time, we will build a micro benchmark to track the XeTile level, such as load/store/dpas of shapes with the sizes 8, 16, 24, 32, and 64, which will give us a good understanding of how large the overhead is.
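As a rough illustration of the kind of report asked for here (expected time, measured time, and the gap between them, per shape), a host-side harness could look like the sketch below. Everything in it is an assumption made for illustration: tileSum is a placeholder workload rather than a real XeTile load/store/dpas, both timings run the same placeholder, and the function names are hypothetical. Only the shape list 8, 16, 24, 32, 64 comes from the comment above.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Tile sizes named in the comment above.
constexpr std::array<int, 5> kShapes = {8, 16, 24, 32, 64};

// Relative overhead, in percent, of a measured time against the expected time.
inline double overheadPercent(double expectedNs, double measuredNs) {
  assert(expectedNs > 0.0);
  return (measuredNs - expectedNs) / expectedNs * 100.0;
}

// Runs fn `iterations` times and returns the mean wall time per run in ns.
template <typename Fn>
double meanNanoseconds(Fn&& fn, int iterations) {
  assert(iterations > 0);
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) fn();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::nano>(stop - start).count() / iterations;
}

// Placeholder workload (hypothetical): touches every element of an n x n tile.
inline float tileSum(const std::vector<float>& tile) {
  float sum = 0.0f;
  for (float v : tile) sum += v;
  return sum;
}

// Prints one line per shape: expected, measured, and relative overhead.
// A real benchmark would time the pure-SIMT path as "expected" and the
// invoke_SIMD path as "measured"; here both are the same placeholder.
inline void reportAllShapes(int iterations) {
  volatile float sink = 0.0f;  // keeps the workload from being optimized away
  for (int n : kShapes) {
    const std::vector<float> tile(static_cast<std::size_t>(n) * n, 1.0f);
    const double expected = meanNanoseconds([&] { sink = tileSum(tile); }, iterations);
    const double measured = meanNanoseconds([&] { sink = tileSum(tile); }, iterations);
    if (expected <= 0.0) continue;  // timer resolution too coarse for this shape
    std::printf("shape %2d: expected %.1f ns, measured %.1f ns, overhead %+.1f%%\n",
                n, expected, measured, overheadPercent(expected, measured));
  }
}
```

Calling reportAllShapes(1000) prints one line per shape. Reporting both absolute numbers alongside the percentage keeps a claim like "not as expected" tied to concrete values.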
I have tried the patches for supporting this. We can close this issue when it is upstreamed.
Background
The Triton kernel is generated as a SIMT-major SPIRV kernel, because some components can only be used with the SIMT paradigm. For example, the Intel math library only has a SIMT version. But for some minor portion of the kernel, we might have to use the SIMD paradigm for performance or functionality. For example, we may want to use the VC intrinsics in the Triton kernel.
We are working on enabling the SIMT->SIMD calling convention in the Triton kernel: https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_invoke_simd.asciidoc
By doing so, we can generate SIMD-paradigm code for parts of the kernel.
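To make the SIMT->SIMD call shape concrete, here is a plain C++ emulation of what such a call does for one sub-group of 16 work-items. It is not the SYCL invoke_simd API and does not need a SYCL compiler. The names Simd, simdCallee, and runSubgroup are hypothetical, and the callee signature (a uniform pointer, a per-work-item float, a uniform offset) is only an assumption loosely modeled on the mangled name in the DPCPP disassembly.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kSubgroupSize = 16;          // matches SubgroupSize 16
using Simd = std::array<float, kSubgroupSize>;     // stand-in for simd<float, 16>
using SimdFn = Simd (*)(const float*, Simd, int);  // SIMD callee, reached by pointer

// SIMD side: executes once per sub-group and works on whole vectors.
// `base` and `offset` are uniform (same for all work-items); `x` is per lane.
inline Simd simdCallee(const float* base, Simd x, int offset) {
  Simd out{};
  for (std::size_t lane = 0; lane < kSubgroupSize; ++lane) {
    out[lane] = base[offset + static_cast<int>(lane)] + x[lane];
  }
  return out;
}

// SIMT side, emulated: each of the 16 work-items holds one scalar. The call
// packs the scalars into one vector, makes a single SIMD call through the
// function pointer, and hands element `lane` of the result back to work-item
// `lane`.
inline std::array<float, kSubgroupSize> runSubgroup(
    SimdFn callee, const float* uniformBase,
    const std::array<float, kSubgroupSize>& scalarPerWorkItem, int uniformOffset) {
  assert(callee != nullptr);
  Simd packed{};
  for (std::size_t lane = 0; lane < kSubgroupSize; ++lane) {
    packed[lane] = scalarPerWorkItem[lane];  // gather one value per work-item
  }
  const Simd result = callee(uniformBase, packed, uniformOffset);
  std::array<float, kSubgroupSize> perWorkItem{};
  for (std::size_t lane = 0; lane < kSubgroupSize; ++lane) {
    perWorkItem[lane] = result[lane];        // scatter results back
  }
  return perWorkItem;
}
```

For example, with base[i] = i, every work-item passing 1.0f, and an offset of 4, work-item 0 gets 5.0f back and work-item 15 gets 20.0f. On real hardware the gather and scatter are not loops: the per-work-item values already occupy one register per sub-group, which is why the calling convention between the two sides matters.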
The requirements
We referred to the SPIRV generated by DPCPP, which mixes SIMT and SIMD. We need to reference the SPIRV kernel function directly through a function pointer. https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc
We need the SPIRV dialect to support this.