intel / intel-graphics-compiler

Abort passing ByVal OpTypePointer parameter in invalid storage class (sometimes) #162

Closed maleadt closed 2 years ago

maleadt commented 4 years ago

Using the Khronos LLVM to SPIRV translator with:

define dso_local spir_kernel void @_Z17julia_kernel_13195TupleI5Int64E([1 x i64]* nocapture nonnull readonly byval dereferenceable(8)) local_unnamed_addr {
top:
  ret void
}

yields:

               OpCapability Addresses
               OpCapability Kernel
               OpCapability Int64
          %1 = OpExtInstImport "OpenCL.std"
               OpMemoryModel Physical64 OpenCL
               OpEntryPoint Kernel %8 "_Z17julia_kernel_13055TupleI5Int64E"
               OpSource OpenCL_C 200000
               OpName %top "top"
               OpDecorate %9 FuncParamAttr ByVal
               OpDecorate %9 FuncParamAttr NoCapture
               OpDecorate %9 MaxByteOffset 8
      %ulong = OpTypeInt 64 0
    %ulong_1 = OpConstant %ulong 1
       %void = OpTypeVoid
%_arr_ulong_ulong_1 = OpTypeArray %ulong %ulong_1
%_ptr_Function__arr_ulong_ulong_1 = OpTypePointer Function %_arr_ulong_ulong_1
          %7 = OpTypeFunction %void %_ptr_Function__arr_ulong_ulong_1
          %8 = OpFunction %void None %7
          %9 = OpFunctionParameter %_ptr_Function__arr_ulong_ulong_1
        %top = OpLabel
               OpReturn
               OpFunctionEnd

The function parameter here is passed as a ByVal OpTypePointer Function. According to https://spec.oneapi.com/level-zero/latest/core/SPIRV.html#kernel-arguments the Function storage class is not supported for kernel arguments, but that should not matter here because the parameter is passed ByVal. (Rewriting all pointers to target a supported address space would complicate my codegen; just attaching byval is much easier.)

Abort was called at 563 line in file:
../level_zero/core/source/kernel/kernel_imp.cpp

signal (6): Aborted
in expression starting at /home/tim/Julia/pkg/oneAPI/wip.jl:5
gsignal at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
NEO::abortExecution() at /home/tim/Julia/depot/artifacts/af8bb31099a874c0bcf6b7c9c56c167a5a5411d3/lib64/libze_intel_gpu.so.0.8 (unknown line)
NEO::abortUnrecoverable(int, char const*) at /home/tim/Julia/depot/artifacts/af8bb31099a874c0bcf6b7c9c56c167a5a5411d3/lib64/libze_intel_gpu.so.0.8 (unknown line)
L0::KernelImp::initialize(_ze_kernel_desc_t const*) at /home/tim/Julia/depot/artifacts/af8bb31099a874c0bcf6b7c9c56c167a5a5411d3/lib64/libze_intel_gpu.so.0.8 (unknown line)
L0::Kernel::create(unsigned int, L0::Module*, _ze_kernel_desc_t const*, _ze_result_t*) at /home/tim/Julia/depot/artifacts/af8bb31099a874c0bcf6b7c9c56c167a5a5411d3/lib64/libze_intel_gpu.so.0.8 (unknown line)
L0::ModuleImp::createKernel(_ze_kernel_desc_t const*, _ze_kernel_handle_t**) at /home/tim/Julia/depot/artifacts/af8bb31099a874c0bcf6b7c9c56c167a5a5411d3/lib64/libze_intel_gpu.so.0.8 (unknown line)
macro expansion at /home/tim/Julia/pkg/oneAPI/lib/level-zero/libze.jl:256 [inlined]
macro expansion at /home/tim/Julia/pkg/oneAPI/lib/level-zero/error.jl:107 [inlined]
zeKernelCreate at /home/tim/Julia/pkg/oneAPI/lib/utils/call.jl:24 [inlined]
ZeKernel at /home/tim/Julia/pkg/oneAPI/lib/level-zero/module.jl:78

This is on 20.24.17065, so https://github.com/intel/compute-runtime/blob/20.24.17065/level_zero/core/source/kernel/kernel_imp.cpp#L563

In some cases, passing such a pointer does work, so it does look like a bug. For example, passing a ByVal OpTypePointer Function that points to a struct containing a (supported) OpTypePointer CrossWorkgroup:

define dso_local spir_kernel void @_Z17julia_kernel_138114oneDeviceArrayI7Float32Li1ELi1EE({ [1 x i64], i8 addrspace(1)* }* nocapture nonnull readonly byval dereferenceable(16)) local_unnamed_addr {
top:
  ret void
}

               OpCapability Addresses
               OpCapability Kernel
               OpCapability Int64
               OpCapability Int8
          %1 = OpExtInstImport "OpenCL.std"
               OpMemoryModel Physical64 OpenCL
               OpEntryPoint Kernel %11 "_Z17julia_kernel_136714oneDeviceArrayI7Float32Li1ELi1EE"
               OpSource OpenCL_C 200000
               OpName %top "top"
               OpDecorate %12 FuncParamAttr ByVal
               OpDecorate %12 FuncParamAttr NoCapture
               OpDecorate %12 MaxByteOffset 16
      %ulong = OpTypeInt 64 0
      %uchar = OpTypeInt 8 0
    %ulong_1 = OpConstant %ulong 1
       %void = OpTypeVoid
%_arr_ulong_ulong_1 = OpTypeArray %ulong %ulong_1
%_ptr_CrossWorkgroup_uchar = OpTypePointer CrossWorkgroup %uchar
  %_struct_3 = OpTypeStruct %_arr_ulong_ulong_1 %_ptr_CrossWorkgroup_uchar
%_ptr_Function__struct_3 = OpTypePointer Function %_struct_3
         %10 = OpTypeFunction %void %_ptr_Function__struct_3
         %11 = OpFunction %void None %10
         %12 = OpFunctionParameter %_ptr_Function__struct_3
        %top = OpLabel
               OpReturn
               OpFunctionEnd

PawelJurek commented 4 years ago

@maleadt, I tried to compile the SPIR-V you provided with the latest Neo release via ocloc (offline compilation), using the following command line:

ocloc -device kbl -spirv_input -file test.spv

The build succeeded. Could you please try it on your machine?

maleadt commented 4 years ago

Using last week's 20.32.17625 I'm getting a return code of 250:

$ cat wip.spt
; SPIR-V
; Version: 1.1
; Generator: Khronos LLVM/SPIR-V Translator; 14
; Bound: 11
; Schema: 0
               OpCapability Addresses
               OpCapability Kernel
               OpCapability Int64
          %1 = OpExtInstImport "OpenCL.std"
               OpMemoryModel Physical64 OpenCL
               OpEntryPoint Kernel %8 "_Z16julia_kernel_9065TupleI5Int64E"
               OpSource OpenCL_C 200000
               OpName %top "top"
               OpDecorate %9 FuncParamAttr ByVal
               OpDecorate %9 FuncParamAttr NoCapture
               OpDecorate %9 MaxByteOffset 8
      %ulong = OpTypeInt 64 0
    %ulong_1 = OpConstant %ulong 1
       %void = OpTypeVoid
%_arr_ulong_ulong_1 = OpTypeArray %ulong %ulong_1
%_ptr_Function__arr_ulong_ulong_1 = OpTypePointer Function %_arr_ulong_ulong_1
          %7 = OpTypeFunction %void %_ptr_Function__arr_ulong_ulong_1
          %8 = OpFunction %void None %7
          %9 = OpFunctionParameter %_ptr_Function__arr_ulong_ulong_1
        %top = OpLabel
               OpReturn
               OpFunctionEnd

$ spirv-as wip.spt

$ /home/tim/Julia/depot/artifacts/a211fff3a2a941e2b6c9a616b201d57e3e2837c8/bin/ocloc -device kbl -spirv_input -file out.spv
Compilation from IR - skipping loading of FCL

$ echo $?
250

And still an ABORT when loading the module with libze:

Abort was called at 611 line in file:
../level_zero/core/source/kernel/kernel_imp.cpp

signal (6): Aborted

PawelJurek commented 4 years ago

I used 20.32.17625 as well and it works on my machine. I think the issue is caused by the wrong libraries being loaded by ocloc. Did you actually install the driver with dpkg? If you just run ocloc from its directory, it won't pick up the libraries from there; you need to, for example, set LD_LIBRARY_PATH to that directory. To confirm that the correct libraries are loaded, you can run ocloc under gdb. This is the output on my machine:

gta@gtax-ubuntu-1804:~/test$ gdb --args ocloc -device kbl -spirv_input -file out.spv
Reading symbols from ocloc...
(No debugging symbols found in ocloc)
(gdb) r
Starting program: /usr/local/bin/ocloc -device kbl -spirv_input -file out.spv
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Compilation from IR - skipping loading of FCL
Build succeeded.
[Inferior 1 (process 1333) exited normally]
(gdb) i shared
From                To                  Syms Read   Shared Object Library
0x00007ffff7fd1100  0x00007ffff7ff23f4  Yes         /lib64/ld-linux-x86-64.so.2
0x00007ffff7d4db30  0x00007ffff7d8f530  Yes (*)     /usr/local/lib/libocloc.so
0x00007ffff7b6f670  0x00007ffff7ce474f  Yes         /lib/x86_64-linux-gnu/libc.so.6
0x00007ffff7b45220  0x00007ffff7b46189  Yes         /lib/x86_64-linux-gnu/libdl.so.2
0x00007ffff7b28b40  0x00007ffff7b385f5  Yes         /lib/x86_64-linux-gnu/libpthread.so.0
0x00007ffff79d0e90  0x00007ffff7ac5a62  Yes (*)     /usr/lib/x86_64-linux-gnu/libstdc++.so.6
0x00007ffff791c5e0  0x00007ffff792c9c5  Yes (*)     /lib/x86_64-linux-gnu/libgcc_s.so.1
0x00007ffff77d73c0  0x00007ffff787de78  Yes         /lib/x86_64-linux-gnu/libm.so.6
0x00007ffff4553f50  0x00007ffff5f1f5a0  Yes (*)     /usr/local/lib/libigc.so.1
0x00007ffff41ba260  0x00007ffff41cadab  Yes (*)     /lib/x86_64-linux-gnu/libz.so.1
0x00007ffff41b0720  0x00007ffff41b3dc0  Yes         /lib/x86_64-linux-gnu/librt.so.1
                                        No          linux-vdso.so.1

maleadt commented 4 years ago

I'm using Arch, but that doesn't matter since we build our own binaries, which follow (exactly) the version recommendations listed on the compute runtime releases page.

Does it matter that ocloc doesn't work, though? We exclusively use the Level Zero API to compile SPIR-V code, and the abort there seems unrelated to ocloc not working (although I agree it ought to work, and I am looking at it).

PawelJurek commented 4 years ago

Yes, I think we should check the ocloc issue first. I have seen such errors before when there was a mismatch between runtime and IGC libraries.

maleadt commented 4 years ago

OK, I'm in the process of upgrading to the latest stable version of all libraries. I'll provide a more extensive report here then.

maleadt commented 4 years ago

EDIT: none of this is relevant, see next post, but it might interest you anyway to see how we build our binaries.

Copying the release notes:

20.34.17727

Components revisions included in the release

intel/compute-runtime@20.34.17727
intel/gmmlib@intel-gmmlib-20.2.4
intel/intel-graphics-compiler@igc-1.0.4756

Additional components revisions used in build

intel/libva@c9bb65b (Compatible with va_api_major_version = 1)
intel/llvm-patches@c4a0345
intel/opencl-clang@6a9cd2c
KhronosGroup/SPIRV-LLVM-Translator@424e375 (for opencl-clang)
intel/vc-intrinsics@d7c5f99
KhronosGroup/SPIRV-LLVM-Translator@e87b59a (for vector compiler)
llvm/llvm-project@llvmorg-10.0.0
oneapi-src/level-zero@v1.0 (Compatible with oneAPI Level Zero API Spec v1.0.4)

gmmlib 20.2.4:
build recipe: https://github.com/JuliaPackaging/Yggdrasil/blob/f1b0cc56376f3cd90a59cb02c7ca5d310845545a/G/gmmlib/build_tarballs.jl#L5-L12
uses the repo corresponding with v20.2.4: https://github.com/intel/gmmlib/commit/c882f7266bb25b3463635ab231078b049e62f7d1
resulting binary: https://github.com/JuliaBinaryWrappers/gmmlib_jll.jl/releases/download/gmmlib-v20.2.4%2B0/gmmlib.v20.2.4.x86_64-linux-gnu.tar.gz

igc 1.0.4756:
build recipe: https://github.com/JuliaPackaging/Yggdrasil/blob/f1b0cc56376f3cd90a59cb02c7ca5d310845545a/L/libigc/build_tarballs.jl#L5-L24
uses the repo corresponding with v1.0.4756: https://github.com/intel/intel-graphics-compiler/commit/3623209b10b357ddb3a3d6eac3551c53ebc897f7
other deps are lined up exactly as listed in the 'additional components' section of the compute runtime release
resulting binary: https://github.com/JuliaBinaryWrappers/libigc_jll.jl/releases/download/libigc-v1.0.4756%2B0/libigc.v1.0.4756.x86_64-linux-gnu-cxx11.tar.gz

finally, compute-runtime 20.34.17727:
build recipe: https://github.com/JuliaPackaging/Yggdrasil/blob/3f672abb0962c2e32a3d956fc82ae9f9c93207a1/N/NEO/build_tarballs.jl#L5-L12
again, uses the repo at v20.34.17727: https://github.com/intel/compute-runtime/commit/49f450e09447099e052f7c462941cd58d4431c72
resulting binary: https://github.com/JuliaBinaryWrappers/NEO_jll.jl/releases/download/NEO-v20.34.17727%2B1/NEO.v20.34.17727.x86_64-linux-gnu-cxx11.tar.gz

You can easily use these libraries:

$ wget -P /tmp https://github.com/JuliaBinaryWrappers/NEO_jll.jl/releases/download/NEO-v20.34.17727%2B1/NEO.v20.34.17727.x86_64-linux-gnu-cxx11.tar.gz https://github.com/JuliaBinaryWrappers/gmmlib_jll.jl/releases/download/gmmlib-v20.2.4%2B0/gmmlib.v20.2.4.x86_64-linux-gnu.tar.gz https://github.com/JuliaBinaryWrappers/libigc_jll.jl/releases/download/libigc-v1.0.4756%2B0/libigc.v1.0.4756.x86_64-linux-gnu-cxx11.tar.gz
$ mkdir /tmp/sysroot
$ tar -xvf /tmp/gmmlib.v20.2.4.x86_64-linux-gnu.tar.gz -C /tmp/sysroot
$ tar -xvf /tmp/libigc.v1.0.4756.x86_64-linux-gnu-cxx11.tar.gz -C /tmp/sysroot
$ tar -xvf /tmp/NEO.v20.34.17727.x86_64-linux-gnu-cxx11.tar.gz -C /tmp/sysroot
$ LD_LIBRARY_PATH=/tmp/sysroot/lib:/tmp/sysroot/lib64 /tmp/sysroot/bin/ocloc -device kbl -spirv_input -file out.spv
Compilation from IR - skipping loading of FCL
$ echo $?
1

Running with LD_DEBUG=libs to show that they actually get loaded (I don't have another libigc/gmmlib/compute-runtime installed locally):

    262615:     find library=libocloc.so [0]; searching
    262615:       trying file=/tmp/sysroot/lib64/libocloc.so
Compilation from IR - skipping loading of FCL
    262615:     find library=libigc.so.1 [0]; searching
    262615:       trying file=/tmp/sysroot/lib64/libigc.so.1

So with everything matched up, ocloc still fails. Level Zero also still aborts.

maleadt commented 4 years ago

Turns out the ocloc failure is caused by SPIRV-Tools changing the generated image in a way that ocloc doesn't like. When I dump the output of llvm-spirv directly, I get something that (our build of) ocloc happily compiles:

$ LD_LIBRARY_PATH=/tmp/sysroot/lib:/tmp/sysroot/lib64 /tmp/sysroot/bin/ocloc -device kbl -spirv_input -file /tmp/original.spv
Compilation from IR - skipping loading of FCL
Build succeeded.

Now, how I got a broken image: When we want to access textual SPIRV, we don't have llvm-spirv generate textual code, but rather disassemble an image using spirv-dis since it gives much nicer output. Reassembling that output gives an image that ocloc can't handle. I uploaded the files here: https://gist.github.com/maleadt/bc161bedb0c6bccaf977fc58045cab17
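One way to start narrowing down how the original and the reassembled image differ is to compare their SPIR-V headers. The sketch below decodes the five header words defined by the SPIR-V specification (magic number, version, generator, bound, schema). The names SpirvHeader and parseSpirvHeader are made up for illustration and are not part of any Intel or Khronos API, and the code assumes the words are already in host byte order.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical helper type; not part of any Intel or Khronos API.
struct SpirvHeader {
  unsigned major;            // SPIR-V major version
  unsigned minor;            // SPIR-V minor version
  unsigned generatorId;      // tool ID, high 16 bits of the generator word
  unsigned generatorVersion; // tool version, low 16 bits of the generator word
  uint32_t bound;            // every <id> in the module is below this value
  uint32_t schema;           // reserved instruction schema word
};

// Decode the five-word header at the start of a SPIR-V module.
// Assumes host byte order; no endianness swap is attempted.
// Returns false if the input is too short or the magic number is wrong.
bool parseSpirvHeader(const std::vector<uint32_t> &words, SpirvHeader &out) {
  const uint32_t kMagic = 0x07230203u;
  if (words.size() < 5 || words[0] != kMagic)
    return false;
  out.major = (words[1] >> 16) & 0xffu; // version word is 0 | major | minor | 0
  out.minor = (words[1] >> 8) & 0xffu;
  out.generatorId = words[2] >> 16;
  out.generatorVersion = words[2] & 0xffffu;
  out.bound = words[3];
  out.schema = words[4];
  return true;
}

// Print the header in roughly the shape of the spirv-dis comment lines.
void printSpirvHeader(const SpirvHeader &h) {
  std::printf("; Version: %u.%u\n", h.major, h.minor);
  std::printf("; Generator: id %u; %u\n", h.generatorId, h.generatorVersion);
  std::printf("; Bound: %u\n", static_cast<unsigned>(h.bound));
  std::printf("; Schema: %u\n", static_cast<unsigned>(h.schema));
}
```

Running this over both files and comparing the printed fields would show whether the round trip through spirv-dis and spirv-as changed the version or generator words. It says nothing about differences in the instruction stream itself.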

Should I open a separate issue for this?

PawelJurek commented 4 years ago

No, I think we can keep it in this issue. Just to confirm: the original.spv compiles fine with ocloc but crashes when compiled via L0 runtime calls?

maleadt commented 4 years ago

Correct. Just to demonstrate that this really is with the same input, and with both ocloc and the L0 implementation coming from the same compute-runtime build:

julia> using oneAPI, oneAPI.oneL0

julia> oneAPI.oneL0.NEO_jll.ocloc() do path
       run(`$path -device kbl -spirv_input -file original.spv`)
       end
Compilation from IR - skipping loading of FCL
Build succeeded.
Process(`/home/tim/Julia/depot/artifacts/e4613c1caa40c301a1ee2eaed0fcca3579fe3744/bin/ocloc -device kbl -spirv_input -file original.spv`, ProcessExited(0))

julia> drv = first(drivers())
ZeDriver(0x0000000001dfe830, version 1.0.0)

julia> dev = first(devices(drv))
ZeDevice(GPU, vendor 0x8086, device 0x1912): Intel(R) Gen9

julia> ctx = context()
ZeContext(Ptr{Nothing} @0x0000000003269670, ZeDriver(Ptr{Nothing} @0x0000000001dfe830))

julia> image = read("original.spv");

# call zeModuleCreate
julia> m = ZeModule(ctx, dev, image)
ZeModule(Ptr{Nothing} @0x0000000003c9c0a0, ZeContext(Ptr{Nothing} @0x0000000003269670, ZeDriver(Ptr{Nothing} @0x0000000001dfe830)), ZeDevice(Ptr{Nothing} @0x00000000019a6890, ZeDriver(Ptr{Nothing} @0x0000000001dfe830)))

# call zeKernelCreate
julia> kernels(m)
Abort was called at 611 line in file:
../level_zero/core/source/kernel/kernel_imp.cpp

maleadt commented 4 years ago

Trying in a C++ environment, same thing:

#include <cstdio>
#include <cstdlib>
#include <level_zero/ze_api.h>

#define BUFFERSIZE 1024

#define check(ans)                                                             \
  { do_check((ans), __FILE__, __LINE__); }
void do_check(ze_result_t code, const char *file, int line) {
  if (code != ZE_RESULT_SUCCESS) {
    fprintf(stderr, "Failed: %d at %s %d\n", code, file, line);
    exit(1);
  }
}

int main() {
  // Initialize driver
  check(zeInit(ZE_INIT_FLAG_GPU_ONLY));

  // Retrieve driver
  uint32_t driverCount = 0;
  check(zeDriverGet(&driverCount, nullptr));

  // Request a single driver handle, even if more drivers are available
  ze_driver_handle_t driverHandle;
  driverCount = 1;
  check(zeDriverGet(&driverCount, &driverHandle));

  ze_context_desc_t contextDesc = {};
  ze_context_handle_t context;
  check(zeContextCreate(driverHandle, &contextDesc, &context));

  // Retrieve device
  uint32_t deviceCount = 0;
  check(zeDeviceGet(driverHandle, &deviceCount, nullptr));

  ze_device_handle_t device;
  deviceCount = 1;
  check(zeDeviceGet(driverHandle, &deviceCount, &device));

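  // Read the SPIR-V image (assumed to fit in BUFFERSIZE bytes)
  uint8_t *buffer = (uint8_t *)malloc(BUFFERSIZE);
  FILE *filp = fopen("original.spv", "rb");
  if (buffer == nullptr || filp == nullptr) {
    fprintf(stderr, "Failed to open original.spv\n");
    exit(1);
  }
  size_t bytes_read = fread(buffer, sizeof(uint8_t), BUFFERSIZE, filp);
  fclose(filp);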

  ze_module_desc_t module_desc = {};
  module_desc.format = ZE_MODULE_FORMAT_IL_SPIRV;
  module_desc.inputSize = bytes_read;
  module_desc.pInputModule = buffer;
  ze_module_handle_t module;
  check(zeModuleCreate(context, device, &module_desc, &module, nullptr));

  ze_kernel_desc_t kernel_desc = {};
  kernel_desc.pKernelName = "_Z16julia_kernel_9195TupleI5Int64E";
  ze_kernel_handle_t kernel;
  check(zeKernelCreate(module, &kernel_desc, &kernel));
}

With original.spv from https://gist.github.com/maleadt/bc161bedb0c6bccaf977fc58045cab17:

Abort was called at 611 line in file:
../level_zero/core/source/kernel/kernel_imp.cpp

To use with exactly the same binaries, see above or https://github.com/intel/compute-runtime/issues/320#issuecomment-692630492.
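The reproducer above reads at most BUFFERSIZE (1024) bytes, which is enough for this tiny module but would silently truncate a larger one. A possible variant reads the whole file into a vector instead. This is only a sketch; readBinaryFile is a hypothetical helper name and does not come from Level Zero.

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read an entire binary file into memory.
// Returns false if the file cannot be opened; `out` is left untouched then.
bool readBinaryFile(const std::string &path, std::vector<uint8_t> &out) {
  std::ifstream in(path, std::ios::binary);
  if (!in)
    return false;
  // istreambuf_iterator reads raw bytes without skipping whitespace.
  out.assign(std::istreambuf_iterator<char>(in),
             std::istreambuf_iterator<char>());
  return true;
}
```

With such a helper, the reproducer could set module_desc.inputSize to the vector's size() and module_desc.pInputModule to its data(), instead of using the fixed-size buffer.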

PawelJurek commented 4 years ago

Thank you @maleadt, I will take a look, sorry for the delay.

PawelJurek commented 3 years ago

I was able to reproduce the issue. It looks like the L0 runtime doesn't recognize the argument type in the kernel. I'm working with the runtime team to figure out whether it expects different output from the compiler or whether this is some other issue.

maleadt commented 3 years ago

Note that I switched to emitting byval pointers because the alternative doesn't work either. For example, taking the example from the issue but passing the array as a value:

define spir_kernel void @_Z16julia_kernel_9325TupleI5Int64E([1 x i64]) local_unnamed_addr {
entry:
  ret void
}

; SPIR-V
; Version: 1.0
; Generator: Khronos LLVM/SPIR-V Translator; 14
; Bound: 10
; Schema: 0
               OpCapability Addresses
               OpCapability Kernel
               OpCapability Int64
          %1 = OpExtInstImport "OpenCL.std"
               OpMemoryModel Physical64 OpenCL
               OpEntryPoint Kernel %7 "_Z16julia_kernel_9535TupleI5Int64E"
               OpSource OpenCL_C 200000
               OpName %entry "entry"
      %ulong = OpTypeInt 64 0
    %ulong_1 = OpConstant %ulong 1
       %void = OpTypeVoid
%_arr_ulong_ulong_1 = OpTypeArray %ulong %ulong_1
          %6 = OpTypeFunction %void %_arr_ulong_ulong_1
          %7 = OpFunction %void None %6
          %8 = OpFunctionParameter %_arr_ulong_ulong_1
      %entry = OpLabel
               OpReturn
               OpFunctionEnd

Loading this code with oneAPI gives a module that contains no kernels whatsoever. Furthermore, I remember running into some issues with Khronos' LLVM to SPIR-V translator which seemed to indicate using byval was recommended. It's also much easier for the front-end to be able to use this LLVM and SPIR-V feature instead of having to rewrite IR to pass arguments by value.

PawelJurek commented 3 years ago

It is likely that this issue occurs because the argument is not used in the kernel. @maleadt, while I'm debugging the issue in the compiler, could you please try adding some valid use of the argument to the kernel and see if it works? Ideally, this would be e.g. a store to some buffer, so that the code is not dead-code-eliminated.

PawelJurek commented 3 years ago

I found the issue in the compiler: it seems that only struct arguments are currently supported for byval. So for a quick workaround you could wrap the array in a struct, e.g.:

; SPIR-V
; Version: 1.0
; Generator: Khronos LLVM/SPIR-V Translator; 14
; Bound: 14
; Schema: 0
               OpCapability Addresses ; 0x00000014
               OpCapability Kernel ; 0x0000001c
               OpCapability Int64 ; 0x00000024
          %1 = OpExtInstImport "OpenCL.std" ; 0x0000002c
               OpMemoryModel Physical64 OpenCL ; 0x00000040
               OpEntryPoint Kernel %10 "test" ; 0x0000004c
         %13 = OpString "kernel_arg_type.test.struct pj," ; 0x00000060
               OpSource OpenCL_C 102000 ; 0x00000088
               OpName %struct_pj "struct.pj" ; 0x00000094
               OpName %arg "arg" ; 0x000000a8
               OpName %entry "entry" ; 0x000000b4
               OpDecorate %arg FuncParamAttr ByVal ; 0x000000c4
       %uint = OpTypeInt 32 0 ; 0x000000d4
      %ulong = OpTypeInt 64 0 ; 0x000000e4
    %ulong_1 = OpConstant %ulong 1 ; 0x000000f4
       %void = OpTypeVoid ; 0x00000108
%_arr_uint_ulong_1 = OpTypeArray %uint %ulong_1 ; 0x00000110
  %struct_pj = OpTypeStruct %_arr_uint_ulong_1 ; 0x00000120
%_ptr_Function_struct_pj = OpTypePointer Function %struct_pj ; 0x0000012c
          %9 = OpTypeFunction %void %_ptr_Function_struct_pj ; 0x0000013c
         %10 = OpFunction %void None %9 ; 0x0000014c
        %arg = OpFunctionParameter %_ptr_Function_struct_pj ; 0x00000160
      %entry = OpLabel ; 0x0000016c
               OpReturn ; 0x00000174
               OpFunctionEnd ; 0x00000178

PawelJurek commented 3 years ago

@maleadt, could you please let us know whether it is feasible for you to implement this workaround to resolve this issue? The reason it was not supported is that such SPIR-V wouldn't be produced by Clang in the OpenCL case. The only valid case for a byval pointer there is a struct parameter.

maleadt commented 3 years ago

Wrapping in a { ... } byval seems to work, but has exposed other LLVM-related issues. So it's a good workaround for now, but it would still be nice to have full byval support at some point.

JacekDanecki commented 3 years ago

I'm transferring it to the IGC project: https://github.com/intel/intel-graphics-compiler

mnaczk commented 2 years ago

Thanks to the CPP reproducer (https://github.com/intel/intel-graphics-compiler/issues/162#issuecomment-692715577), I was able to reproduce the error on NEO release 20.34.17727. On the newest NEO release, 22.23.23405, the issue is no longer reproducible: the CPP reproducer does not report any error.

Could you check if the error still occurs on the newest NEO release? If not please close the issue.

maleadt commented 2 years ago

Removing my workaround still results in miscompilations (tested using NEO 22.25.23529). I don't have the time to investigate further now, I'll report back when I've had a look.