Closed maleadt closed 2 years ago
@maleadt, I tried to compile the SPIR-V you provided with the latest NEO release via ocloc (offline compilation), using the following command line:
ocloc -device kbl -spirv_input -file test.spv
The build succeeded. Could you please try it on your machine?
Using last week's 20.32.17625 I'm getting a return code of 250:
$ cat wip.spt
; SPIR-V
; Version: 1.1
; Generator: Khronos LLVM/SPIR-V Translator; 14
; Bound: 11
; Schema: 0
OpCapability Addresses
OpCapability Kernel
OpCapability Int64
%1 = OpExtInstImport "OpenCL.std"
OpMemoryModel Physical64 OpenCL
OpEntryPoint Kernel %8 "_Z16julia_kernel_9065TupleI5Int64E"
OpSource OpenCL_C 200000
OpName %top "top"
OpDecorate %9 FuncParamAttr ByVal
OpDecorate %9 FuncParamAttr NoCapture
OpDecorate %9 MaxByteOffset 8
%ulong = OpTypeInt 64 0
%ulong_1 = OpConstant %ulong 1
%void = OpTypeVoid
%_arr_ulong_ulong_1 = OpTypeArray %ulong %ulong_1
%_ptr_Function__arr_ulong_ulong_1 = OpTypePointer Function %_arr_ulong_ulong_1
%7 = OpTypeFunction %void %_ptr_Function__arr_ulong_ulong_1
%8 = OpFunction %void None %7
%9 = OpFunctionParameter %_ptr_Function__arr_ulong_ulong_1
%top = OpLabel
OpReturn
OpFunctionEnd
$ spirv-as wip.spt
$ /home/tim/Julia/depot/artifacts/a211fff3a2a941e2b6c9a616b201d57e3e2837c8/bin/ocloc -device kbl -spirv_input -file out.spv
Compilation from IR - skipping loading of FCL
$ echo $?
250
And still an ABORT when loading the module with libze:
Abort was called at 611 line in file:
../level_zero/core/source/kernel/kernel_imp.cpp
signal (6): Aborted
I used 20.32.17625 as well and it works on my machine. I think the issue is caused by the wrong libraries being loaded by ocloc. Did you actually install the driver with dpkg? If you just run ocloc from the directory, it won't take the libraries from there; you need to, e.g., set LD_LIBRARY_PATH to this directory. To confirm that the correct libraries are picked up, you can run it via gdb. This is the output on my machine:
gta@gtax-ubuntu-1804:~/test$ gdb --args ocloc -device kbl -spirv_input -file out.spv
Reading symbols from ocloc...
(No debugging symbols found in ocloc)
(gdb) r
Starting program: /usr/local/bin/ocloc -device kbl -spirv_input -file out.spv
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Compilation from IR - skipping loading of FCL
Build succeeded.
[Inferior 1 (process 1333) exited normally]
(gdb) i shared
From To Syms Read Shared Object Library
0x00007ffff7fd1100 0x00007ffff7ff23f4 Yes /lib64/ld-linux-x86-64.so.2
0x00007ffff7d4db30 0x00007ffff7d8f530 Yes (*) /usr/local/lib/libocloc.so
0x00007ffff7b6f670 0x00007ffff7ce474f Yes /lib/x86_64-linux-gnu/libc.so.6
0x00007ffff7b45220 0x00007ffff7b46189 Yes /lib/x86_64-linux-gnu/libdl.so.2
0x00007ffff7b28b40 0x00007ffff7b385f5 Yes /lib/x86_64-linux-gnu/libpthread.so.0
0x00007ffff79d0e90 0x00007ffff7ac5a62 Yes (*) /usr/lib/x86_64-linux-gnu/libstdc++.so.6
0x00007ffff791c5e0 0x00007ffff792c9c5 Yes (*) /lib/x86_64-linux-gnu/libgcc_s.so.1
0x00007ffff77d73c0 0x00007ffff787de78 Yes /lib/x86_64-linux-gnu/libm.so.6
0x00007ffff4553f50 0x00007ffff5f1f5a0 Yes (*) /usr/local/lib/libigc.so.1
0x00007ffff41ba260 0x00007ffff41cadab Yes (*) /lib/x86_64-linux-gnu/libz.so.1
0x00007ffff41b0720 0x00007ffff41b3dc0 Yes /lib/x86_64-linux-gnu/librt.so.1
No linux-vdso.so.1
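When the runtime is loaded in-process (as with Level Zero) rather than through ocloc, a similar check can be done from inside the program. The following is a minimal sketch, assuming Linux with glibc; `loaded_shared_objects` and `is_loaded_from` are hypothetical helper names, not part of any Intel API. It lists the shared objects mapped into the current process so you can see which directory each library was taken from.

```cpp
#include <link.h>   // dl_iterate_phdr (glibc, Linux-specific)
#include <string>
#include <vector>

// Callback invoked once per loaded object; collects non-empty path names.
static int collect_object(struct dl_phdr_info *info, size_t, void *data) {
    auto *paths = static_cast<std::vector<std::string> *>(data);
    if (info->dlpi_name != nullptr && info->dlpi_name[0] != '\0') {
        paths->emplace_back(info->dlpi_name);
    }
    return 0; // returning 0 continues the iteration
}

// Hypothetical helper: paths of all shared objects mapped into this process.
std::vector<std::string> loaded_shared_objects() {
    std::vector<std::string> paths;
    dl_iterate_phdr(collect_object, &paths);
    return paths;
}

// Hypothetical helper: true if any loaded path contains `needle`,
// e.g. the directory you expect libigc to come from.
bool is_loaded_from(const std::vector<std::string> &paths,
                    const std::string &needle) {
    for (const auto &p : paths) {
        if (p.find(needle) != std::string::npos) return true;
    }
    return false;
}
```

Calling `loaded_shared_objects()` after the driver has been initialized should include libraries that were opened lazily; calling it earlier may miss them.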
I'm using Arch, but that doesn't matter since we build our own binaries which follow (exactly) the version recommendations as listed on the compute runtime releases page:
Does it matter that ocloc doesn't work, though? We exclusively use the Level Zero API to compile SPIR-V code, and the abort there seems unrelated to ocloc not working (although I agree it ought to work, and I am looking at it).
Yes, I think we should check the ocloc issue first. I have seen such errors before when there was a mismatch between runtime and IGC libraries.
OK, I'm in the process of upgrading to the latest stable version of all libraries. I'll provide a more extensive report here then.
EDIT: none of this is relevant, see next post, but it might interest you anyway to see how we build our binaries.
Copying the release notes:
20.34.17727
Components revisions included in the release
intel/compute-runtime@20.34.17727
intel/gmmlib@intel-gmmlib-20.2.4
intel/intel-graphics-compiler@igc-1.0.4756
Additional components revisions used in build
intel/libva@c9bb65b (Compatible with va_api_major_version = 1)
intel/llvm-patches@c4a0345
intel/opencl-clang@6a9cd2c
KhronosGroup/SPIRV-LLVM-Translator@424e375 (for opencl-clang)
intel/vc-intrinsics@d7c5f99
KhronosGroup/SPIRV-LLVM-Translator@e87b59a (for vector compiler)
llvm/llvm-project@llvmorg-10.0.0
oneapi-src/level-zero@v1.0 (Compatible with oneAPI Level Zero API Spec v1.0.4)
gmmlib 20.2.4:
build recipe: https://github.com/JuliaPackaging/Yggdrasil/blob/f1b0cc56376f3cd90a59cb02c7ca5d310845545a/G/gmmlib/build_tarballs.jl#L5-L12
uses the repo corresponding with v20.2.4: https://github.com/intel/gmmlib/commit/c882f7266bb25b3463635ab231078b049e62f7d1
resulting binary: https://github.com/JuliaBinaryWrappers/gmmlib_jll.jl/releases/download/gmmlib-v20.2.4%2B0/gmmlib.v20.2.4.x86_64-linux-gnu.tar.gz
igc 1.0.4756:
build recipe: https://github.com/JuliaPackaging/Yggdrasil/blob/f1b0cc56376f3cd90a59cb02c7ca5d310845545a/L/libigc/build_tarballs.jl#L5-L24
uses the repo corresponding with v1.0.4756: https://github.com/intel/intel-graphics-compiler/commit/3623209b10b357ddb3a3d6eac3551c53ebc897f7
other deps exactly lined up as listed in the 'additional components' section of the compute runtime release
resulting binary: https://github.com/JuliaBinaryWrappers/libigc_jll.jl/releases/download/libigc-v1.0.4756%2B0/libigc.v1.0.4756.x86_64-linux-gnu-cxx11.tar.gz
finally, compute-runtime 20.34.17727:
build recipe: https://github.com/JuliaPackaging/Yggdrasil/blob/3f672abb0962c2e32a3d956fc82ae9f9c93207a1/N/NEO/build_tarballs.jl#L5-L12
again, uses the repo at v20.34.17727: https://github.com/intel/compute-runtime/commit/49f450e09447099e052f7c462941cd58d4431c72
resulting binary: https://github.com/JuliaBinaryWrappers/NEO_jll.jl/releases/download/NEO-v20.34.17727%2B1/NEO.v20.34.17727.x86_64-linux-gnu-cxx11.tar.gz
You can easily use these libraries:
$ wget -P /tmp https://github.com/JuliaBinaryWrappers/NEO_jll.jl/releases/download/NEO-v20.34.17727%2B1/NEO.v20.34.17727.x86_64-linux-gnu-cxx11.tar.gz https://github.com/JuliaBinaryWrappers/gmmlib_jll.jl/releases/download/gmmlib-v20.2.4%2B0/gmmlib.v20.2.4.x86_64-linux-gnu.tar.gz https://github.com/JuliaBinaryWrappers/libigc_jll.jl/releases/download/libigc-v1.0.4756%2B0/libigc.v1.0.4756.x86_64-linux-gnu-cxx11.tar.gz
$ mkdir /tmp/sysroot
$ tar -xvf /tmp/gmmlib.v20.2.4.x86_64-linux-gnu.tar.gz -C /tmp/sysroot
$ tar -xvf /tmp/libigc.v1.0.4756.x86_64-linux-gnu-cxx11.tar.gz -C /tmp/sysroot
$ tar -xvf /tmp/NEO.v20.34.17727.x86_64-linux-gnu-cxx11.tar.gz -C /tmp/sysroot
$ LD_LIBRARY_PATH=/tmp/sysroot/lib:/tmp/sysroot/lib64 /tmp/sysroot/bin/ocloc -device kbl -spirv_input -file out.spv
Compilation from IR - skipping loading of FCL
$ echo $?
1
Running with LD_DEBUG=libs to show that they actually get loaded (I don't have another libigc/gmmlib/compute-runtime installed locally):
262615: find library=libocloc.so [0]; searching
262615: trying file=/tmp/sysroot/lib64/libocloc.so
Compilation from IR - skipping loading of FCL
262615: find library=libigc.so.1 [0]; searching
262615: trying file=/tmp/sysroot/lib64/libigc.so.1
So with everything matched up, ocloc still fails. Level Zero also still aborts.
Turns out the ocloc failure is caused by the SPIRV Tools changing the generated image in a way that ocloc doesn't like. Dumping the output we get from llvm-spirv directly, I get something that (our build of) ocloc happily assembles:
$ LD_LIBRARY_PATH=/tmp/sysroot/lib:/tmp/sysroot/lib64 /tmp/sysroot/bin/ocloc -device kbl -spirv_input -file /tmp/original.spv
Compilation from IR - skipping loading of FCL
Build succeeded.
Now, how I got a broken image: when we want to access textual SPIR-V, we don't have llvm-spirv generate textual code, but rather disassemble an image using spirv-dis since it gives much nicer output. Reassembling that output gives an image that ocloc can't handle. I uploaded the files here: https://gist.github.com/maleadt/bc161bedb0c6bccaf977fc58045cab17
original.spv: original image that gets generated by llvm-spirv (and aborts the compute runtime)
debugless.spv: the same, without debug info that spirv-dis can't handle
debugless.spt: disassembled with spirv-dis; reassembling it with spirv-as results in a binary that's nearly identical to original.spv, but that ocloc doesn't like
Should I open a separate issue for this?
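To pin down where two "nearly identical" SPIR-V binaries diverge, the word streams can be compared directly. This is a minimal sketch, not part of any tool mentioned in the thread; the function names are hypothetical. It relies only on the layout defined by the SPIR-V specification: a five-word header (magic, version, generator, bound, schema) followed by the instruction words.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kSpirvMagic = 0x07230203; // first word of every SPIR-V module

struct SpirvHeader {
    uint32_t magic, version, generator, bound, schema;
};

// Decode the five-word module header.
// Returns false if the input is too short or the magic number does not match.
bool read_header(const std::vector<uint32_t> &words, SpirvHeader &out) {
    if (words.size() < 5 || words[0] != kSpirvMagic) return false;
    out = {words[0], words[1], words[2], words[3], words[4]};
    return true;
}

// The version word is laid out as 0x00MMmm00 (major, minor).
unsigned version_major(const SpirvHeader &h) { return (h.version >> 16) & 0xffu; }
unsigned version_minor(const SpirvHeader &h) { return (h.version >> 8) & 0xffu; }

// Index of the first word that differs, or -1 if both streams are identical.
// If one stream is a prefix of the other, the shorter length is returned.
long first_difference(const std::vector<uint32_t> &a,
                      const std::vector<uint32_t> &b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) return static_cast<long>(i);
    }
    return a.size() == b.size() ? -1 : static_cast<long>(n);
}
```

A difference at index 0 to 4 points at the header (index 2 is the generator word, for example); anything later is in the instruction stream and is easier to interpret next to the spirv-dis output.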
No, I think we can keep it in this issue. Just to confirm: the original.spv compiles fine with ocloc but crashes when compiled via L0 runtime calls?
Correct. Just to demonstrate that this is really with the same input and both ocloc and the L0 implementation from the same compute-runtime build:
julia> using oneAPI, oneAPI.oneL0
julia> oneAPI.oneL0.NEO_jll.ocloc() do path
run(`$path -device kbl -spirv_input -file original.spv`)
end
Compilation from IR - skipping loading of FCL
Build succeeded.
Process(`/home/tim/Julia/depot/artifacts/e4613c1caa40c301a1ee2eaed0fcca3579fe3744/bin/ocloc -device kbl -spirv_input -file original.spv`, ProcessExited(0))
julia> drv = first(drivers())
ZeDriver(0x0000000001dfe830, version 1.0.0)
julia> dev = first(devices(drv))
ZeDevice(GPU, vendor 0x8086, device 0x1912): Intel(R) Gen9
julia> ctx = context()
ZeContext(Ptr{Nothing} @0x0000000003269670, ZeDriver(Ptr{Nothing} @0x0000000001dfe830))
julia> image = read("original.spv");
# call zeModuleCreate
julia> m = ZeModule(ctx, dev, image)
ZeModule(Ptr{Nothing} @0x0000000003c9c0a0, ZeContext(Ptr{Nothing} @0x0000000003269670, ZeDriver(Ptr{Nothing} @0x0000000001dfe830)), ZeDevice(Ptr{Nothing} @0x00000000019a6890, ZeDriver(Ptr{Nothing} @0x0000000001dfe830)))
# call zeKernelCreate
julia> kernels(m)
Abort was called at 611 line in file:
../level_zero/core/source/kernel/kernel_imp.cpp
Trying in a C++ environment, same thing:
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <level_zero/ze_api.h>

#define BUFFERSIZE 1024

// Exit with a message if a Level Zero call does not succeed.
#define check(ans) \
  { do_check((ans), __FILE__, __LINE__); }
void do_check(ze_result_t code, const char *file, int line) {
  if (code != ZE_RESULT_SUCCESS) {
    fprintf(stderr, "Failed: %d at %s %d\n", code, file, line);
    exit(1);
  }
}

int main() {
  // Initialize driver
  check(zeInit(ZE_INIT_FLAG_GPU_ONLY));

  // Retrieve driver (only the first one; we have room for a single handle)
  uint32_t driverCount = 0;
  check(zeDriverGet(&driverCount, nullptr));
  ze_driver_handle_t driverHandle;
  driverCount = 1;
  check(zeDriverGet(&driverCount, &driverHandle));

  // Create context
  ze_context_desc_t contextDesc = {};
  ze_context_handle_t context;
  check(zeContextCreate(driverHandle, &contextDesc, &context));

  // Retrieve device (only the first one)
  uint32_t deviceCount = 0;
  check(zeDeviceGet(driverHandle, &deviceCount, nullptr));
  ze_device_handle_t device;
  deviceCount = 1;
  check(zeDeviceGet(driverHandle, &deviceCount, &device));

  // Read the SPIR-V image (at most BUFFERSIZE bytes)
  uint8_t *buffer = (uint8_t *)malloc(BUFFERSIZE);
  FILE *filp = fopen("original.spv", "rb");
  if (filp == nullptr) {
    perror("original.spv");
    return 1;
  }
  size_t bytes_read = fread(buffer, sizeof(uint8_t), BUFFERSIZE, filp);
  fclose(filp);

  // Create the module from SPIR-V
  ze_module_desc_t module_desc = {};
  module_desc.format = ZE_MODULE_FORMAT_IL_SPIRV;
  module_desc.inputSize = bytes_read;
  module_desc.pInputModule = buffer;
  ze_module_handle_t module;
  check(zeModuleCreate(context, device, &module_desc, &module, nullptr));

  // Create the kernel; this is where the abort happens
  ze_kernel_desc_t kernel_desc = {};
  kernel_desc.pKernelName = "_Z16julia_kernel_9195TupleI5Int64E";
  ze_kernel_handle_t kernel;
  check(zeKernelCreate(module, &kernel_desc, &kernel));
}
With original.spv from https://gist.github.com/maleadt/bc161bedb0c6bccaf977fc58045cab17:
Abort was called at 611 line in file:
../level_zero/core/source/kernel/kernel_imp.cpp
To use with exactly the same binaries, see above or https://github.com/intel/compute-runtime/issues/320#issuecomment-692630492.
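The reproducer reads the image into a fixed 1024-byte buffer, which is enough for this small kernel. For larger modules, something like the following sketch could be used to load the whole file and do a quick sanity check before handing it to zeModuleCreate. The helper names are hypothetical; the only format knowledge assumed is the SPIR-V magic number and the fact that a module is a sequence of 32-bit words with a five-word header.

```cpp
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read an entire file into memory.
// Returns an empty vector if the file cannot be opened.
std::vector<uint8_t> read_binary_file(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};
    return std::vector<uint8_t>(std::istreambuf_iterator<char>(in),
                                std::istreambuf_iterator<char>());
}

// Hypothetical helper: cheap plausibility check, not a validator.
// Accepts the magic number in either byte order.
bool looks_like_spirv(const std::vector<uint8_t> &bytes) {
    if (bytes.size() < 20 || bytes.size() % 4 != 0) return false;
    uint32_t first = 0;
    std::memcpy(&first, bytes.data(), sizeof(first));
    return first == 0x07230203u || first == 0x03022307u;
}
```

With this, `inputSize` would be `bytes.size()` and `pInputModule` would be `bytes.data()`. A full check of the module is better left to spirv-val.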
Thank you @maleadt, I will take a look, sorry for the delay.
I was able to reproduce the issue. It looks like the L0 runtime doesn't recognize the argument type in the kernel. I'm working with the runtime team to figure out whether it expects different output from the compiler or whether this is some other issue.
Note that I switched to emitting byval pointers because the alternative doesn't work either. For example, taking the example from the issue but passing the array as a value:
define spir_kernel void @_Z16julia_kernel_9325TupleI5Int64E([1 x i64]) local_unnamed_addr {
entry:
ret void
}
; SPIR-V
; Version: 1.0
; Generator: Khronos LLVM/SPIR-V Translator; 14
; Bound: 10
; Schema: 0
OpCapability Addresses
OpCapability Kernel
OpCapability Int64
%1 = OpExtInstImport "OpenCL.std"
OpMemoryModel Physical64 OpenCL
OpEntryPoint Kernel %7 "_Z16julia_kernel_9535TupleI5Int64E"
OpSource OpenCL_C 200000
OpName %entry "entry"
%ulong = OpTypeInt 64 0
%ulong_1 = OpConstant %ulong 1
%void = OpTypeVoid
%_arr_ulong_ulong_1 = OpTypeArray %ulong %ulong_1
%6 = OpTypeFunction %void %_arr_ulong_ulong_1
%7 = OpFunction %void None %6
%8 = OpFunctionParameter %_arr_ulong_ulong_1
%entry = OpLabel
OpReturn
OpFunctionEnd
Loading this code with oneAPI gives a module that contains no kernels whatsoever. Furthermore, I remember running into some issues with Khronos' LLVM to SPIR-V translator which seemed to indicate using byval was recommended. It's also much easier for the front-end to be able to use this LLVM and SPIR-V feature instead of having to rewrite IR to pass arguments by value.
It is likely that this issue occurs because the argument is not used in the kernel. @maleadt, while I'm debugging the issue in the compiler, could you please try adding some valid usage of the argument to the kernel and see if it works? Ideally, this would be e.g. a store to some buffer, so that the code is not dead-code-eliminated.
I found the issue in the compiler: it seems that only struct arguments are currently supported for byval. So for a quick workaround you could wrap the array in a struct, e.g.:
; SPIR-V
; Version: 1.0
; Generator: Khronos LLVM/SPIR-V Translator; 14
; Bound: 14
; Schema: 0
OpCapability Addresses ; 0x00000014
OpCapability Kernel ; 0x0000001c
OpCapability Int64 ; 0x00000024
%1 = OpExtInstImport "OpenCL.std" ; 0x0000002c
OpMemoryModel Physical64 OpenCL ; 0x00000040
OpEntryPoint Kernel %10 "test" ; 0x0000004c
%13 = OpString "kernel_arg_type.test.struct pj," ; 0x00000060
OpSource OpenCL_C 102000 ; 0x00000088
OpName %struct_pj "struct.pj" ; 0x00000094
OpName %arg "arg" ; 0x000000a8
OpName %entry "entry" ; 0x000000b4
OpDecorate %arg FuncParamAttr ByVal ; 0x000000c4
%uint = OpTypeInt 32 0 ; 0x000000d4
%ulong = OpTypeInt 64 0 ; 0x000000e4
%ulong_1 = OpConstant %ulong 1 ; 0x000000f4
%void = OpTypeVoid ; 0x00000108
%_arr_uint_ulong_1 = OpTypeArray %uint %ulong_1 ; 0x00000110
%struct_pj = OpTypeStruct %_arr_uint_ulong_1 ; 0x00000120
%_ptr_Function_struct_pj = OpTypePointer Function %struct_pj ; 0x0000012c
%9 = OpTypeFunction %void %_ptr_Function_struct_pj ; 0x0000013c
%10 = OpFunction %void None %9 ; 0x0000014c
%arg = OpFunctionParameter %_ptr_Function_struct_pj ; 0x00000160
%entry = OpLabel ; 0x0000016c
OpReturn ; 0x00000174
OpFunctionEnd ; 0x00000178
@maleadt, could you please let us know whether it is feasible for you to implement this workaround? It was not supported because Clang would not produce such SPIR-V in the OpenCL case: the only valid case for a byval pointer there is a struct parameter.
Wrapping in a { ... } byval seems to work, but has exposed other LLVM-related issues. So it's a good workaround for now, but it would still be nice to have full byval support at some point.
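On the host side, the struct-wrapping workaround should not change the bytes that get passed for the argument. The sketch below illustrates this with a hypothetical `Wrapped` type mirroring the `[1 x i64]` argument from the original report. It is an illustration of the layout argument only, not code from the Julia front-end or from IGC.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

using Array1 = uint64_t[1];              // the [1 x i64] kernel argument

struct Wrapped {                         // hypothetical single-member wrapper
    uint64_t data[1];
};

// A struct with a single array member has the same size and alignment
// as the array itself, so the argument buffer is byte-for-byte identical.
static_assert(sizeof(Wrapped) == sizeof(Array1), "size must match");
static_assert(alignof(Wrapped) == alignof(Array1), "alignment must match");
static_assert(std::is_standard_layout<Wrapped>::value, "no hidden layout");
static_assert(std::is_trivially_copyable<Wrapped>::value, "safe to memcpy");

// Copy the array contents into the wrapper.
Wrapped wrap(const uint64_t (&values)[1]) {
    Wrapped w;
    std::memcpy(w.data, values, sizeof(values));
    return w;
}
```

The size and pointer given to zeKernelSetArgumentValue would therefore stay the same; only the type seen by the compiler changes.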
I'm transferring it to the IGC project: https://github.com/intel/intel-graphics-compiler
Thanks to the C++ reproducer (https://github.com/intel/intel-graphics-compiler/issues/162#issuecomment-692715577), I was able to reproduce the error on NEO release 20.34.17727. On the newest NEO release, 22.23.23405, the issue is no longer reproducible: the C++ reproducer does not report any error.
Could you check whether the error still occurs on the newest NEO release? If not, please close the issue.
Removing my workaround still results in miscompilations (tested using NEO 22.25.23529). I don't have the time to investigate further now, I'll report back when I've had a look.
Using the Khronos LLVM to SPIRV translator with:
yields:
The function parameter here is passed as a ByVal OpTypePointer Function. According to https://spec.oneapi.com/level-zero/latest/core/SPIRV.html#kernel-arguments the Function storage class is not supported, but that should not matter here as the parameter is passed ByVal (it would complicate my codegen if I had to rewrite all pointers to target a supported address space; just attaching byval is much easier).
This is on 20.24.17065, so https://github.com/intel/compute-runtime/blob/20.24.17065/level_zero/core/source/kernel/kernel_imp.cpp#L563
In some cases, passing such a pointer does work, so it does look like a bug. For example, passing a ByVal OpTypePointer Function that points to a struct containing a (supported) OpTypePointer CrossWorkgroup: