Building a simple SYCL project with CMake results in a "No kernel named _(insert mangled name here)_ was found -46 (PI_ERROR_INVALID_KERNEL_NAME)" error.

jonathan-ramsey commented 1 year ago

When I build any SYCL-enabled project with CMake and the compiler built from the sycl branch of intel/llvm (see below), the build succeeds, but the resulting binary throws an error along the lines of "No kernel named (mangled name) was found -46 (PI_ERROR_INVALID_KERNEL_NAME)" as soon as it tries to execute any SYCL kernel.

I can get the error to arise even with the vector-add-buffers example in the oneAPI-samples repository.

However, if I use an installed copy of oneAPI Base toolkit (version 2023.2.1), then the example (and indeed other SYCL-enabled projects) work fine!

After digging around, the consensus seems to be that I should link with the compiler driver (clang-cl.exe), whereas the default build process uses lld-link.exe. I tried that, linking with clang-cl.exe and the appropriate options instead, but it did not fix the issue.

After carefully comparing the build process for the oneAPI toolkit versus the self-built clang/llvm from this repo, it seems that the linking step of the oneAPI toolkit build (which uses icx.exe) does many more things than when I link using the self-built clang-cl.exe or lld-link.exe (e.g. icx.exe makes multiple calls to clang-offload-bundler).

Another big indicator that things are different is that the oneAPI toolkit built executable is almost 4 times larger than the executable from the self-built clang compiler.

Important note: In the case of the vector-add-buffers oneAPI sample, if I forgo CMake and compile the single source file with a one-liner (e.g. `clang-cl -fsycl /EHsc vector-add-buffers.cpp -o vector-add-buffers.exe`), then it does work as expected.

Can anyone tell me what additional steps I need to take to get the linking step of the self-built clang to behave like the oneAPI toolkit?

Put another way: how do I deploy the self-built clang/DPC++ toolchain on my local system?

Environment:

OS: Windows 10 Pro
Target device: Any device I have it seems, but let's stick with an Intel Core i7-10875H for simplicity.
DPC++ version: clang version 18.0.0, commit 47083f847f5059c435ec7dece4bb633f78e2946f (tag: nightly-2023-09-28)
clang was built locally with no additional options to configure.py or compile.py following the Getting Started instructions.
Using set ONEAPI_DEVICE_SELECTOR=opencl:cpu to force calculation on the CPU.

P.S. If you're wondering why I'd want to use SYCL to target CPUs, it is only a stepping stone to NVIDIA GPU offloading...once I can get things working...

P.P.S. I have tried other commits, including a nightly from late last week, and one from the end of June matching the last release date of the oneAPI toolkit, but the problem persists.

Thank you in advance for any help or suggestions you can give!

My slightly modified vector-add-buffers.cpp and corresponding CMakeLists.txt are attached. vector-add-buffers.txt CMakeLists.txt

jonathan-ramsey commented 1 year ago

Update: I got it to work (finally!). I am posting my solution here in the hope it will help others.

1) I modified the CMAKE_CXX_LINK_EXECUTABLE variable in my CMakeLists.txt, inspired by what oneAPI/icx.exe does as well as this issue on the CMake gitlab: https://gitlab.kitware.com/cmake/cmake/-/issues/24243

if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
  message(STATUS "Resetting linking command...")
  set(CMAKE_CXX_LINK_EXECUTABLE "${_CMAKE_VS_LINK_EXE}<CMAKE_CXX_COMPILER> ${CMAKE_CL_NOLOGO} 
    <CMAKE_CXX_LINK_FLAGS> <OBJECTS> ${CMAKE_START_TEMP_FILE} /link <LINK_FLAGS> <LINK_LIBRARIES>
    /out:<TARGET> /pdb:<TARGET_PDB> /version:<TARGET_VERSION_MAJOR>.<TARGET_VERSION_MINOR>
    ${_PLATFORM_LINK_FLAGS} ${CMAKE_END_TEMP_FILE}")
endif()

This doesn't actually seem to be a bug in the intel/llvm code. Rather, CMake detects the compiler as vanilla Clang instead of Clang w/SYCL, so it generates a plain linker invocation. When using the oneAPI toolkit, CMake recognizes Intel/LLVM and does the right thing. I am running CMake 3.27.
- I did attempt to create a custom module + platform for CMake, but was unable to get that to work entirely.
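
For anyone who wants to check how CMake has classified their compiler, a minimal diagnostic sketch follows. The project name `compiler_probe` is hypothetical; the variables printed are standard CMake variables, and `CMAKE_CXX_COMPILER_FRONTEND_VARIANT` assumes CMake 3.14 or newer. With the oneAPI toolkit's icx, CMake 3.20 and later report the compiler ID as `IntelLLVM`, whereas a self-built clang-cl is expected to show up as `Clang` with an MSVC-style frontend.

```cmake
cmake_minimum_required(VERSION 3.14)
project(compiler_probe LANGUAGES CXX)  # hypothetical project name

# How CMake identified the compiler (e.g. "Clang" vs. "IntelLLVM").
message(STATUS "Compiler ID:      ${CMAKE_CXX_COMPILER_ID}")
# Command-line style of the driver ("MSVC" for clang-cl, "GNU" for clang++).
message(STATUS "Frontend variant: ${CMAKE_CXX_COMPILER_FRONTEND_VARIANT}")
message(STATUS "Compiler path:    ${CMAKE_CXX_COMPILER}")
# The tool and rule CMake will use for the link step.
message(STATUS "Linker:           ${CMAKE_LINKER}")
message(STATUS "Link rule:        ${CMAKE_CXX_LINK_EXECUTABLE}")
```

If the link rule shown starts with the linker rather than `<CMAKE_CXX_COMPILER>`, the SYCL flags will most likely never reach the compiler driver at link time.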

2) Whereas icx.exe tolerates mixing compiler flags and linker flags (e.g. /Qoption,link,/machine:x64 -fsycl), the self-built intel/llvm Clang does NOT: anything appearing after /link is passed straight to the linker. As such, I had to set the CMAKE_CXX_LINK_FLAGS variable so that clang-cl.exe itself knows I want to link a SYCL program. Otherwise the SYCL options end up on the linker's command line, where they are ignored (see below). For example:

`set(CMAKE_CXX_LINK_FLAGS "-fsycl -fsycl-targets=nvptx64-nvidia-cuda,spir64-unknown-unknown")`

I tried doing this in a more modern CMake fashion (e.g. using target_link_options) but that doesn't put the options in the right place.
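
One way to keep the compile and link flags in sync is sketched below. It assumes the modified `CMAKE_CXX_LINK_EXECUTABLE` rule from step 1 is already in place, so that `<CMAKE_CXX_LINK_FLAGS>` is expanded before `/link`. `SYCL_FLAGS` is a hypothetical helper variable, not something CMake or intel/llvm defines, and the target list is simply the one quoted above.

```cmake
# Hypothetical helper variable holding the SYCL driver options.
set(SYCL_FLAGS "-fsycl -fsycl-targets=nvptx64-nvidia-cuda,spir64-unknown-unknown")

# Compile step: append the SYCL options to the global C++ flags.
string(APPEND CMAKE_CXX_FLAGS " ${SYCL_FLAGS}")

# Link step: these are handed to clang-cl itself (before /link),
# rather than being forwarded to the linker.
set(CMAKE_CXX_LINK_FLAGS "${SYCL_FLAGS}")
```

Using one variable for both steps is only a convenience; the point is that the same `-fsycl` options have to reach the compiler driver at compile time and at link time.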

Heads up: Using Ninja as the build generator for CMake generally causes warnings (but not errors) at link time to be suppressed, which was 80% of my problem. This appears to be a known issue: https://github.com/ninja-build/ninja/issues/1537

For example, using CMake+Ninja (cmake --build . --clean-first --config Debug -v) to build the vector-add-buffers sample from the oneAPI-samples repository prints out the following without errors or warnings during the link step:

> [2/2]"cmd.exe /C "cd . && "C:\Program Files\CMake\bin\cmake.exe" -E vs_link_exe --intdir=CMakeFiles\vector-add-buffers.dir\Debug
 --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100220~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100220~1.0\x64\mt.exe 
--manifests  -- C:\dev\sycl_workspace\llvm\build\bin\lld-link.exe /nologo CMakeFiles\vector-add-buffers.dir\Debug\src\vector-add-buffers.cpp.obj
 /out:Debug\vector-add-buffers.exe /implib:Debug\vector-add-buffers.lib /pdb:Debug\vector-add-buffers.pdb /version:0.0
  -g /machine:x64 /debug /INCREMENTAL /subsystem:console   -fsycl -fsycl-targets=nvptx64-nvidia-cuda,spir64-unknown-unknown
 -LIBPATH:C:\dev\sycl_workspace\llvm\build\lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib
 uuid.lib comdlg32.lib advapi32.lib && cd ."

If one runs the linking step directly, i.e.:

> C:\dev\sycl_workspace\llvm\build\bin\lld-link.exe /nologo CMakeFiles\vector-add-buffers.dir\Debug\src\vector-add-buffers.cpp.obj
 /out:Debug\vector-add-buffers.exe /implib:Debug\vector-add-buffers.lib /pdb:Debug\vector-add-buffers.pdb /version:0.0
 -g /machine:x64 /debug /INCREMENTAL /subsystem:console   -fsycl -fsycl-targets=nvptx64-nvidia-cuda,spir64-unknown-unknown
 -LIBPATH:C:\dev\sycl_workspace\llvm\build\lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib
 uuid.lib comdlg32.lib advapi32.lib

then the problem is much more apparent:

lld-link: warning: ignoring unknown argument '-g'
lld-link: warning: ignoring unknown argument '-fsycl'
lld-link: warning: ignoring unknown argument '-fsycl-targets=nvptx64-nvidia-cuda,spir64-unknown-unknown'

My workaround: Adding /WX to either CMAKE_CXX_LINK_FLAGS or target_link_options causes warnings about ignored options to be treated as errors.
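
A minimal sketch of the `target_link_options` variant of this workaround is shown below. The target name and source path are taken from the build log above; the project name and the minimum CMake version are assumptions for illustration (`target_link_options` itself requires CMake 3.13 or newer).

```cmake
cmake_minimum_required(VERSION 3.13)       # target_link_options needs 3.13+
project(vector-add-buffers LANGUAGES CXX)  # project name is an assumption

add_executable(vector-add-buffers src/vector-add-buffers.cpp)

# /WX asks the linker to treat warnings as errors, so that ignored
# options such as -fsycl fail the build instead of being swallowed
# by the build tool's output handling.
target_link_options(vector-add-buffers PRIVATE /WX)
```

This does not fix the link rule by itself; it only makes the misplaced flags visible as a build failure.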

maarquitos14 commented 11 months ago

@jonathan-ramsey I'm glad this is working for you now. If I understand correctly, this is not really a bug in intel/llvm, but a few other issues (CMake, Ninja) stacked together. Did I understand correctly? If so, are you okay with closing the issue?

jonathan-ramsey commented 11 months ago

@maarquitos14: It is not a bug in intel/llvm itself. However, one might argue that it should be documented for the next person who tries to use intel/llvm on Windows with CUDA, so then perhaps a request for enhancement? 😄 Otherwise, I am okay with closing the issue.

maarquitos14 commented 11 months ago

@jonathan-ramsey thanks for the quick reply and sorry for the delay. I have been trying to reproduce your issue but I couldn't. Could you please summarize the commands to reproduce the issue? Thank you in advance.

jonathan-ramsey commented 11 months ago

@maarquitos14: Sorry for the delay in replying!

Included are copies of CMakeLists.txt and src\vector-add-buffers.cpp for which I can reproduce the error on my local machine. The vector-add-buffers.cpp file should be in a directory called src inside whichever directory the CMakeLists.txt resides in.

Building:

mkdir build
cd build
cmake -G"Ninja" .. --fresh -DCMAKE_CXX_COMPILER=clang-cl.exe
cmake --build . --clean-first --config Debug -v
set ONEAPI_DEVICE_SELECTOR=opencl:cpu

If everything built okay, then (in the build directory) execute vector-add-buffers.exe.

If I comment out line 8 of CMakeLists.txt, then I get the "No kernel named ... found -46 (PI_ERROR_INVALID_KERNEL_NAME)" error at runtime.

If I include line 8 of the CMakeLists.txt, and do a clean build, the test code runs successfully without error and gives the expected output.

Note that I was (and still am) using the "nightly-2023-09-28" build (47083f847f) of the sycl branch of the intel/llvm repository, and I have built the compiler to include NVIDIA CUDA support.

I am running Windows 10 Pro with Visual Studio Professional 2022.

Please let me know how it goes. Cheers.

vector-add-buffers.txt CMakeLists.txt

intel/llvm issue #11568