triSYCL / sycl

SYCL for Vitis: Experimental fusion of triSYCL with Intel SYCL oneAPI DPC++ up-streaming effort into Clang/LLVM

sycl-ls crashes with XRT in emulation mode #228

Closed by keryell 1 year ago

keryell commented 1 year ago

On xsjsycl41, sycl-ls works on real hardware with the environment set up as described in https://github.com/triSYCL/sycl/blob/sycl/unified/next/sycl/doc/GettingStartedXilinxFPGA.md#Usage :

rkeryell@xsjsycl41:/var/tmp/rkeryell/SYCL/XRT (master)$ sycl-ls
[opencl:cpu:0] Intel(R) OpenCL, Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz 3.0 [2022.14.8.0.04_160000]
[opencl:acc:1] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2022.14.8.0.04_160000]
[opencl:acc:2] Xilinx, xilinx_u200_gen3x16_xdma_base_1 1.0 [1.0]
[xrt:acc:0] Xilinx XRT, xilinx_u200_gen3x16_xdma_base_1 0.0 [2.16.0]

But it crashes in hardware emulation mode (XCL_EMULATION_MODE=hw_emu):

rkeryell@xsjsycl41:/var/tmp/rkeryell/SYCL/llvm (sycl/unified/next)$ XCL_EMULATION_MODE=hw_emu gdb sycl-ls
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Reading symbols from sycl-ls...
(No debugging symbols found in sycl-ls)
(gdb) r
Starting program: /var/tmp/rkeryell/SYCL/llvm/build/bin/sycl-ls 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after fork from child process 264362]
[Detaching after fork from child process 264364]
[Detaching after fork from child process 264366]
[Detaching after fork from child process 264368]
[Detaching after fork from child process 264370]
[Detaching after fork from child process 264372]
[Detaching after fork from child process 264374]
[Detaching after fork from child process 264376]
[Detaching after fork from child process 264378]
[Detaching after vfork from child process 264380]
[Detaching after fork from child process 264382]
[Detaching after fork from child process 264384]
[Detaching after fork from child process 264386]
[Detaching after fork from child process 264388]
[Detaching after fork from child process 264390]
[Detaching after fork from child process 264392]
[Detaching after fork from child process 264394]
[Detaching after fork from child process 264396]
[Detaching after fork from child process 264398]
[opencl:cpu:0] Intel(R) OpenCL, Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz 3.0 [2022.14.8.0.04_160000]
[opencl:acc:1] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2022.14.8.0.04_160000]
[opencl:acc:2] Xilinx, xilinx_u200_gen3x16_xdma_1_202110_1 1.0 [1.0]

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff316b3eb in xrt_core::query::rom_vbnv::result_type xrt_core::device_query<xrt_core::query::rom_vbnv>(xrt_core::device const*) () from /opt/xilinx/xrt/lib/libxrt_coreutil.so.2
(gdb) bt
#0  0x00007ffff316b3eb in xrt_core::query::rom_vbnv::result_type xrt_core::device_query<xrt_core::query::rom_vbnv>(xrt_core::device const*) () from /opt/xilinx/xrt/lib/libxrt_coreutil.so.2
#1  0x00007ffff3167255 in std::any (anonymous namespace)::query::get_info<std::any>(xrt_core::device const*, xrt::info::device, xrt::detail::abi const&) [clone .isra.0] () from /opt/xilinx/xrt/lib/libxrt_coreutil.so.2
#2  0x00007ffff3167885 in xrt::device::get_info_std(xrt::info::device, xrt::detail::abi const&) const () from /opt/xilinx/xrt/lib/libxrt_coreutil.so.2
#3  0x00007ffff5755220 in xrt_piDeviceGetInfo(_pi_device*, _pi_device_info, unsigned long, void*, unsigned long*) () from /var/tmp/rkeryell/SYCL/llvm/build/lib/libpi_xrt.so
#4  0x00007ffff5755528 in xrt_pi_call_wrapper<_pi_result (*)(_pi_device*, _pi_device_info, unsigned long, void*, unsigned long*), &(xrt_piDeviceGetInfo(_pi_device*, _pi_device_info, unsigned long, void*, unsigned long*))>::call(_pi_device*, _pi_device_info, unsigned long, void*, unsigned long*) () from /var/tmp/rkeryell/SYCL/llvm/build/lib/libpi_xrt.so
#5  0x00007ffff7cbd45e in sycl::_V1::detail::get_device_info_string[abi:cxx11](_pi_device*, _pi_device_info, sycl::_V1::detail::plugin const&) () from /var/tmp/rkeryell/SYCL/llvm/build/lib/libsycl.so.6
#6  0x00007ffff7efc63f in sycl::_V1::detail::is_device_info_desc<sycl::_V1::info::device::name>::return_type sycl::_V1::device::get_info<sycl::_V1::info::device::name>() const () from /var/tmp/rkeryell/SYCL/llvm/build/lib/libsycl.so.6
#7  0x0000555555557194 in main ()
(gdb) quit
A debugging session is active.

    Inferior 1 [process 264359] will be killed.

Quit anyway? (y or n) y

I suspect a device query that the emulation does not handle. If that is the case, it should be reported to XRT.

keryell commented 1 year ago

The problem actually happens with the OpenCL plugin. Since we no longer use it, I suggest leaving this as is and, for example, simply disabling the Xilinx OpenCL ICD with:

sudo mv /etc/OpenCL/vendors/xilinx.icd /etc/OpenCL/vendors/xilinx.icd.bak

I have documented this in https://github.com/triSYCL/sycl/blob/sycl/unified/next/sycl/doc/GettingStartedXilinxFPGA.md#picking-the-right-device, referring to https://github.com/Xilinx/XRT/issues/7226, but I forget about it every time I reinstall XRT. :-(
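Since the rename has to be redone after every XRT reinstall, it can be scripted. The following is only a minimal sketch, not something shipped with the repository: the function names disable_xilinx_icd and enable_xilinx_icd and the ICD_DIR override are hypothetical. It relies on the OpenCL ICD loader on Linux only considering files ending in .icd in the vendors directory, which is why renaming xilinx.icd to xilinx.icd.bak hides the Xilinx OpenCL platform.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of triSYCL or XRT): re-apply or undo the
# "rename xilinx.icd" workaround, e.g. after reinstalling XRT.
# ICD_DIR defaults to the standard vendors directory; override it to try
# the functions in a scratch directory.
ICD_DIR="${ICD_DIR:-/etc/OpenCL/vendors}"

# Hide the Xilinx OpenCL ICD from the loader by renaming it to *.bak.
# Idempotent: does nothing if xilinx.icd is already absent.
disable_xilinx_icd() {
  local icd="$ICD_DIR/xilinx.icd"
  if [ -e "$icd" ]; then
    mv -- "$icd" "$icd.bak"
    echo "disabled $icd"
  else
    echo "nothing to do: $icd not present"
  fi
}

# Undo the workaround, but never overwrite a freshly installed xilinx.icd.
enable_xilinx_icd() {
  local icd="$ICD_DIR/xilinx.icd"
  if [ -e "$icd.bak" ] && [ ! -e "$icd" ]; then
    mv -- "$icd.bak" "$icd"
    echo "enabled $icd"
  else
    echo "nothing to do"
  fi
}
```

Against the default /etc/OpenCL/vendors this needs root, for example sudo bash -c '. ./xilinx-icd.sh && disable_xilinx_icd' (the script file name is hypothetical). Afterwards sycl-ls should no longer list the [opencl:acc:2] Xilinx entry.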

keryell commented 1 year ago

Let's keep this as XRT technical debt.