oneapi-src / oneAPI-samples

Samples for Intel® oneAPI Toolkits
https://oneapi-src.github.io/oneAPI-samples/
MIT License

gdb-oneapi unable to stop inside GPU kernel (array-transform) for debugging #2528

Closed qiyuangong closed 3 weeks ago

qiyuangong commented 3 weeks ago

Example: https://github.com/oneapi-src/oneAPI-samples/tree/main/Tools/ApplicationDebugger/array-transform#example-outputs

HW: ARC A770 OS: Ubuntu 22.04 Kernel: 6.5

Env (oneAPI 2024.0 /opt/intel/oneapi/compiler/2024.0/bin/icpx)

(base) arda@arda-arc15:~$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2023.16.11.0.22_160000]
[opencl:cpu:1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i9-13900K OpenCL 3.0 (Build 0) [2023.16.11.0.22_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO  [23.30.26918.50]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) UHD Graphics 770 OpenCL 3.0 NEO  [23.30.26918.50]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.28202]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) UHD Graphics 770 1.3 [1.3.28202]

dpkg -l | grep level-zero
ii  intel-level-zero-gpu                            1.3.28202.52-821~22.04                  amd64        Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii  level-zero                                      1.13.1-719~22.04                        amd64        Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii  level-zero-dev                                  1.16.15-821~22.04                       amd64        Intel(R) Graphics Compute Runtime for oneAPI Level Zero.

array-transform was built with debug flags, using the CMake configuration provided by the sample.

mkdir build && cd build
cmake ..
make

gdb-oneapi skipped the breakpoint at https://github.com/oneapi-src/oneAPI-samples/blob/main/Tools/ApplicationDebugger/array-transform/src/array-transform.cpp#L54 instead of stopping inside the kernel. The program itself still ran to completion and reported a correct result.

Output

gdb-oneapi array-transform
GNU gdb (Intel(R) Distribution for GDB* 2024.0.1) 13.1
Copyright (C) 2024 Free Software Foundation, Inc.; (C) 2024 Intel Corp.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.

For information about how to find Technical Support, Product Updates,
User Forums, FAQs, tips and tricks, and other support information, please visit:
<http://www.intel.com/software/products/support/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from array-transform...
(gdb) b 54
Breakpoint 1 at 0x406f53: file /home/arda/qiyuan/oneapi_debug/src/array-transform.cpp, line 54.
(gdb) r
Starting program: /home/arda/qiyuan/oneapi_debug/build/array-transform
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffdbfff640 (LWP 20045)]
[New Thread 0x7fffd95ff640 (LWP 20046)]
[New Thread 0x7fffd8dfe640 (LWP 20047)]
[Thread 0x7fffd8dfe640 (LWP 20047) exited]
[Thread 0x7fffd95ff640 (LWP 20046) exited]
warning: Temporarily disabling breakpoints for unloaded shared library "/lib/x86_64-linux-gnu/libze_intel_gpu.so.1"
[SYCL] Using device: [Intel(R) Arc(TM) A770 Graphics] from [Intel(R) OpenCL Graphics]
success; result is correct.
[Thread 0x7fffdbfff640 (LWP 20045) exited]
[Inferior 1 (process 20042) exited normally]

qiyuangong commented 3 weeks ago

The root cause was that the GPU driver was not installed correctly. The debugger_sys_check diagnostic reports an error for the i915 debug check:

(qiyuan-flash) arda@arda-arc15:~/qiyuan/oneapi_debug/build$ python /opt/intel/oneapi/diagnostics/latest/bin/diagnostics.py  --select debugger_sys_check --force -v
Some checks have dependencies that are not on the list of checks to run.
The following checks will run, but will not be displayed: base_system_check

Checks results:

=======================================================================================================================================================================================
Check name: debugger_sys_check
Description: This check verifies if the environment is ready to use gdb (Intel(R) Distribution for GDB*).
=======================================================================================================================================================================================

|  Linux kernel version--------Supported---------------------------------------------------------------------------------------------------------------------------------------PASS   |
|  Debugger exist--------------Found-------------------------------------------------------------------------------------------------------------------------------------------PASS   |
|  Message: Debugger found.                                                                                                                                                           |
|  libipt exist----------------Found-------------------------------------------------------------------------------------------------------------------------------------------PASS   |
|  Message: libipt found.                                                                                                                                                             |
|  libiga exist----------------Found-------------------------------------------------------------------------------------------------------------------------------------------PASS   |
|  Message: libiga found.                                                                                                                                                             |
|  Compiler--------------------Compiler----------------------------------------------------------------------------------------------------------------------------------------PASS   |
|  i915 debug------------------i915 debug--------------------------------------------------------------------------------------------------------------------------------------ERROR  |
|  Message: No devices found that support debugging of GPU offload code.                                                                                                              |
|  How to fix: The developer of the check did not provide information on how to solve the problem. To see the solution to the problem, ask the developer of the check to              |
|  fill in the "HowToFix" field.                                                                                                                                                      |
|  Env variables---------------Env variables-----------------------------------------------------------------------------------------------------------------------------------PASS   |
|  Message: Environmental variables correct.                                                                                                                                          |
|  Gdb processes---------------Gdb processes-----------------------------------------------------------------------------------------------------------------------------------PASS   |

Output directory: /home/arda/intel/diagnostics/logs
        Text report: diagnostics_select_debugger_sys_check_force_verbosity_0_arda-arc15_20241022_085809965954.txt
        JSON report: diagnostics_select_debugger_sys_check_force_verbosity_0_arda-arc15_20241022_085809965958.json

This report for arda-arc15
was generated by the Diagnostics Utility for Intel® oneAPI Toolkits 2024.0.0.

After reinstalling the correct GPU driver, gdb-oneapi stops at the correct line inside the SYCL kernel.
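
For anyone re-running the check after a driver reinstall, the failing row can be picked out of the text report automatically. The following is only a minimal sketch: the row layout (`|  name----value----STATUS  |`) is inferred from the report pasted above, the helper name `failing_checks` is hypothetical, and treating every status other than `PASS` as a failure is an assumption, since only `PASS` and `ERROR` appear in this thread.

```python
import re

# Row layout assumed from the report in this thread:
#   |  <check name>----<value>----<STATUS>  |
# "Message:" and "How to fix:" lines contain no dash runs and are skipped.
ROW = re.compile(
    r"^\|\s*(?P<name>.+?)-{2,}(?P<value>.+?)-{2,}(?P<status>[A-Z]+)\s*\|\s*$"
)


def failing_checks(report_text):
    """Return (check name, status) for each row whose status is not PASS."""
    failures = []
    for line in report_text.splitlines():
        match = ROW.match(line.strip())
        if match and match.group("status") != "PASS":
            failures.append((match.group("name").strip(), match.group("status")))
    return failures


# Shortened copies of rows from the report above (dash runs trimmed).
sample_report = """\
|  Debugger exist--------------Found----------------PASS   |
|  Message: Debugger found.                                |
|  i915 debug------------------i915 debug-----------ERROR  |
|  Message: No devices found that support debugging of GPU offload code. |
"""

for name, status in failing_checks(sample_report):
    print(f"{name}: {status}")
```

To use it on a real run, read the text report written to the output directory shown in the diagnostics output and pass its contents to `failing_checks`. An empty result would suggest that all listed checks passed, including `i915 debug`.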