intel / compute-runtime

Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
MIT License
1.13k stars 230 forks source link

Getting DG2 to work on Linux #559

Closed yxlao closed 1 year ago

yxlao commented 2 years ago

The issue

I am getting the following error when running SYCL programs on DG2. The same program works perfectly on the 12th gen ADL-P integrated GPU.

terminate called after throwing an instance of 'cl::sycl::runtime_error'
  what():  Native API failed. Native API returns: -997 (Command failed to enqueue/execute) -997 (Command failed to enqueue/execute)

I am wondering if some steps in my kernel config, dependency installation, or the CMake file are wrong. All the details are provided below.

System info

Hardware

I am on a laptop with i7-12700H CPU with iGPU (ID: 0x46a6) and Arc A370M discrete GPU (ID: 0x5693).

sycl-ls works:

(sycl) ➜  ~ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2022.13.3.0.16_160000]
[opencl:cpu:1] Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i7-12700H 3.0 [2022.13.3.0.16_160000]
[opencl:gpu:2] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x5693] 3.0 [22.34.24023]
[opencl:gpu:3] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x46a6] 3.0 [22.34.24023]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x5693] 1.3 [1.3.24023]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Graphics [0x46a6] 1.3 [1.3.24023]
[host:host:0] SYCL host platform, SYCL host device 1.2 [1.2]

Software

Code and CMake

Here's my demo code:


#include <CL/sycl.hpp>
#include <cstddef>
#include <iostream>
#include <vector>

using namespace cl;

class DG2Selector : public sycl::device_selector {
public:
    int operator()(const sycl::device &dev) const override {
        const std::string expected_name = "Intel(R) Graphics [0x5693]";
        const std::string dev_name = dev.get_info<sycl::info::device::name>();
        return dev_name == expected_name ? 1 : -1;
    }
};

class ADLPSelector : public sycl::device_selector {
public:
    int operator()(const sycl::device &dev) const override {
        const std::string expected_name = "Intel(R) Graphics [0x46a6]";
        const std::string dev_name = dev.get_info<sycl::info::device::name>();
        return dev_name == expected_name ? 1 : -1;
    }
};

void RunSYCLDemo(sycl::queue &queue) {
    const size_t num_elements = 4;
    const size_t num_bytes = 4 * sizeof(float);
    std::vector<float> cpu_vector{0.0f, 1.0f, 2.0f, 3.0f};
    void *cpu_ptr = static_cast<void *>(cpu_vector.data());
    std::cout << "Inputs:" << std::endl;
    for (size_t i = 0; i < num_elements; i++) {
        std::cout << "cpu_vector[" << i << "] = " << cpu_vector[i] << std::endl;
    }

    void *sycl_ptr = sycl::malloc_device(num_bytes, queue);
    queue.memcpy(sycl_ptr, cpu_ptr, num_bytes).wait_and_throw();
    queue.submit([&](sycl::handler &h) {
             h.parallel_for(num_elements, [sycl_ptr](size_t i) {
                 float *ptr = static_cast<float *>(sycl_ptr);
                 ptr[i] *= 2.0f;
             });
         }).wait();
    queue.memcpy(cpu_ptr, sycl_ptr, num_bytes).wait_and_throw();
    if (sycl_ptr) {
        sycl::free(sycl_ptr, queue);
    }

    std::cout << "Outputs:" << std::endl;
    for (size_t i = 0; i < num_elements; i++) {
        std::cout << "cpu_vector[" << i << "] = " << cpu_vector[i] << std::endl;
    }
}

int main() {
    sycl::device device;
    sycl::queue queue;

    // Run on Alder Lake P integrated GPU (0x46a6).
    try {
        device = sycl::device(ADLPSelector());
        queue = sycl::queue(device);
        std::cout << "ADL-P integrated GPU detected, running demo" << std::endl;
    } catch (const sycl::exception &e) {
        std::cout << "ADL-P integrated GPU not found" << std::endl;
    }
    RunSYCLDemo(queue);

    std::cout << "*******************************" << std::endl;

    // Run on DG2 GPU (0x5693).
    try {
        device = sycl::device(DG2Selector());
        queue = sycl::queue(device);
        std::cout << "DG2 discrete GPU detected, running demo" << std::endl;
    } catch (const sycl::exception &e) {
        std::cout << "DG2 discrete GPU not found" << std::endl;
    }
    RunSYCLDemo(queue);

    return 0;
}

Here's the CMake file

cmake_minimum_required(VERSION 3.20)

set(CMAKE_CXX_COMPILER dpcpp)

project(SYCLCMake)

add_executable(demo main.cpp)
target_compile_options(demo PRIVATE -fsycl)
target_link_libraries(demo PRIVATE sycl)
target_link_options(demo PRIVATE -fsycl)

Here's the full commands and outputs:

(sycl) ➜  ~/repo/sycl-cmake (master) mkdir build       
(sycl) ➜  ~/repo/sycl-cmake (master) cd build
(sycl) ➜  ~/repo/sycl-cmake/build (master) cmake ..
-- The C compiler identification is GNU 11.2.0
-- The CXX compiler identification is IntelLLVM 2022.1.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/lib/ccache/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/intel/oneapi/compiler/2022.1.0/linux/bin/dpcpp - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: /home/yixing/repo/sycl-cmake/build
(sycl) ➜  ~/repo/sycl-cmake/build (master) make       
[ 50%] Building CXX object CMakeFiles/demo.dir/main.cpp.o
[100%] Linking CXX executable demo
[100%] Built target demo
(sycl) ➜  ~/repo/sycl-cmake/build (master) ./demo
ADL-P integrated GPU detected, running demo
Inputs:
cpu_vector[0] = 0
cpu_vector[1] = 1
cpu_vector[2] = 2
cpu_vector[3] = 3
Outputs:
cpu_vector[0] = 0
cpu_vector[1] = 2
cpu_vector[2] = 4
cpu_vector[3] = 6
*******************************
DG2 discrete GPU detected, running demo
Inputs:
cpu_vector[0] = 0
cpu_vector[1] = 1
cpu_vector[2] = 2
cpu_vector[3] = 3
terminate called after throwing an instance of 'cl::sycl::runtime_error'
  what():  Native API failed. Native API returns: -997 (Command failed to enqueue/execute) -997 (Command failed to enqueue/execute)
[1]    7890 IOT instruction  ./demo

As you can see, the program works fine on the integrated GPU, but not on the DG2 discrete GPU.

JablonskiMateusz commented 2 years ago

Hi @yxlao . Could you confirm if the issue is visible on Ubuntu 20.04?

eero-t commented 2 years ago

I have installed all the latest .deb packages from compute-runtime releases and from the GPGPU driver page.

What about the versions for rest of the related packages; llvm-spirv, opencl-clan, gmmlib?

apstasen commented 2 years ago

I can confirm the same issue on Intel Arc A380 and Ubuntu 22.04. I updated to the same kernel 6.0rc3 from https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.0-rc3/amd64 and use drivers/runtime from ghcr.io/intel/llvm/sycl_ubuntu2004_nightly container. I also downloaded /usr/lib/firmware/i915/dg2_dmc_ver2_07.bin and /usr/lib/firmware/i915/dg2_guc_70.4.1.bin from https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915 but not sure how to make them to be loaded by current i915 driver:

$ uname -a
Linux sap 6.0.0-060000rc3-generic #202208282331 SMP PREEMPT_DYNAMIC Sun Aug 28 23:34:06 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

$ sudo dmesg | grep -i i915
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.0.0-060000rc3-generic root=UUID=00b93c5a-1c6f-4506-835c-39661ff2b7de ro quiet splash i915.force_probe=56a5 vt.handoff=7
[    0.325396] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.0.0-060000rc3-generic root=UUID=00b93c5a-1c6f-4506-835c-39661ff2b7de ro quiet splash i915.force_probe=56a5 vt.handoff=7
[    6.949811] i915 0000:03:00.0: [drm] Incompatible option enable_guc=3 - HuC is not supported!
[    6.950358] i915 0000:03:00.0: vgaarb: deactivate vga console
[    6.950382] i915 0000:03:00.0: BAR 0: releasing [mem 0xa0000000-0xa0ffffff 64bit]
[    6.950384] i915 0000:03:00.0: BAR 2: releasing [mem 0x4000000000-0x400fffffff 64bit pref]
[    6.950421] i915 0000:03:00.0: BAR 2: no space for [mem size 0x200000000 64bit pref]
[    6.950422] i915 0000:03:00.0: BAR 2: failed to assign [mem size 0x200000000 64bit pref]
[    6.950424] i915 0000:03:00.0: BAR 0: assigned [mem 0xa0000000-0xa0ffffff 64bit]
[    6.950475] i915 0000:03:00.0: [drm] Failed to resize BAR2 to 8192M (-ENOSPC)
[    6.950477] i915 0000:03:00.0: BAR 2: assigned [mem 0x4000000000-0x400fffffff 64bit pref]
[    6.950500] i915 0000:03:00.0: [drm] Local memory IO size: 0x0000000010000000
[    6.950501] i915 0000:03:00.0: [drm] Local memory available: 0x000000017c800000
[    6.950502] i915 0000:03:00.0: [drm] Using a reduced BAR size of 256MiB. Consider enabling 'Resizable BAR' or similar, if available in the BIOS.
[    6.965145] i915 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[    6.969685] i915 0000:03:00.0: [drm] Finished loading DMC firmware i915/dg2_dmc_ver2_06.bin (v2.6)
[    6.972580] i915 0000:03:00.0: [drm] GuC firmware i915/dg2_guc_70.1.2.bin version 70.1
[    6.982863] i915 0000:03:00.0: [drm] GuC submission enabled
[    6.982864] i915 0000:03:00.0: [drm] GuC SLPC enabled
[    6.983219] i915 0000:03:00.0: [drm] GuC RC: enabled
[    6.998933] i915 0000:03:00.0: [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
[    7.001015] [drm] Initialized i915 1.6.0 20201103 for 0000:03:00.0 on minor 0
[    7.001140] snd_hda_intel 0000:04:00.0: bound 0000:03:00.0 (ops i915_audio_component_bind_ops [i915])
[    7.027185] fbcon: i915drmfb (fb0) is primary device
[    7.092757] i915 0000:03:00.0: [drm] fb0: i915drmfb frame buffer device
[    7.136250] snd_hda_codec_hdmi hdaudioC0D2: No i915 binding for Intel HDMI/DP codec

$ ls -la /usr/lib/firmware/i915/dg2_*
-rw-r--r-- 1 root root   22416 Aug 31 04:11 /usr/lib/firmware/i915/dg2_dmc_ver2_06.bin
-rw-r--r-- 1 root root  152588 Sep 19 09:46 /usr/lib/firmware/i915/dg2_dmc_ver2_07.bin
-rw-r--r-- 1 root root  365568 Aug 31 04:11 /usr/lib/firmware/i915/dg2_guc_70.1.2.bin
-rw-r--r-- 1 root root 2455519 Sep 19 09:47 /usr/lib/firmware/i915/dg2_guc_70.4.1.bin
eero-t commented 2 years ago

but not sure how to make them to be loaded by current i915 driver

Specific kernel versions have always loaded specific FW version, and possibly supported loading some older version(s) for backwards compatibility [1], if one they wanted is missing. They do not load newer FW versions, as they do not know about them (what API changes they have etc). For that you would need a newer i915 module.

[1] Relevant bug: https://gitlab.freedesktop.org/drm/intel/-/issues/6895

apstasen commented 2 years ago

Same issue with https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.0-rc6/amd64 kernel. Its i915 still uses old firmware btw:

[    6.999994] i915 0000:03:00.0: [drm] Finished loading DMC firmware i915/dg2_dmc_ver2_06.bin (v2.6)
[    7.003732] i915 0000:03:00.0: [drm] GuC firmware i915/dg2_guc_70.1.2.bin version 70.1

I'm not yet morally ready to build i915 myself. Still hope to get it DG2 functional version prebuilt from https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.0-rc6/amd64 in some not so distant future.

apstasen commented 1 year ago

Same issue with https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.0-rc7/amd64 kernel and old firmware loaded. When I moved to https://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-next/2022-10-01/amd64 and https://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/2022-10-01/amd64 kernels I see that new GuC firmware is loaded but the original issue still persists:

sap@sap:~$ sudo dmesg | grep -i i915
[sudo] password for sap:
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.0.0-060000rc2drmintelnext20221001-generic root=UUID=00b93c5a-1c6f-4506-835c-39661ff2b7de ro quiet splash i915.force_probe=56a5 vt.handoff=7
[    0.331090] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.0.0-060000rc2drmintelnext20221001-generic root=UUID=00b93c5a-1c6f-4506-835c-39661ff2b7de ro quiet splash i915.force_probe=56a5 vt.handoff=7
[    7.127023] i915 0000:03:00.0: [drm] Incompatible option enable_guc=3 - HuC is not supported!
[    7.127999] i915 0000:03:00.0: vgaarb: deactivate vga console
[    7.128043] i915 0000:03:00.0: BAR 0: releasing [mem 0xa0000000-0xa0ffffff 64bit]
[    7.128052] i915 0000:03:00.0: BAR 2: releasing [mem 0x4000000000-0x400fffffff 64bit pref]
[    7.128166] i915 0000:03:00.0: BAR 2: no space for [mem size 0x200000000 64bit pref]
[    7.128172] i915 0000:03:00.0: BAR 2: failed to assign [mem size 0x200000000 64bit pref]
[    7.128179] i915 0000:03:00.0: BAR 0: assigned [mem 0xa0000000-0xa0ffffff 64bit]
[    7.128339] i915 0000:03:00.0: [drm] Failed to resize BAR2 to 8192M (-ENOSPC)
[    7.128349] i915 0000:03:00.0: BAR 2: assigned [mem 0x4000000000-0x400fffffff 64bit pref]
[    7.128417] i915 0000:03:00.0: [drm] Local memory IO size: 0x0000000010000000
[    7.128424] i915 0000:03:00.0: [drm] Local memory available: 0x000000017c800000
[    7.128428] i915 0000:03:00.0: [drm] Using a reduced BAR size of 256MiB. Consider enabling 'Resizable BAR' or similar, if available in the BIOS.
[    7.148546] snd_hda_codec_hdmi hdaudioC0D2: No i915 binding for Intel HDMI/DP codec
[    7.161530] i915 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[    7.166043] i915 0000:03:00.0: [drm] Finished loading DMC firmware i915/dg2_dmc_ver2_07.bin (v2.7)
[    7.166706] i915 0000:03:00.0: [drm] GuC error state capture buffer maybe too small: 2097152 < 3147156 (min = 1049052)
[    7.169335] i915 0000:03:00.0: [drm] GuC firmware i915/dg2_guc_70.4.1.bin version 70.4
[    7.181260] i915 0000:03:00.0: [drm] GuC submission enabled
[    7.181262] i915 0000:03:00.0: [drm] GuC SLPC enabled
[    7.181621] i915 0000:03:00.0: [drm] GuC RC: enabled
[    7.199319] i915 0000:03:00.0: [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
[    7.209966] [drm] Initialized i915 1.6.0 20201103 for 0000:03:00.0 on minor 0
[    7.210168] snd_hda_intel 0000:04:00.0: bound 0000:03:00.0 (ops i915_audio_component_bind_ops [i915])
[    7.236070] fbcon: i915drmfb (fb0) is primary device
[    7.313730] i915 0000:03:00.0: [drm] fb0: i915drmfb frame buffer device
apstasen commented 1 year ago

For reference I use ghcr.io/intel/llvm/sycl_ubuntu2004_nightly docker container that currently have this runtime installed:

root@d692967a0182:~/sycl# dpkg --list | grep intel
ii  intel-igc-cm                1.0.119                           amd64        The Intel(R) C for Metal compiler is a open source compiler that implements C for Metal programming language. C for Metal is a new GPU kernel programming language for Intel HD Graphics.
ii  intel-igc-core              1.0.12149.1                       amd64        Intel(R) Graphics Compiler for OpenCL(TM)
ii  intel-igc-media             1.0.12149.1                       amd64        Intel(R) Graphics Compiler for OpenCL(TM)
ii  intel-igc-opencl            1.0.12149.1                       amd64        Intel(R) Graphics Compiler for OpenCL(TM)
ii  intel-igc-opencl-devel      1.0.12149.1                       amd64        Intel(R) Graphics Compiler for OpenCL(TM)
ii  intel-level-zero-gpu        1.3.24278                         amd64        Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii  intel-media-va-driver:amd64 20.1.1+dfsg1-1                    amd64        VAAPI driver for the Intel GEN8+ Graphics family
ii  intel-opencl-icd            22.38.24278                       amd64        Intel graphics compute runtime for OpenCL
ii  libdrm-intel1:amd64         2.4.107-8ubuntu1~20.04.2          amd64        Userspace interface to intel-specific kernel DRM services -- runtime
alheinecke commented 1 year ago

I have similar problems on Fedora 36, with Fedora built 6.0.0-54 kernel, but using OpenCL.

clQuery runs perfectly, but clpeak just hangs. I was not able to build igc from sources, due to this issue: https://github.com/intel/intel-graphics-compiler/issues/259 . However NEO was build from source as you can see due to version 22.41.0.

[xxx@yyy build]$ ./clpeak -p 0 -d 0

Platform: Intel(R) OpenCL HD Graphics
  Device: Intel(R) Graphics [0x5690]
    Driver version  : 22.41.0 (Linux x64)
    Compute units   : 512
    Clock frequency : 2050 MHz

    Global memory bandwidth (GBPS)

When using the ADL-P GPU, everything works fine:

[xxx@yyybuild]$ ./clpeak -p 1 -d 0

Platform: Intel(R) OpenCL HD Graphics
  Device: Intel(R) Graphics [0x46a6]
    Driver version  : 22.41.0 (Linux x64)
    Compute units   : 96
    Clock frequency : 1400 MHz

    Global memory bandwidth (GBPS)
      float   : 44.05
      float2  : 43.89
      float4  : 45.96
      float8  : 45.61
      float16 : 46.42

    Single-precision compute (GFLOPS)
      float   : 2097.01
      float2  : 2072.03
      float4  : 2087.24
      float8  : 1325.47
      float16 : 1316.19

    Half-precision compute (GFLOPS)
      half   : 4066.52
      half2  : 4029.82
      half4  : 4067.22
      half8  : 4042.22
      half16 : 3631.05

    No double precision support! Skipped

    Integer compute (GIOPS)
      int   : 701.07
      int2  : 477.28
      int4  : 455.85
      int8  : 433.37
      int16 : 526.13

    Integer compute Fast 24bit (GIOPS)
      int   : 696.82
      int2  : 477.27
      int4  : 455.84
      int8  : 433.37
      int16 : 526.18

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 21.41
      enqueueReadBuffer               : 22.32
      enqueueWriteBuffer non-blocking : 21.69
      enqueueReadBuffer non-blocking  : 21.96
      enqueueMapBuffer(for read)      : 1651907.50
        memcpy from mapped ptr        : 21.87
      enqueueUnmap(after write)       : 21474796.00
        memcpy to mapped ptr          : 21.44

    Kernel launch latency : 53.55 us
telmin commented 1 year ago

I have similar problem as @alheinecke. Getting device infomation(like clinfo) is running perfectly. However, it hangs when an instruction is submitted to the command queue.

I am using Fedora36 and Kernel 6.0.0-0.rc6. The libraries I am using are IGC: ea9ac563 gmmlib 4552654e neo: 06817090

IGC and dependent libraries were also built from source using build_ubuntu.md as a reference.

When run clpeak, it hangs below.

Platform: Intel(R) OpenCL HD Graphics
  Device: Intel(R) Graphics [0x56a5].
    Driver version : 22.41.0 (Linux x64)
    Compute units : 128
    Clock frequency : 2450 MHz

    Global memory bandwidth (GBPS)

backtrace is below

#0  0x00007fcd36c7b53b in sched_yield () at ../sysdeps/unix/syscall-template.S:120
#1  0x00007fcd36619b86 in __gthread_yield () at /usr/include/c++/12/x86_64-redhat-linux/bits/gthr-default.h:693
#2  std::this_thread::yield () at /usr/include/c++/12/bits/std_thread.h:322
#3  NEO::WaitUtils::waitFunctionWithPredicate<unsigned int>(unsigned int const volatile*, unsigned int, std::function<bool (unsigned int, unsigned int)>) (predicate=..., expectedValue=1, pollAddress=0x7fcd370c4000) at /tmp/work/neo/shared/source/utilities/wait_util.h:32
#4  NEO::WaitUtils::waitFunction (expectedValue=1, pollAddress=0x7fcd370c4000) at /tmp/work/neo/shared/source/utilities/wait_util.h:37
#5  NEO::CommandStreamReceiver::baseWaitFunction (this=0x1a9ac10, pollAddress=0x7fcd370c4000, params=..., taskCountToWait=1) at /tmp/work/neo/shared/source/command_stream/command_stream_receiver.cpp:387
#6  0x00007fcd3653b6aa in NEO::CommandStreamReceiverHw<NEO::XeHpgCoreFamily>::waitForTaskCountWithKmdNotifyFallback (this=0x1a9ac10, taskCountToWait=1, flushStampToWait=<optimized out>, useQuickKmdSleep=<optimized out>, throttle=NEO::MEDIUM)
    at /tmp/work/neo/shared/source/command_stream/command_stream_receiver_hw_base.inl:905
#7  0x00007fcd360f8606 in NEO::CommandQueue::waitUntilComplete (this=this@entry=0x1c258f0, gpgpuTaskCountToWait=0, copyEnginesToWait=..., flushStampToWait=0, useQuickKmdSleep=useQuickKmdSleep@entry=false, cleanTemporaryAllocationList=true, skipWait=false)
    at /tmp/work/neo/opencl/source/command_queue/command_queue.cpp:418
#8  0x00007fcd360f972b in NEO::CommandQueue::waitForAllEngines (this=this@entry=0x1c258f0, blockedQueue=<optimized out>, printfHandler=printfHandler@entry=0x0, cleanTemporaryAllocationsList=cleanTemporaryAllocationsList@entry=true) at /tmp/work/neo/opencl/source/command_queue/command_queue.cpp:1222
#9  0x00007fcd363c6076 in NEO::CommandQueue::waitForAllEngines (printfHandler=0x0, blockedQueue=<optimized out>, this=0x1c258f0) at /tmp/work/neo/opencl/source/command_queue/command_queue.h:217
#10 NEO::CommandQueueHw<NEO::XeHpgCoreFamily>::enqueueBlit<4596u> (this=this@entry=0x1c258f0, multiDispatchInfo=..., numEventsInWaitList=numEventsInWaitList@entry=0, eventWaitList=eventWaitList@entry=0x0, event=<optimized out>, blocking=<optimized out>, bcsCsr=...)
    at /tmp/work/neo/opencl/source/command_queue/enqueue_common.h:1318
#11 0x00007fcd363ccbfa in NEO::CommandQueueHw<NEO::XeHpgCoreFamily>::dispatchBcsOrGpgpuEnqueue<4596u, 2ul> (this=this@entry=0x1c258f0, dispatchInfo=..., surfaces=..., builtInOperation=builtInOperation@entry=1, numEventsInWaitList=numEventsInWaitList@entry=0, eventWaitList=eventWaitList@entry=0x0, event=<optimized out>,
    blocking=true, csr=...) at /tmp/work/neo/opencl/source/command_queue/enqueue_common.h:1338
#12 0x00007fcd363cd2f2 in NEO::CommandQueueHw<NEO::XeHpgCoreFamily>::enqueueWriteBuffer (this=0x1c258f0, buffer=0x1ba8be0, blockingWrite=1, offset=0, size=<optimized out>, ptr=<optimized out>, mapAllocation=<optimized out>, numEventsInWaitList=0, eventWaitList=0x0, event=0x0)
    at /tmp/work/neo/opencl/source/command_queue/enqueue_write_buffer.h:104
#13 0x00007fcd360c89e7 in clEnqueueWriteBuffer (commandQueue=<optimized out>, buffer=<optimized out>, blockingWrite=<optimized out>, offset=<optimized out>, cb=<optimized out>, ptr=<optimized out>, numEventsInWaitList=<optimized out>, eventWaitList=<optimized out>, event=<optimized out>)
    at /tmp/work/neo/opencl/source/api/api.cpp:2493
#14 0x00007fcd370e17e0 in clEnqueueWriteBuffer (command_queue=0x1c25900, buffer=0x1ba8bf0, blocking_write=1, offset=0, cb=1515978752, ptr=<optimized out>, num_events_in_wait_list=0, event_wait_list=0x0, event=0x0) at ocl_icd_loader_gen.c:2312
#15 0x0000000000410c47 in clPeak::runGlobalBandwidthTest(cl::CommandQueue&, cl::Program&, device_info_t&) ()
#16 0x000000000040ab25 in clPeak::runAll() ()
#17 0x0000000000407b5d in main ()

it seems no response from device.

eero-t commented 1 year ago

Discrete GPU support is not yet complete in upstream kernel: https://www.kernel.org/doc/html/latest/gpu/rfc/index.html

If you're using (or can use) one of the Intel GPU DKMS supported distro / kernel versions, you could try the DKMS instead:

alheinecke commented 1 year ago

yeah... but I saw multiple reports that it should work and with force_probe it at least loads. I tried that RHEL 8.5 recipe earlier this weeks, it works somehow but the system starts to hang, even clinfo shows a Floating Point Exception when existing... so this doesn't seem to be stable either... granted, it works somehow more :-)

do you know of linux 6.1.0 will have full support?

tjaalton commented 1 year ago

6.1 hasn't queued a commit to drop force_probe, so I don't think it'll support DG2 OOTB

alheinecke commented 1 year ago

I don't mind exporting force_probe for now, I mind that with even force_probe nothing works

eero-t commented 1 year ago

I don't mind exporting force_probe for now, I mind that with even force_probe nothing works

Needing to use force_probe means that HW is only partially supported by given (upstream) kernel. For now, I would be more worried about things not working properly with a backport kernel module, when it exposes the device without forcing.

I tried that RHEL 8.5 recipe earlier this weeks, it works somehow but the system starts to hang, even clinfo shows a Floating Point Exception when existing... so this doesn't seem to be stable either...

Did you build kernel module from the backports git repo [1], or install ready-built DKMS kernel package from: https://dgpu-docs.intel.com/installation-guides/index.html

I'm not sure whether each of the backport kernel modules (for different distro kernel versions[2]) is tested as much, but at least ones for newer kernel versions would be closer to what eventually gets upstreamed (i.e. potentially something the kernel driver developers work more with), and what I would imagine compute-runtime to be tested with.

Btw. @JablonskiMateusz, which of backport kernel module versions[2] is most tested with compute-runtime?

And is GPU module backport enough, or would the other backport module(s) [3] also be needed for OpenCL?

(E.g. compute-runtime Level-Zero Sysman parts needs also telemetry module to provide all GPU metrics.)

[1] https://github.com/intel-gpu/intel-gpu-i915-backports

[2] Kernel versions supported currently by the backport modules are:

[2] https://github.com/intel-gpu/intel-gpu-i915-backports#dependencies

alheinecke commented 1 year ago

I followed the steps here: https://dgpu-docs.intel.com/installation-guides/index.html and I'm using CentOS 8.6... However, I just checked I'm running "4.18.0-372.26.1.el8_6.x86_64" and back-ports only mentions "4.18.0-372.26.1.el8_6.x86_64"...

apstasen commented 1 year ago

@eero-t Do you know if https://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly kernels are supposed to work for DG2 as well?

telmin commented 1 year ago

@eero-t I followed the steps in that document and it works. Thank you very much.

tjaalton commented 1 year ago

@eero-t Do you know if https://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly kernels are supposed to work for DG2 as well?

That's closer to what'll be in 6.2, so it's better but still needs force_probe. Note that there's no 'nightly' branch upstream anymore, this is an alias to drm-tip.

eero-t commented 1 year ago

I followed the steps here: https://dgpu-docs.intel.com/installation-guides/index.html and I'm using CentOS 8.6... However, I just checked I'm running "4.18.0-372.26.1.el8_6.x86_64" and back-ports only mentions "4.18.0-372.26.1.el8_6.x86_64"...

Um. Those numbers match?

Anyway, at least as long as kernel major & major numbers match it should be fine. Small minor version differences may also be fine, but that depends completely on what changes kernel did. DKMS package (at least on Debian) builds kernel module from sources against the kernel headers, whenever DKMS package installed, or new kernel is installed, so you would at least notice internal kernel API mismatch (from DKMS module build failing), but not semantic changes.

@eero-t Do you know if https://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly kernels are supposed to work for DG2 as well?

I would assume backports DKMS to work better, at least until drm-tip (which regularly rebases to upstream kernel) does not need force-probing any more.

apstasen commented 1 year ago

https://dgpu-docs.intel.com/installation-guides/ubuntu/ubuntu-jammy-arc.html#step-1-add-package-repository BKM worked for me for Ubuntu 22.04. Thank you! I wounder why I not found it earlier (maybe it was only recently updated to include Ubuntu 22.04).

eero-t commented 1 year ago

Kernel packages + instructions were added to dgpu site only few weeks ago, before that there were only user space driver packages. Kernel driver backports Git repository has been there a a bit longer.

Note: At least in case of media driver, there's a build time option to select between "production" (non-upstream) i915 GPU driver kernel uAPI, and upstream i915 uAPI, and those are not fully compatible (upstreaming often requires API changes before being accepted, and as I mentioned above, upstreaming of kernel dGPU support is not complete yet). I.e. it's better to install kernel and user-space GPU driver packages from the same repo, to make sure they are compatible.