intel / compute-runtime

Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
MIT License

GPU not detected #698

Open pvelesko opened 8 months ago

pvelesko commented 8 months ago

Trying to run the public runtime on PVC - from my understanding this should be supported but devices fail to initialize.

Is there a debug procedure for this?

JablonskiMateusz commented 8 months ago

Could you provide more details about the setup? Which device, which Linux kernel, which driver version, and where was the driver taken from?

pvelesko commented 8 months ago

@JablonskiMateusz this is on Aurora - I'm waiting on confirmation on what info I can post publicly

eero-t commented 8 months ago

PVC support is not yet in the upstream kernel, so you would need to use the i915 backport DKMS from the Intel driver repos. Instructions for installing the relevant kernel and user-space driver packages are here: https://dgpu-docs.intel.com/driver/installation.html

The backport DKMS kernel driver supports only specific kernel versions. Does sudo dmesg | grep i915 show that the kernel recognized PVC correctly?

JablonskiMateusz commented 8 months ago

@pvelesko do you build driver packages on your own or you take packages from our github?

pvelesko commented 8 months ago

@eero-t @JablonskiMateusz

Aurora is a PVC system and normally runs the intel engineering releases.

uname -r
5.14.21-150400.24.55-default

@pvelesko do you build driver packages on your own or you take packages from our github?

I build them myself with -DNEO_ENABLE_i915_PRELIM_DETECTION=TRUE

As for the driver version (runtime driver i.e. NEO?) I use the latest tag and have been trying for months now.

pvelesko commented 8 months ago

You may take a look at exactly how I build it:

https://github.com/pvelesko/intel-compute-runtime-build

eero-t commented 8 months ago

You may take a look at exactly how I build it: https://github.com/pvelesko/intel-compute-runtime-build

At least one problem in that is not building everything with the same compiler version. LLVM changes its API between versions, so all components in the compute stack need to be built with the same compiler version, including opencl-clang, llvm-spirv and SPIRV tools.

In my case, I get LLVM, opencl-clang and llvm-spirv stuff from the distro:

$ sudo apt install --no-install-recommends libc-dev libz-dev libboost-all-dev libnl-genl-3-dev ocl-icd-opencl-dev \
    llvm-spirv-14 libllvmspirvlib-14-dev clang-14 llvm-14-dev liblld-14-dev libopencl-clang-14-dev

And build gmmlib, SPIRV-Headers, SPIRV-Tools, vc-intrinsics, intel-graphics-compiler, level-zero and compute-runtime myself.

IGC is built with -DIGC_OPTION__LLVM_PREFERRED_VERSION=14 option, and compute-runtime with: -DSUPPORT_PVC=1 -DNEO_ENABLE_i915_PRELIM_DETECTION=TRUE -DNEO_DISABLE_LD_GOLD=1

(When using packages from Ubuntu, the main issue is getting a suitable opencl-clang version, as LLVM 14 based ones are available only from 23.04 onward, so I do all my builds within a container that uses a suitable Ubuntu version as its base.)
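
The same-LLVM-version requirement described above can be checked mechanically once a version string has been collected for each component. A minimal sketch, assuming the version strings were gathered by hand; the component names and version strings in `example` are illustrative and not taken from a real system:

```python
import re

def llvm_major(version_text):
    """Return the first major version number found in a version string, or None."""
    match = re.search(r"(\d+)\.\d+", version_text)
    return int(match.group(1)) if match else None

def mismatched_components(versions):
    """Group components by LLVM major version; return {} when they all agree."""
    by_major = {}
    for component, text in versions.items():
        by_major.setdefault(llvm_major(text), []).append(component)
    return by_major if len(by_major) > 1 else {}

# Illustrative input only: these version strings are assumptions, not real output.
example = {
    "clang": "clang version 14.0.6",
    "llvm-config": "14.0.6",
    "llvm-spirv": "LLVM version 15.0.0",
}
print(mismatched_components(example))
```

An empty result means every component reports the same major version; anything else lists which components sit on which version.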

pvelesko commented 8 months ago

@eero-t

At least one problem in that is not building everything with the same compiler version

Can you elaborate on what I am building with different compiler versions? I compile clang-14 with LLVM-SPIRV and opencl-clang prior to building intel-compute-runtime stack.

I can't use pre-built packages since I don't have sudo on this system.

Also, I have verified that the build system works on at least 2 different non-PVC systems.

eero-t commented 8 months ago

I compile clang-14 with LLVM-SPIRV and opencl-clang prior to building intel-compute-runtime stack. Also, I have verified that the build system works on at least 2 different non-PVC systems.

OK, so the issue is not with the build but with actual PVC support, either in the kernel or in the user-space driver.

Kernel driver

Does dmesg | grep i915 show PVC being recognized properly by the kernel?

Or if you do not have access to run dmesg, does output from lspci|grep -e VGA -e Display and ls -l /dev/dri/ correspond?
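
One way to compare the two outputs without root access is to count display controllers reported by lspci against the device nodes under /dev/dri. This is only a rough sketch: the sample lspci lines are made up for illustration, and "fewer render nodes than display controllers" is a hint that a device was not bound by the kernel driver, not proof.

```python
import re

def count_display_controllers(lspci_output):
    """Count lspci lines that mention a VGA or Display controller."""
    return sum(
        1 for line in lspci_output.splitlines()
        if "VGA" in line or "Display" in line
    )

def count_dri_nodes(dri_entries):
    """Return (card node count, render node count) for /dev/dri entry names."""
    cards = [e for e in dri_entries if re.fullmatch(r"card\d+", e)]
    renders = [e for e in dri_entries if re.fullmatch(r"renderD\d+", e)]
    return len(cards), len(renders)

def possibly_unbound(lspci_output, dri_entries):
    """True when there are fewer render nodes than display controllers."""
    _, renders = count_dri_nodes(dri_entries)
    return renders < count_display_controllers(lspci_output)

# Made-up sample data; on a real system, feed in the text printed by lspci
# and the names returned by os.listdir("/dev/dri").
sample_lspci = (
    "03:00.0 VGA compatible controller: Example Vendor Device 0000\n"
    "3a:00.0 Display controller: Example Vendor Device 0001\n"
    "3b:00.0 Ethernet controller: Example Vendor Device 0002\n"
)
sample_dri = ["by-path", "card0", "card1", "renderD128"]
print(possibly_unbound(sample_lspci, sample_dri))
```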

User-space driver

You could add the -LH option to CMake so that it outputs all its config options [1]. The build-time test output already shows whether PVC support was built in and whether those tests pass [2].

I just tested following setup with Ubuntu LLVM 14 packages:

(On top of SLES SP15 "5.14.21-150400.24.92-default" kernel, but I'm not sure where its i915 KMD came from, as I did not install it.)

And at least clinfo recognizes the 1T PVC device I have access to:

Number of devices                                 1
  Device Name                                     Intel(R) Data Center GPU Max 1100
...
  Driver Version                                  23.48.027912

(cl-mem also gave reasonable-looking output with that driver.)


[1] CMake configure output:

-- Auto-Enabling XE_HPC_CORE support for PVC
-- All supported platforms:  PVC MTL DG2 ARL TGLLP DG1 RKL ADLS ADLP ADLN SKL KBL GLK CFL BXT
-- All tested platforms:  PVC MTL DG2 ARL TGLLP DG1 RKL ADLS ADLP ADLN SKL KBL GLK CFL BXT
-- Default supported platform: PVC
-- Default tested platform: PVC
-- All supported core families: GEN9;GEN12LP;XE_HPG_CORE;XE_HPC_CORE
-- All tested core families: GEN9;GEN12LP;XE_HPG_CORE;XE_HPC_CORE
-- Default tested family name: XeHpcCoreFamily
-- Platforms to have WDDM_LINUX disabled: PVC
...
-- i915 prelim headers detection: TRUE
...
// Support PVC
SUPPORT_PVC:BOOL=1
...
// Build ULTs for PVC
TESTS_PVC:BOOL=1
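
The NAME:TYPE=VALUE lines in a listing like the one above can be checked with a few lines of Python. This is a sketch, not part of the compute-runtime build: the helper names are hypothetical, and the set of truthy spellings covers only the common ones.

```python
import re

# Cache entries are printed as NAME:TYPE=VALUE; status lines ("-- ...") and
# help lines ("// ...") do not match and are skipped.
CACHE_LINE = re.compile(r"^(\w+):(\w+)=(.*)$")

def parse_cmake_cache_listing(text):
    """Map option name to its raw value for every NAME:TYPE=VALUE line."""
    options = {}
    for line in text.splitlines():
        match = CACHE_LINE.match(line.strip())
        if match:
            options[match.group(1)] = match.group(3)
    return options

def is_enabled(options, name):
    """True when the option exists and uses a common truthy spelling."""
    return options.get(name, "").upper() in {"1", "ON", "TRUE", "YES", "Y"}

# Lines copied from the configure output quoted in this comment.
listing = """\
-- i915 prelim headers detection: TRUE
// Support PVC
SUPPORT_PVC:BOOL=1
// Build ULTs for PVC
TESTS_PVC:BOOL=1
"""
opts = parse_cmake_cache_listing(listing)
print(is_enabled(opts, "SUPPORT_PVC"), is_enabled(opts, "TESTS_PVC"))
```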

[2] Build tests output:

[6189/6217] Running utility command for run_pvc_47_shared_tests
...
========================
==  PVC ULTs PASSED   ==
========================
Tests run:      8963
Tests passed:   8860
Tests skipped:  103
Tests failed:   0
Tests disabled: 12
...
<lots of other tests, also for PVC>
...
Running ze_intel_gpu_sysman_tests 2x4x5
....
========================
==  PVC ULTs PASSED   ==
========================
Tests run:      801
Tests passed:   801
Tests skipped:  0
Tests failed:   0
Tests disabled: 0
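
A long build log contains many such summary blocks, so checking them by script can be easier than reading them. A minimal sketch, assuming each block uses the "Tests <label>: <count>" layout shown above; the rule that run equals passed plus skipped holds for both summaries quoted here, but it is an assumption rather than a documented guarantee of the test harness.

```python
import re

def parse_ult_summary(block):
    """Extract the 'Tests <label>: <count>' counters from one summary block."""
    pairs = re.findall(r"^Tests (\w+):[ \t]+(\d+)[ \t]*$", block, flags=re.MULTILINE)
    return {label: int(value) for label, value in pairs}

def summary_is_clean(counts):
    """True when nothing failed and run == passed + skipped."""
    return (
        counts["failed"] == 0
        and counts["run"] == counts["passed"] + counts["skipped"]
    )

# First summary block quoted above.
summary = """\
Tests run:      8963
Tests passed:   8860
Tests skipped:  103
Tests failed:   0
Tests disabled: 12
"""
counts = parse_ult_summary(summary)
print(counts, summary_is_clean(counts))
```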

JablonskiMateusz commented 8 months ago

@pvelesko

but devices fail to initialize.

Could you run strace -o strace.log clinfo and share strace.log?
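
Before sharing the log, it can be skimmed for failed system calls that touch /dev/dri. The sketch below assumes the default strace line layout without a PID prefix (no -f); the two sample lines are made up for illustration. It only catches calls that carry the path as an argument, such as openat. A failing ioctl would show a file descriptor number instead and is not matched.

```python
import re

# Matches a failed call that names a /dev/dri path, for example:
#   openat(AT_FDCWD, "/dev/dri/renderD128", O_RDWR|O_CLOEXEC) = -1 EACCES (Permission denied)
FAILED_CALL = re.compile(r'^(\w+)\(.*"(/dev/dri/[^"]*)".*\)\s+= -1 (\w+)')

def failed_dri_calls(log_text):
    """Return (syscall, path, errno name) for each failed call on /dev/dri."""
    results = []
    for line in log_text.splitlines():
        match = FAILED_CALL.match(line)
        if match:
            results.append((match.group(1), match.group(2), match.group(3)))
    return results

# Made-up sample; on a real system, read the text from strace.log instead.
sample = (
    'openat(AT_FDCWD, "/dev/dri/renderD128", O_RDWR|O_CLOEXEC) = -1 EACCES (Permission denied)\n'
    'openat(AT_FDCWD, "/dev/dri/card0", O_RDWR|O_CLOEXEC) = 3\n'
)
for syscall, path, errno_name in failed_dri_calls(sample):
    print(syscall, path, errno_name)
```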