oneapi-src / level-zero

oneAPI Level Zero Specification Headers and Loader
https://spec.oneapi.com/versions/latest/elements/l0/source/index.html
MIT License

Getting ZE_RESULT_ERROR_DEPENDENCY_UNAVAILABLE when running an application inside a Docker container #96

Closed: saratpoluri closed this issue 1 year ago

saratpoluri commented 2 years ago

Docker container image: Ubuntu 20.04.4

Host: Ubuntu 20.04.4, Kernel 5.10.65

Steps to reproduce:
1. Start an Ubuntu 20.04.4 container with docker run -it --rm --device /dev/dri
2. Follow the instructions at https://dgpu-docs.intel.com/installation-guides/ubuntu/ubuntu-focal.html to install the latest compute-runtime packages.
3. Clone this level-zero repo and build the zello_world sample.

When zello_world is run, it fails with the following error:
terminate called after throwing an instance of 'std::runtime_error'
  what(): Unknown ze_result_t value: 1879179264
Aborted (core dumped)

1879179264 is 0x70020000, i.e. ZE_RESULT_ERROR_DEPENDENCY_UNAVAILABLE.
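The exception text suggests the sample's result-to-string helper did not recognize this code, so it printed the raw decimal value. A minimal sketch of decoding such a value by hand; the lookup table is hypothetical and holds only ZE_RESULT_SUCCESS (0) and the 0x70020000 value identified in this report, not the full ze_result_t enum from ze_api.h.

```python
# Hypothetical lookup table: only the two values relevant to this thread.
ZE_RESULTS = {
    0x0: "ZE_RESULT_SUCCESS",
    0x70020000: "ZE_RESULT_ERROR_DEPENDENCY_UNAVAILABLE",
}


def describe_ze_result(value: int) -> str:
    """Render a raw ze_result_t as 32-bit hex plus its enum name, if known."""
    code = value & 0xFFFFFFFF  # treat the value as a 32-bit C enum
    name = ZE_RESULTS.get(code, "unknown ze_result_t")
    return f"0x{code:08X} ({name})"


print(describe_ze_result(1879179264))  # 0x70020000 (ZE_RESULT_ERROR_DEPENDENCY_UNAVAILABLE)
```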

Stepping through with gdb, I found that zeInit fails after calling ze_lib::context->Init and returns the above error code.
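To observe the raw result of zeInit without going through the sample's exception handling, the loader can be called directly. A minimal sketch using Python's ctypes, assuming the loader is installed under the soname libze_loader.so.1 (common on Linux, but check your install); the helper names raw_ze_init and format_result are hypothetical. Passing flags=0 requests all driver types.

```python
import ctypes


def raw_ze_init(lib_name="libze_loader.so.1", flags=0):
    """Call zeInit via ctypes; return the raw ze_result_t, or None if the loader cannot be loaded."""
    try:
        lib = ctypes.CDLL(lib_name)  # assumed loader soname
    except OSError:
        return None
    lib.zeInit.argtypes = [ctypes.c_uint32]  # ze_init_flags_t is a uint32_t bitfield
    lib.zeInit.restype = ctypes.c_int        # ze_result_t is a C enum
    return lib.zeInit(flags)


def format_result(value):
    """Show a result in decimal and as 32-bit hex."""
    return f"{value} (0x{value & 0xFFFFFFFF:08X})"


if __name__ == "__main__":
    result = raw_ze_init()
    if result is None:
        print("Level Zero loader could not be loaded")
    else:
        print("zeInit returned", format_result(result))
```

Running it inside and outside the container shows whether the same non-zero code comes back in both environments.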

Please help me address this issue. The only discrepancy I can find in my container is that the group of the /dev/dri/renderD128 file shows up as 'ssh' instead of 'render' as on the host OS.

sycl-ls output:
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2022.13.3.0.16_160000]
[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz 3.0 [2022.13.3.0.16_160000]
[opencl:gpu:2] Intel(R) OpenCL HD Graphics, Intel(R) Iris(R) Xe Graphics [0x9a49] 3.0 [22.23.23405]
[host:host:0] SYCL host platform, SYCL host device 1.2 [1.2]

clinfo:
Platform Name Intel(R) OpenCL HD Graphics
Number of devices 1
Device Name Intel(R) Iris(R) Xe Graphics [0x9a49]
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 3.0 NEO
Driver Version 22.23.23405
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 80
Max clock frequency 1300MHz
Device Partition (core)
  Max number of sub-devices 0
  Supported partition types None
  Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 512x512x512
Max work group size 512
Preferred work group size multiple <getWGsizes:1175: build program : error -6>
Max sub-groups per work group 64
Sub-group sizes (Intel) 8, 16, 32
Preferred / native vector sizes
  char 16 / 16
  short 8 / 8
  int 4 / 4
  long 1 / 1
  half 8 / 8 (cl_khr_fp16)
  float 1 / 1
  double 1 / 1 (n/a)
Half-precision Floating-point support (cl_khr_fp16)
  Denormals Yes
  Infinity and NANs Yes
  Round to nearest Yes
  Round to zero Yes
  Round to infinity Yes
  IEEE754-2008 fused multiply-add Yes
  Support is emulated in software No
Single-precision Floating-point support (core)
  Denormals Yes
  Infinity and NANs Yes
  Round to nearest Yes
  Round to zero Yes
  Round to infinity Yes
  IEEE754-2008 fused multiply-add Yes
  Support is emulated in software No
  Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (n/a)
Address bits 64, Little-Endian
Global memory size 13112213504 (12.21GiB)
Error Correction support No
Max memory allocation 4294959104 (4GiB)
Unified memory for Host and Device Yes
Shared Virtual Memory (SVM) capabilities (core)
  Coarse-grained buffer sharing Yes
  Fine-grained buffer sharing No
  Fine-grained system sharing No
  Atomics No
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Preferred alignment for atomics
  SVM 64 bytes
  Global 64 bytes
  Local 64 bytes
Max size for global variable 65536 (64KiB)
Preferred total size of global vars 4294959104 (4GiB)
Global Memory cache type Read/Write
Global Memory cache size 3932160 (3.75MiB)
Global Memory cache line size 64 bytes
Image support Yes
  Max number of samplers per kernel 16
  Max size for 1D images from buffer 268434944 pixels
  Max 1D or 2D image array size 2048 images
  Base address alignment for 2D image buffers 4 bytes
  Pitch alignment for 2D image buffers 4 pixels
  Max 2D image size 16384x16384 pixels
  Max planar YUV image size 16384x16352 pixels
  Max 3D image size 2048x2048x2048 pixels
  Max number of read image args 128
  Max number of write image args 128
  Max number of read/write image args 128
Max number of pipe args 0
Max active pipe reservations 0
Max pipe packet size 0
Local memory type Local
Local memory size 65536 (64KiB)
Max number of constant args 8
Max constant buffer size 4294959104 (4GiB)
Max size of kernel argument 2048 (2KiB)
Queue properties (on host)
  Out-of-order execution Yes
  Profiling Yes
Queue properties (on device)
  Out-of-order execution No
  Profiling No
  Preferred size 0
  Max size 0
Max queues on device 0
Max events on device 0
Prefer user sync for interop Yes
Profiling timer resolution 52ns
Execution capabilities
  Run OpenCL kernels Yes
  Run native kernels No
  Sub-group independent forward progress No
IL version SPIR-V_1.2
SPIR versions 1.2
printf() buffer size 4194304 (4MiB)
Built-in kernels (n/a)
Device Extensions cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info cl_intel_subgroup_local_block_io

bmyates commented 2 years ago

Hi, ZE_RESULT_ERROR_DEPENDENCY_UNAVAILABLE is not returned by the Level Zero loader itself; it comes from compute-runtime. Can you please move your question to the compute-runtime repo?

eero-t commented 1 year ago

> Please help me address this issue. Only discrepancy I find in my container is the gid of the /dev/dri/renderD128 file is 'ssh' instead of 'render' in the host OS.

By default, the container runtime carries the device's ownership and access rights over from the host, but your container's /etc/group file does not match the group IDs on the host. In general, you should check the numeric IDs, not the "user-friendly" names that tools derive from the (potentially mismatching) /etc/ files inside the container.
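A minimal sketch of that check: print the numeric UID/GID of the render node next to the names the local user and group databases (normally /etc/passwd and /etc/group) map them to. Run on the host and inside the container, the numbers should match; only the names (such as 'render' vs 'ssh' in this report) may differ. The device path is the one from the report; the helper name describe_owner is hypothetical.

```python
import grp
import os
import pwd


def describe_owner(path):
    """Return a file's numeric owner IDs and the names this system maps them to."""
    st = os.stat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        user = None  # UID has no entry in this system's user database
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = None  # GID has no entry in this system's group database
    return {"uid": st.st_uid, "user": user, "gid": st.st_gid, "group": group}


if __name__ == "__main__":
    print(describe_owner("/dev/dri/renderD128"))
```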

To access (write to / communicate with) the GPU device, the container process must run as a user that actually has that access: either root, a matching (numeric) user ID, or a matching (numeric) group ID.
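The three options above follow the classic Unix permission rules, sketched below under simplifying assumptions: ACLs are ignored, root is assumed to keep CAP_DAC_OVERRIDE (part of Docker's default capability set), and device cgroup rules (which docker run --device sets up) are assumed to allow the node. The helper names can_open_rw and check_device are hypothetical.

```python
import os
import stat


def can_open_rw(mode, owner_uid, owner_gid, uid, groups):
    """Simplified Unix DAC check: may a process with `uid`/`groups` open the file read+write?"""
    if uid == 0:
        # Root bypasses permission bits, assuming CAP_DAC_OVERRIDE is kept.
        return True
    if uid == owner_uid:
        # A matching UID selects the owner bits only, even if group bits are wider.
        need = stat.S_IRUSR | stat.S_IWUSR
    elif owner_gid in groups:
        # Otherwise a matching numeric GID selects the group bits.
        need = stat.S_IRGRP | stat.S_IWGRP
    else:
        need = stat.S_IROTH | stat.S_IWOTH
    return (mode & need) == need


def check_device(path="/dev/dri/renderD128"):
    st = os.stat(path)
    # Effective GID plus supplementary GIDs, compared numerically, never by name.
    groups = set(os.getgroups()) | {os.getegid()}
    return can_open_rw(st.st_mode, st.st_uid, st.st_gid, os.geteuid(), groups)


if __name__ == "__main__":
    dev = "/dev/dri/renderD128"
    print(dev, "read/write access:", check_device(dev))
```

Run it as the user the container process actually uses. If it reports False, one way to get a matching numeric group ID is docker run's --group-add option, which accepts a numeric GID.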