openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

[Bug] GPU plugin is segfaulting with llvm11 ! #2999

Closed: saininav closed this issue 3 years ago

saininav commented 3 years ago

$ classification_sample_async -d GPU -i car.png -m sqeezenet1.1.xml

Thread 1 "classification_" received signal SIGSEGV, Segmentation fault.
0x00007fffeb22d4ab in llvm::LoadInst::LoadInst(llvm::Type*, llvm::Value*, llvm::Twine const&, llvm::Instruction*) () from /usr/lib/libLLVM-11.so

gdb log is attached.

segmentation_fault_error_gdb_log.txt

jgespino commented 3 years ago

Hi @saininav

Could you please provide additional information about your setup?

Regards, Jesus

vladimir-paramuzov commented 3 years ago

@saininav This seems to be an OpenCL (OCL) runtime issue. Could you share the OCL runtime version and the output of the clinfo tool?

@JacekDanecki Could you have a look at this issue? Does NEO runtime support LLVM 11?
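For readers collecting the same details, here is a minimal sketch of how the fields relevant to this request could be pulled out of `clinfo` output. The function names and the choice of fields are illustrative assumptions, not part of OpenVINO or clinfo, and the parser assumes clinfo's usual column layout (key, two or more spaces, value).

```python
import re
import subprocess

# Fields worth quoting in a bug report; this selection is an assumption.
FIELDS = ("Device Name", "Device Version", "Driver Version")

# clinfo prints "key<two or more spaces>value" on each line.
_LINE = re.compile(r"^\s*(?P<key>.+?)\s{2,}(?P<value>\S.*)$")


def extract_clinfo_fields(text, fields=FIELDS):
    """Return {field: [values...]} for the requested keys in clinfo output."""
    found = {name: [] for name in fields}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match and match.group("key") in found:
            found[match.group("key")].append(match.group("value").strip())
    return found


def read_clinfo():
    """Run the clinfo tool (must be installed and on PATH) and return its text."""
    result = subprocess.run(["clinfo"], capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `extract_clinfo_fields(read_clinfo())` on a machine with clinfo installed would give one list of values per field, with one entry per reported device.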

saininav commented 3 years ago

clinfo output:

Number of platforms                               1
  Platform Name                                   Intel(R) OpenCL HD Graphics
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 2.1
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_fp64 cl_khr_subgroups cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory_preview cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_intel_media_block_io cl_khr_3d_image_writes
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             INTEL

  Platform Name                                   Intel(R) OpenCL HD Graphics
Number of devices                                 1
  Device Name                                     Intel(R) Gen9 HD Graphics NEO
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 2.1 NEO
  Driver Version                                  20.40.0
  Device OpenCL C Version                         OpenCL C 2.0
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               24
  Max clock frequency                             1150MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              32
  Max sub-groups per work group                   32
  Sub-group sizes (Intel)                         8, 16, 32
  Preferred / native vector sizes
    char                                          16 / 16
    short                                         8 / 8
    int                                           4 / 4
    long                                          1 / 1
    half                                          8 / 8 (cl_khr_fp16)
    float                                         1 / 1
    double                                        1 / 1 (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              6574309376 (6.123GiB)
  Error Correction support                        No
  Max memory allocation                           3287154688 (3.061GiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         64 bytes
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             3287154688 (3.061GiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        524288 (512KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            205447168 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   4 bytes
    Pitch alignment for 2D image buffers          4 pixels
    Max 2D image size                             16384x16384 pixels
    Max planar YUV image size                     16384x16352 pixels
    Max 3D image size                             16384x16384x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
    Max number of read/write image args           128
  Max number of pipe args                         16
  Max active pipe reservations                    1
  Max pipe packet size                            1024
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Max number of constant args                     8
  Max constant buffer size                        3287154688 (3.061GiB)
  Max size of kernel argument                     2048 (2KiB)
  Queue properties (on host)
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Queue properties (on device)
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                131072 (128KiB)
    Max size                                      67108864 (64MiB)
  Max queues on device                            1
  Max events on device                            1024
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      83ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Sub-group independent forward progress        Yes
    IL version                                    SPIR-V_1.2
    SPIR versions                                 1.2
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel;
  Motion Estimation accelerator version (Intel)   2
    Device-side AVC Motion Estimation version     1
      Supports texture sampler use                Yes
      Supports preemption                         No
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_fp64 cl_khr_subgroups cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory_preview cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_intel_media_block_io cl_khr_3d_image_writes

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Intel(R) OpenCL HD Graphics
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)  Success [INTEL]
  clCreateContext(NULL, ...) [default]  Success [INTEL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name  Intel(R) OpenCL HD Graphics
    Device Name  Intel(R) Gen9 HD Graphics NEO
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name  Intel(R) OpenCL HD Graphics
    Device Name  Intel(R) Gen9 HD Graphics NEO
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name  Intel(R) OpenCL HD Graphics
    Device Name  Intel(R) Gen9 HD Graphics NEO

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.12
  ICD loader Profile                              OpenCL 3.0
  NOTE: your OpenCL library declares to support OpenCL 3.0, but it seems to support up to OpenCL 2.2 only.

JacekDanecki commented 3 years ago

NEO doesn't depend on LLVM/Clang directly, but it does depend on IGC. LLVM/Clang are used by the IGC compiler, and currently LLVM/Clang 10 is fully supported (see comment).
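Since the crashing frame in the report sits in `/usr/lib/libLLVM-11.so`, it can help to confirm which LLVM major versions a process actually has loaded. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the library naming pattern (`libLLVM-<major>...so`) is taken from the path in the report, and the `/proc/<pid>/maps` reader only works on Linux and only sees LLVM when it is linked as a shared library.

```python
import re

# Matches names such as libLLVM-11.so, libLLVM-10.so.1 or libLLVM-11.0.0.so.
_LLVM_LIB = re.compile(r"libLLVM-(\d+)(?:\.\d+)*\.so")


def llvm_major_versions(listing):
    """Return the sorted LLVM major versions named in a shared-library listing.

    `listing` can be any text that contains library paths, for example the
    output of gdb's `info sharedlibrary` or the contents of /proc/<pid>/maps.
    """
    return sorted({int(m.group(1)) for m in _LLVM_LIB.finditer(listing)})


def loaded_llvm_versions(pid="self"):
    """Read /proc/<pid>/maps (Linux only) and report loaded libLLVM majors."""
    with open(f"/proc/{pid}/maps") as maps:
        return llvm_major_versions(maps.read())
```

A result containing 11, or more than one major version, would be consistent with the unsupported configuration described in this thread, though it does not by itself prove the cause of the crash.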

saininav commented 3 years ago

This runtime segmentation fault is not observed with llvm-10.

I am able to reproduce this runtime failure with llvm-11 on Ubuntu 18.04:

gdb_backtrace_ubuntu.txt
gdb_shared_libraries_ubuntu.txt

OS: Ubuntu 18.04

Installed with apt:
GCC: 9.3.0
LLVM: 11.0.0 //Ref https://apt.llvm.org/
opencv: 3.2.0

Build from Source:

OpenVino:
Commit-ID: f557dca475cb54dcfc9026fbaad0d93ddb85015c
tag: 2021.1
Build config:
$ cmake -DENABLE_GNA=0 -DENABLE_SAMPLES=1 -DTREAT_WARNING_AS_ERROR=FALSE -DENABLE_SPEECH_DEMO=FALSE -DENABLE_CLDNN=ON -DENABLE_VPU=ON -DENABLE_PROFILING_ITT=OFF -DVERBOSE_BUILD=ON -DENABLE_OPENCV=ON -DENABLE_PYTHON=ON ../

Intel-compute-runtime:
Commit-ID: 7e31ec37d78693c08a1fcb2ec31801e64cb497d3
tag: 20.40.18075

Intel-graphics-compiler:
Commit-ID: 3e7c8e95b48a4eb6637077c52ff253a37b5ea085
tag: igc-1.0.5176

Opencl-clang:
Commit-ID: dbddfc2e4e84dfeddf78c7946aa727acb5640059
Branch: ocl-open-110

SPIRV_LLVM_TRANSLATOR:
Commit-ID: d6dc999eee381158a26f048a333467c9ce7e77f2
Branch: llvm_release_110

jgespino commented 3 years ago

Hi @saininav

As mentioned in the referenced comment above, LLVM 11 is not fully supported. Please use LLVM 10.

Regards, Jesus