oneapi-src / level-zero

oneAPI Level Zero Specification Headers and Loader
https://spec.oneapi.com/versions/latest/elements/l0/source/index.html
MIT License

Driver not initialized and Integrated HD graphics not recognized as ZE_DEVICE_TYPE_GPU after v1.11 spec changes #209

Closed. stratika closed this issue 1 month ago.

stratika commented 1 month ago

Hello, I noticed an issue with the latest commit, 5b4317c: it prevents me from running the zello_world example on an Intel(R) UHD Graphics 630 GPU.

The steps that I have followed:

git clone https://github.com/oneapi-src/level-zero
cd level-zero
mkdir build
cd build
cmake ..
cmake --build . --config Release
./bin/zello_world

The output is:

Driver not initialized: ZE_RESULT_ERROR_UNINITIALIZED
Did NOT find matching ZE_DEVICE_TYPE_GPU device!

If I check out version v1.17.45 or the previous commit, b757f43, the example works on my GPU.

Is this expected? Do the new v1.11 specification changes mean the L0 loader no longer recognizes integrated GPUs as Level Zero devices, or is there anything I should do to restore this functionality?

nrspruit commented 1 month ago

Confirmed. The fix for this issue is https://github.com/oneapi-src/level-zero/pull/210: the new DDI table in Sysman was not being listed as "optional" for init.

stratika commented 1 month ago

Hi @nrspruit, thanks for the fast fix. We have just tried the commit with the fix and noticed that the behaviour differs across operating systems.

On Ubuntu 23.10, it produces the following:

./bin/zello_world 
Driver not initialized: ZE_RESULT_ERROR_UNSUPPORTED_VERSION
Did NOT find matching ZE_DEVICE_TYPE_GPU device!

On Fedora 40, it works as expected. In both cases we are targeting an integrated GPU.

I would appreciate it if you could provide some assistance.

nrspruit commented 1 month ago

ZE_RESULT_ERROR_UNSUPPORTED_VERSION occurs because you are trying to use a newer L0 loader (i.e. ze_loader.dll) without also updating ze_tracing_layer or ze_validation_layer. That is not a bug in the loader but a problem with your environment.

For security reasons, on Windows the L0 loader can only load the layers from system32. This means that if you don't also update the tracing layer and validation layer DLLs in system32, ZE_RESULT_ERROR_UNSUPPORTED_VERSION is the expected error.

On Windows, the L0 loader must be run from system32 only. That is because you need to consider the loader as being made up of more than one DLL: it is ze_loader.dll, ze_tracing_layer.dll and ze_validation_layer.dll together, not three separate DLLs that are unrelated.

If you are making a custom build, you have to update all three in system32; you cannot update only ze_loader.dll. This is why installers for L0 drivers all install the L0 loader to system32, overwriting the existing copy only if the new version is newer.

stratika commented 1 month ago

I am not using Windows. I am using Ubuntu Linux OS.

nrspruit commented 1 month ago

Apologies. The error is the same, though: if you don't update all three in the load library path, ZE_RESULT_ERROR_UNSUPPORTED_VERSION will be returned. You can confirm this by running with ZE_ENABLE_LOADER_DEBUG_TRACE=1: if you see ZE_LOADER_DEBUG_TRACE:Tracing Layer Library Path: ze_tracing_layer.dll followed by an error, then your tracing layer and validation layer are too old.
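To make that check repeatable, the trace can be filtered with a small script. This is a hypothetical helper, not part of Level Zero: it only looks for the log lines quoted in this thread (the "Tracing Layer Library Path" line followed either by a ZE_RESULT_ERROR_UNSUPPORTED_VERSION failure or by a "Load Library of <layer> failed" message), and it assumes the trace text is piped into it.

```sh
#!/bin/sh
# Hypothetical helper (not part of Level Zero): classify a loader debug trace
# read on stdin, based only on the log lines quoted in this thread. Prints:
#   stale-layer  the tracing layer was found, then init failed with UNSUPPORTED_VERSION
#   no-layer     the tracing layer could not be opened (the loader continued without it)
#   other        anything else, including a successful init
classify_trace() {
  awk '
    /Tracing Layer Library Path:/ { seen = 1; next }
    seen && /Load Library of [^ ]*tracing_layer[^ ]* failed/ { verdict = "no-layer"; exit }
    seen && /ZE_RESULT_ERROR_UNSUPPORTED_VERSION/ { verdict = "stale-layer"; exit }
    END { print (verdict != "" ? verdict : "other") }
  '
}
```

For example, ZE_ENABLE_LOADER_DEBUG_TRACE=1 ./bin/zello_world 2>&1 | classify_trace. Capturing both stdout and stderr avoids depending on which stream the trace is written to.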

ZE_RESULT_ERROR_UNSUPPORTED_VERSION is returned when you mix libraries and some of them are too old, i.e.:

ZE_RESULT_ERROR_UNSUPPORTED_VERSION will be returned if the tracing or validation layer is older than the ze_loader being used. This is because all three libraries must be used together; they cannot be updated separately.

Otherwise, ZE_RESULT_ERROR_UNSUPPORTED_VERSION is returned by the L0 driver if the major version does not match, which would not be the case here.
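The two cases above can be summarized as a version check. The sketch below is a simplified, hypothetical model for illustration only: it assumes versions of the form MAJOR.MINOR and treats a request as supported when the majors match and the responding library is at least as new as the one making the request. The real checks live inside the loader, the layers and the driver and may differ in detail.

```sh
#!/bin/sh
# Hypothetical model (not the loader's real code) of the rule described above.
# is_supported REQUESTED PROVIDED: both are MAJOR.MINOR versions.
# Returns 0 if the majors match and PROVIDED's minor is at least REQUESTED's;
# otherwise prints the error name and returns 1.
is_supported() {
  req_major=${1%%.*}; req_minor=${1#*.}
  got_major=${2%%.*}; got_minor=${2#*.}
  if [ "$req_major" -ne "$got_major" ] || [ "$got_minor" -lt "$req_minor" ]; then
    echo "ZE_RESULT_ERROR_UNSUPPORTED_VERSION"
    return 1
  fi
  return 0
}
```

With this model, is_supported 1.11 1.10 (a hypothetical newer loader paired with an older layer) prints ZE_RESULT_ERROR_UNSUPPORTED_VERSION, while is_supported 1.11 1.11 succeeds.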

So you are running into the case where the three libraries that make up the loader were not all updated in your load library path on Linux.
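On Linux, a quick sanity check is to confirm that all three libraries are present in the same directory before running. This hypothetical helper only tests that the files exist; it assumes the Linux names libze_loader.so.1, libze_tracing_layer.so.1 and libze_validation_layer.so.1, and it does not compare versions.

```sh
#!/bin/sh
# Hypothetical helper: report whether DIR contains all three libraries that
# make up the loader on Linux. Prints each missing file and returns 0 only
# if none are missing.
check_loader_set() {
  dir=$1
  status=0
  for lib in libze_loader.so.1 libze_tracing_layer.so.1 libze_validation_layer.so.1; do
    if [ ! -e "$dir/$lib" ]; then
      echo "missing: $dir/$lib"
      status=1
    fi
  done
  return "$status"
}
```

Assuming the build places its shared libraries in build/lib (check your own build tree), one could then run from the build directory: check_loader_set "$PWD/lib" && LD_LIBRARY_PATH="$PWD/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ./bin/zello_world. Prepending the build directory is intended to make the dynamic linker find the freshly built layers before older copies installed system-wide; installing all three libraries together is the other option.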

stratika commented 1 month ago

Thank you for your reply. I have many questions, since I am not very familiar with the internal layers of Level Zero. If I understood correctly, I need to reinstall three layers (the tracing layer, the validation layer and the Level Zero loader)?

Is there any documentation on how to do that if I want to build level-zero from source?

At the moment, I am following the instructions in the README file. I also set the ZE_ENABLE_LOADER_DEBUG_TRACE flag and got the following:

./bin/zello_world 
ZE_LOADER_DEBUG_TRACE:Using Loader Library Path: 
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_gpu.so.1
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_gpu_legacy1.so.1
ZE_LOADER_DEBUG_TRACE:Load Library of libze_intel_gpu_legacy1.so.1 failed with libze_intel_gpu_legacy1.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_vpu.so.1
ZE_LOADER_DEBUG_TRACE:Load Library of libze_intel_vpu.so.1 failed with libze_intel_vpu.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_npu.so.1
ZE_LOADER_DEBUG_TRACE:Load Library of libze_intel_npu.so.1 failed with libze_intel_npu.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:Tracing Layer Library Path: libze_tracing_layer.so.1
Driver not initialized: ZE_RESULT_ERROR_UNSUPPORTED_VERSION
Did NOT find matching ZE_DEVICE_TYPE_GPU device!

stratika commented 1 month ago

To complement my previous message: on another machine running Pop!_OS 22.04, the zello_world example works. The complete output with the DEBUG_TRACE flag enabled is as follows:

./bin/zello_world 
ZE_LOADER_DEBUG_TRACE:Using Loader Library Path: 
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_gpu.so.1
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_gpu_legacy1.so.1
ZE_LOADER_DEBUG_TRACE:Load Library of libze_intel_gpu_legacy1.so.1 failed with libze_intel_gpu_legacy1.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_vpu.so.1
ZE_LOADER_DEBUG_TRACE:Load Library of libze_intel_vpu.so.1 failed with libze_intel_vpu.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_npu.so.1
ZE_LOADER_DEBUG_TRACE:Load Library of libze_intel_npu.so.1 failed with libze_intel_npu.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:Tracing Layer Library Path: libze_tracing_layer.so.1
ZE_LOADER_DEBUG_TRACE:Load Library of libze_tracing_layer.so.1 failed with libze_tracing_layer.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:check_drivers(flags=0(ZE_INIT_ALL_DRIVER_TYPES_ENABLED))
ZE_LOADER_DEBUG_TRACE:init driver libze_intel_gpu.so.1 zeInit(0(ZE_INIT_ALL_DRIVER_TYPES_ENABLED)) returning ZE_RESULT_SUCCESS
Driver initialized.
zelLoaderGetVersions number of components found: 1
Version 0
Name: loader
Major: 1
Minor: 18
Patch: 1
Found ZE_DEVICE_TYPE_GPU device...
Driver version: 17002962
API version: 1.3
Device::properties_t::stype : DEVICE_PROPERTIES
Device::properties_t::pNext : 0x0
Device::properties_t::type : ZE_DEVICE_TYPE_GPU
Device::properties_t::vendorId : 32902
Device::properties_t::deviceId : 16018
Device::properties_t::flags : Device::{ PROPERTY_FLAG_INTEGRATED }
Device::properties_t::subdeviceId : 0
Device::properties_t::coreClockRate : 1200
Device::properties_t::maxMemAllocSize : 4294959104
Device::properties_t::maxHardwareContexts : 65536
Device::properties_t::maxCommandQueuePriority : 0
Device::properties_t::numThreadsPerEU : 7
Device::properties_t::physicalEUSimdWidth : 8
Device::properties_t::numEUsPerSubslice : 8
Device::properties_t::numSubslicesPerSlice : 3
Device::properties_t::numSlices : 1
Device::properties_t::timerResolution : 83
Device::properties_t::timestampValidBits : 36
Device::properties_t::kernelTimestampValidBits : 32
Device::properties_t::uuid : device_uuid_t::id : [ 134, 128, 146, 62, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0 ]

Device::properties_t::name : Intel(R) UHD Graphics 630

stype : DEVICE_COMPUTE_PROPERTIES
pNext : 0x0
maxTotalGroupSize : 256
maxGroupSizeX : 256
maxGroupSizeY : 256
maxGroupSizeZ : 256
maxGroupCountX : 4294967295
maxGroupCountY : 4294967295
maxGroupCountZ : 4294967295
maxSharedLocalMemory : 65536
numSubGroupSizes : 3
subGroupSizes : [ 8, 16, 32, 0, 0, 0, 0, 0 ]

stype : DEVICE_MEMORY_PROPERTIES
pNext : 0x0
flags : Device::{ 0 }
maxClockRate : 0
maxBusWidth : 64
totalSize : 62761783296
name : DDR

ze_device_memory_access_properties_t.stype : DEVICE_MEMORY_ACCESS_PROPERTIES
ze_device_memory_access_properties_t.pNext : 0x0
ze_device_memory_access_properties_t.hostAllocCapabilities : Device::{ MEMORY_ACCESS_CAP_FLAG_RW | MEMORY_ACCESS_CAP_FLAG_ATOMIC }
ze_device_memory_access_properties_t.deviceAllocCapabilities : Device::{ MEMORY_ACCESS_CAP_FLAG_RW | MEMORY_ACCESS_CAP_FLAG_ATOMIC }
ze_device_memory_access_properties_t.sharedSingleDeviceAllocCapabilities : Device::{ MEMORY_ACCESS_CAP_FLAG_RW | MEMORY_ACCESS_CAP_FLAG_ATOMIC }
ze_device_memory_access_properties_t.sharedCrossDeviceAllocCapabilities : Device::{ 0 }
ze_device_memory_access_properties_t.sharedSystemAllocCapabilities : Device::{ 0 }

ze_device_cache_properties_t.stype : DEVICE_CACHE_PROPERTIES
ze_device_cache_properties_t.pNext : 0x0
ze_device_cache_properties_t.flags : Device::{ 0 }
ze_device_cache_properties_t.cacheSize : 786432

ze_device_image_properties_t.stype : DEVICE_IMAGE_PROPERTIES
ze_device_image_properties_t.pNext : 0x0
ze_device_image_properties_t.maxImageDims1D : 16384
ze_device_image_properties_t.maxImageDims2D : 16384
ze_device_image_properties_t.maxImageDims3D : 2048
ze_device_image_properties_t.maxImageBufferSize : 268434944
ze_device_image_properties_t.maxImageArraySlices : 2048
ze_device_image_properties_t.maxSamplers : 16
ze_device_image_properties_t.maxReadImageArgs : 128
ze_device_image_properties_t.maxWriteImageArgs : 128

Congratulations, the device completed execution!

nrspruit commented 1 month ago

So, from your log, the tracing layer is reporting ZE_RESULT_ERROR_UNSUPPORTED_VERSION because it was not updated by the build along with the loader. That is the reason for the error: ZE_RESULT_ERROR_UNSUPPORTED_VERSION is only returned if the version of the tracing layer is older than that of the loader being run.

This is because the DDI tables (the function pointer tables) must match for tracing and validation to work properly through the layers.
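One way to look for an older tracing or validation layer that the loader might pick up instead of the freshly built one is to inspect the dynamic linker cache. The helper below is hypothetical; it parses the output of ldconfig -p, which only covers the linker cache and not directories listed in LD_LIBRARY_PATH.

```sh
#!/bin/sh
# Hypothetical helper: read `ldconfig -p` output on stdin and print the
# resolved path of every cached tracing or validation layer.
find_cached_layers() {
  # Cache entries look like "libname (arch,flags) => /path/libname";
  # the last field is the path.
  awk '/libze_(tracing|validation)_layer\.so\.1 / { print $NF }'
}
```

For example, ldconfig -p | find_cached_layers. A path printed from outside your build tree is a candidate for the stale layer seen in the trace.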

stratika commented 1 month ago

Thanks. How can I make the build update the tracing layer as well? Are there any instructions?