Recently (I can't pinpoint exactly when in the past month) my "Intel(R) Arc(TM) A750 Graphics" card stopped showing up in the output of either `clinfo` or `sycl-ls` on my Arch Linux install. With the default intel-compute-runtime package, I get the following output:

```
$ clinfo
Number of platforms 0
```

I installed the latest available intel-oneapi-basekit, and setting it up adds more platforms, but the GPU is still missing.
When running `strace clinfo`, I can see the card being evaluated (strace output), but it does not show up in the output.

Kernel I'm running:
I ended up downloading and compiling this runtime with debug info and tracing through the OpenCL loader to see where the card is being "dropped". It turns out that it is dropped here: https://github.com/intel/compute-runtime/blob/e44ac2a0017434b2af6fdf5601d98975640e781e/shared/source/os_interface/linux/drm_memory_manager.cpp#L102, because the value of `gpuAddressSpace` is 281474976645119, which is 111111111111111111111111111111101111111111111111 in binary (notice the 0?), and does not match any of the branches in https://github.com/intel/compute-runtime/blob/master/shared/source/memory_manager/gfx_partition.cpp

When I hardcode the value 0xffffffffffff (all ones) as the address space and register the modified `libigdrcl.so` as a new vendor in /etc/OpenCL/vendors, the card shows up in the `clinfo` and `sycl-ls` output and I can successfully run OpenCL programs such as memtestCL.

Why would `gpuAddressSpace` end up with an "incorrect" value? Could it indicate a hardware failure? I don't see any other issues with the card, e.g. when running memtest or playing games.
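To make the "notice the 0?" observation concrete, here is a small Python sketch that decodes the reported value. The helper names are made up for illustration and are not part of compute-runtime. It shows that 281474976645119 is 0xfffffffeffff, i.e. 2^48 - 1 with only bit 16 cleared, so it is not of the all-ones form 2^n - 1. I'm assuming the branches in gfx_partition.cpp only accept values of that form; I haven't checked each one, but the fact that hardcoding 0xffffffffffff works is consistent with that.

```python
def describe_address_space(value: int) -> dict:
    """Summarize a GPU address-space value: hex form, bit width, and cleared bits."""
    width = value.bit_length()
    # Bits below the most significant set bit that are 0.
    cleared = [bit for bit in range(width) if not (value >> bit) & 1]
    return {
        "hex": hex(value),
        "width": width,
        "cleared_bits": cleared,
        # (v & (v + 1)) == 0 holds exactly when v has the form 2**n - 1.
        "is_all_ones": (value & (value + 1)) == 0,
    }


def round_up_to_all_ones(value: int) -> int:
    """Smallest 2**n - 1 that is >= value (what my hardcoded 0xffffffffffff amounts to)."""
    return (1 << value.bit_length()) - 1


reported = 281474976645119  # gpuAddressSpace as observed in the debugger
print(describe_address_space(reported))
# {'hex': '0xfffffffeffff', 'width': 48, 'cleared_bits': [16], 'is_all_ones': False}
print(hex(round_up_to_all_ones(reported)))  # 0xffffffffffff
print(hex(round_up_to_all_ones(reported) - reported))  # 0x10000, i.e. the missing bit 16
```

In other words, the value is off from the expected 48-bit all-ones value by exactly 0x10000 (64 KiB), a single cleared bit rather than a generally wrong number.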