Open ProjectPhysX opened 8 months ago
Hi @ProjectPhysX
Could you run the command strace -o strace.log clinfo
and share the resulting strace.log file?
Hi @JablonskiMateusz,
here is strace-before-local-login.log, and visible devices are:
| Device ID 0 | NVIDIA TITAN Xp |
| Device ID 1 | 13th Gen Intel(R) Core(TM) i7-13700K |
| Device ID 2 | Intel(R) FPGA Emulation Device |
After logging in locally on the PC, here is strace-after-local-login.log, and visible devices are:
| Device ID 0 | Intel(R) Arc(TM) A770 Graphics |
| Device ID 1 | Intel(R) UHD Graphics 770 |
| Device ID 2 | NVIDIA TITAN Xp |
| Device ID 3 | 13th Gen Intel(R) Core(TM) i7-13700K |
| Device ID 4 | Intel(R) FPGA Emulation Device |
Kind regards, Moritz
@ProjectPhysX the first log shows that you don't have permission to open the GPU device file:
openat(AT_FDCWD, "/dev/dri/by-path/pci-0000:00:02.0-render", O_RDWR|O_CLOEXEC) = -1 EACCES (Permission denied)
Please ensure that the user you are using is a member of the render group.
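A quick way to verify this from the SSH session is sketched below. It is only an illustration: the function name check_render_access and its optional directory argument are hypothetical (the argument exists only so the check can be pointed at a test directory), and it assumes GNU stat, as used elsewhere in this thread.

```shell
#!/bin/sh
# For each render node, print its owning group and whether the current
# user can open it for reading and writing.
# The optional argument (hypothetical, for testing) overrides /dev/dri.
check_render_access() {
    dri_dir="${1:-/dev/dri}"
    for node in "$dri_dir"/renderD*; do
        [ -e "$node" ] || continue   # glob did not match: no render nodes
        if [ -r "$node" ] && [ -w "$node" ]; then
            status="accessible"
        else
            status="NO ACCESS"
        fi
        printf '%s: group=%s %s\n' \
            "$(basename "$node")" "$(stat --format %G "$node")" "$status"
    done
}

check_render_access "$@"
# Groups of the current user; compare with the group printed above.
id -nG
```

If a node is reported as NO ACCESS and its group is missing from the id -nG output, the user is not in the group that owns the device file.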
Hi @JablonskiMateusz,
thanks a lot for the help! An additional sudo usermod -a -G render $(whoami)
fixes the issue.
Please make the installation fix the file permissions or automatically add the user to the render
group, and/or include this line in the installation instructions.
Kind regards, Moritz
@JablonskiMateusz, out of curiosity why does logging in locally "fix" this issue?
@ProjectPhysX
In our readme we have the following line:
To allow NEO access to GPU device make sure user has permissions to files /dev/dri/renderD*.
btw.
> out of curiosity why does logging in locally "fix" this issue?
@ProjectPhysX when you logged in locally, was it the same user as when you logged in over SSH?
@JablonskiMateusz yes, same user. The local login alone triggers the GPU to become visible as an OpenCL device. Why can't the installation set the user access rights? If you miss this detail, the devices simply don't show up, without any error message; that's not user-friendly.
> thanks a lot for the help! An additional
> sudo usermod -a -G render $(whoami)
> fixes the issue. Please make the installation fix the file permissions or automatically put the user in the render
> group,
It's (definitely) not the driver package's responsibility to do things like that.
> and/or include this line in the installation instructions.
Yes, that's a good idea. In which documents do you think this should be mentioned?
> @JablonskiMateusz yes, same user. The local login alone triggers the GPU to become visible as an OpenCL device.
As to what happens when you do a graphical login locally: your GUI session manager grants the authenticated user (temporary) access to the display device. Otherwise the user's GUI would not work that well (it would fall back to CPU rendering, or even fail).
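One way to observe this temporary grant is to look at the ACLs on the render nodes before and after the local login. The sketch below assumes the session manager grants access through a POSIX ACL entry for the logged-in user (as systemd-logind typically does on current distributions) and that getfacl is installed. The helper name named_acl_users is hypothetical.

```shell
#!/bin/sh
# Read `getfacl` output on stdin and print the users that have a named
# ACL entry (lines of the form "user:NAME:PERMS"). The owner entry
# "user::PERMS" has an empty name field and is skipped.
named_acl_users() {
    awk -F: '$1 == "user" && $2 != "" { print $2 }' | sort -u
}

# List users that currently hold ACL-based access to any render node.
if command -v getfacl >/dev/null 2>&1; then
    getfacl /dev/dri/renderD* 2>/dev/null | named_acl_users
fi
```

Under that assumption, the user name should be listed only while the local graphical session is active, which would explain why the GPU also becomes visible in the remote terminal at the same time.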
> Yes, that's a good idea. In which documents do you think this should be mentioned?
Here in the Readme and in the "Installation procedure" in release notes would be good. Thanks!
> An additional
> sudo usermod -a -G render $(whoami)
> fixes the issue.
Older distro versions (e.g. of Ubuntu) do not have a render
group, so it's better to use the Intel device's group ID directly.
In case the host also has non-Intel DRM devices (with different group IDs), the Intel GPU device file names can be obtained with the following:
grep -l 0x8086 /sys/class/drm/renderD*/device/vendor | cut -d/ -f 5
And the group ID of the first one with:
stat --format %g /dev/dri/$(grep -l 0x8086 /sys/class/drm/renderD*/device/vendor | cut -d/ -f 5 | head -1)
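The two commands above can be combined into a small script that resolves the group name and prints the matching usermod command instead of hard-coding render. This is a sketch only: the function name intel_render_nodes and its optional sysfs directory argument are hypothetical (the argument exists for testing), and it assumes GNU stat and getent are available.

```shell
#!/bin/sh
# Print the names of render nodes whose PCI vendor ID is Intel (0x8086).
# The optional argument (hypothetical, for testing) overrides /sys/class/drm.
intel_render_nodes() {
    drm_dir="${1:-/sys/class/drm}"
    for vendor_file in "$drm_dir"/renderD*/device/vendor; do
        [ -r "$vendor_file" ] || continue
        if grep -q 0x8086 "$vendor_file"; then
            # .../renderD128/device/vendor -> renderD128
            basename "$(dirname "$(dirname "$vendor_file")")"
        fi
    done
}

first_node=$(intel_render_nodes | head -n 1)
if [ -n "$first_node" ]; then
    # Numeric group ID of the device file, then its name (if one exists).
    gid=$(stat --format %g "/dev/dri/$first_node")
    group=$(getent group "$gid" | cut -d: -f1)
    echo "Intel render node $first_node is owned by group ${group:-$gid}"
    echo "To grant access: sudo usermod -a -G ${group:-$gid} $(whoami)"
fi
```

The script only prints the command rather than running it. Note that a new group membership takes effect at the next login, so an existing SSH session has to be reopened afterwards.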
> Yes, that's a good idea. In which documents do you think this should be mentioned?
> Here in the Readme and in the "Installation procedure" in release notes would be good. Thanks!
Thanks! @JablonskiMateusz ?
I am having a similar issue after upgrading from Rocky 9.2 to Rocky 9.4. I see my Arc 750 in "lspci" but not in clinfo and I cannot run codes on it. My username is part of the "render" group and I have the Redhat 9.3 driver installed along with OneAPI HPC toolkit 2024.2. Any ideas?
@sumseq I'm not familiar with Rocky, but maybe your kernel and user-space driver do not match anymore after the update? See https://github.com/intel/compute-runtime/issues/710.
> @sumseq I'm not familiar with Rocky, but maybe your kernel and user-space driver do not match anymore after the update? See #710.
Thanks for the reference! The environment variables they say to set in that post make it work! For reference:
export NEOReadDebugKeys=1
export OverrideGpuAddressSpace=48
On a fresh Ubuntu Server 23.04 installation (kernel 6.5), after installing NEO and rebooting, the GPU (Arc A770) does not show up as an OpenCL device when I access the machine remotely over SSH. Only after I log in locally at the PC does the GPU show up as an OpenCL device, immediately, both locally and in the remote terminal.