intel / libva

Libva is an implementation for VA-API (Video Acceleration API)
http://intel.github.io/libva/

libva (v2.12) does not find /dev/dri/renderD129 (stops scanning for GPUs if "renderD128" is not present) #540

Closed eero-t closed 3 years ago

eero-t commented 3 years ago
$ ls -l /dev/dri/
crw-rw---- 1 root video 226,   1 Sep  6 17:32 card1
crw-rw---- 1 root   109 226, 129 Sep  6 17:32 renderD129

$ strace -f -e openat vainfo
...
openat(AT_FDCWD, "/dev/dri/renderD128", O_RDWR) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/dev/dri/card0", O_RDWR) = -1 ENOENT (No such file or directory)
error: failed to initialize display
+++ exited with 1 +++

I.e. scanning for GPU devices stops if there's no renderD128.

The same problem also happens with FFmpeg, so it seems to be a generic VA-API issue. By contrast, "clinfo" does not have this problem, and neither do any of the other OpenCL programs I tried.

I looked quickly through the va-utils/libva/libdrm sources to find out where this logic actually lives, but could not find it. So I filed this against the libva library, as that seems the most logical place for it. If some other component is the culprit, please tell me which one.

In container environments (e.g. with the Kubernetes Intel GPU plugin [1]), on hosts with multiple GPUs, the GPU device mapped to the container may have any valid (renderDXX) file name, without any other DRI device files being present.

[1] The whole sysfs is mounted as-is into all containers, so the GPU plugin does not map the GPU device file name to renderD128, because it would then no longer match the correct sub-directory in sysfs (which could break applications in ways that are much harder to debug).
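To illustrate the kind of scan that would cope with this, here is a minimal sketch that lists whatever render nodes are actually present instead of assuming renderD128. It is not libva's code; the function name find_render_nodes and the default directory argument are my own assumptions for illustration.

```python
import glob
import os
import re


def find_render_nodes(dri_dir="/dev/dri"):
    """Return render node paths under dri_dir, sorted by minor number.

    Hypothetical helper for illustration only; not part of libva.
    """
    pattern = re.compile(r"^renderD(\d+)$")
    nodes = []
    for path in glob.glob(os.path.join(dri_dir, "renderD*")):
        match = pattern.match(os.path.basename(path))
        if match:
            # Keep the numeric suffix so sorting is numeric, not lexical.
            nodes.append((int(match.group(1)), path))
    return [path for _, path in sorted(nodes)]


if __name__ == "__main__":
    # Prints one path per line; prints nothing if no render node exists.
    for node in find_render_nodes():
        print(node)
```

With only card1 and renderD129 present, as in the listing above, the scan would return just the renderD129 path rather than giving up when renderD128 is missing.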

xhaihao commented 3 years ago

Does vainfo --display drm --device /dev/dri/renderD129 work for you?

eero-t commented 3 years ago

@xhaihao Of course, but it cannot be hard-coded like that, because there's no way to know beforehand which of the GPUs on a specific cluster node is assigned to a given container (e.g. when a hundred instances of a given container are launched across multiple nodes with several GPUs on each).

Keeping other device files (whether they are GPUs or not) invisible to the container is how container security is managed.

eero-t commented 3 years ago

vaGetDisplayDRM: https://github.com/intel/libva/blob/master/va/drm/va_drm.h#L43-L53

Ok, so basically VA-API is "broken as designed":

As to the applications:

GStreamer-vaapi also has an environment variable for specifying the default device: https://gstreamer.freedesktop.org/documentation/vaapi/index.html

In a Kubernetes environment it would be possible to inject an environment variable telling which GPU got assigned to a container, so I'll close this as a duplicate of https://github.com/intel/libva/issues/221.
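A rough sketch of how a container entrypoint could use such an injected variable, falling back to whatever render node is visible. The variable name GPU_RENDER_NODE and both function names are hypothetical, chosen only for illustration; the vainfo options are the ones suggested earlier in this thread.

```python
import glob
import os


def pick_render_node(environ=None, dri_dir="/dev/dri"):
    """Return the render node to use, or None if none can be found.

    GPU_RENDER_NODE is a hypothetical variable name, assumed to be
    injected by the orchestrator; no existing tool is known to set it.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("GPU_RENDER_NODE")
    if override:
        return override
    # Fallback: first visible render node (lexical order, which matches
    # numeric order as long as all suffixes have the same digit count).
    nodes = sorted(glob.glob(os.path.join(dri_dir, "renderD*")))
    return nodes[0] if nodes else None


def vainfo_command(device):
    """Build the vainfo invocation for an explicit DRM device."""
    return ["vainfo", "--display", "drm", "--device", device]


if __name__ == "__main__":
    device = pick_render_node()
    if device is None:
        print("no render node found")
    else:
        print(" ".join(vainfo_command(device)))
```

The override is checked first so that the orchestrator's choice always wins, and the directory scan only serves as a fallback for the single-GPU container case.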