Closed by eero-t 3 years ago
Does `vainfo --display drm --device /dev/dri/renderD129` work for you?
@xhaihao Of course, but it cannot be hard-coded like that, because there's no way to know beforehand which of the GPUs on a given cluster node is assigned to a given container (when e.g. a hundred instances of a given container are launched across multiple nodes with several GPUs each).
Other device files (whether they are GPUs or not) being invisible to the container is how container security is managed.
vaGetDisplayDRM: https://github.com/intel/libva/blob/master/va/drm/va_drm.h#L43-L53
Ok, so basically VA-API is "broken as designed":
As to the applications:
GStreamer-vaapi also has an environment variable for specifying the default device: https://gstreamer.freedesktop.org/documentation/vaapi/index.html
In a Kubernetes environment it would be possible to inject an environment variable telling which GPU got assigned to a container, so I'll close this as a duplicate of https://github.com/intel/libva/issues/221.
I.e. scanning for GPU devices stops if there's no renderD128.
The same problem happens with FFmpeg too, i.e. it seems to be a generic VA-API issue (e.g. "clinfo" does not have this problem, and neither do any of the other OpenCL programs I tried).
I looked quickly into the libva-utils/libva/libdrm sources to find out where this logic actually lives, but failed. So I filed this against the libva library, as that seems the most logical place for it. If some other component is the culprit, please tell me which one.
In container environments (e.g. with the Kubernetes Intel GPU plugin [1]), on hosts with multiple GPUs, the GPU device mapped to a container may have any valid render node (renderDXX) file name, without any other DRI device files being present.
[1] The whole sysfs is mounted as-is into all containers, so the GPU plugin does not remap the GPU device file name to renderD128, as it would then not match the correct sub-directory in sysfs (which could break applications in ways that are much harder to debug).