Closed Ruakij closed 1 year ago
Expected result: When one device in a group doesn't exist, skip the group
Yes this is absolutely the intended behavior of the plugin. The behavior you documented is a bug. It should be easy to create a reproduction case for the e2e tests to also avoid regressions.
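To make the intended behavior concrete, here is a minimal sketch of the "skip the whole group if any device is missing" rule. It is not the plugin's actual code: the function name `usable_groups` and the injectable `exists` check are hypothetical, and glob patterns are deliberately ignored.

```python
import os
from typing import Callable, Iterable, List


def usable_groups(
    groups: Iterable[List[str]],
    exists: Callable[[str], bool] = os.path.exists,
) -> List[List[str]]:
    """Keep only the groups whose every device path exists on this node.

    `groups` is a list of device-path lists (hypothetical shape, one inner
    list per group). A group with a missing path is dropped entirely, so a
    pod is never handed a partially populated group.
    """
    return [
        list(paths)
        for paths in groups
        if paths and all(exists(p) for p in paths)
    ]


if __name__ == "__main__":
    # On a node without /dev/dri/renderD128 this group is skipped.
    print(usable_groups([["/dev/dri/renderD128", "/dev/dri/card0"]]))
```

A test case for the regression could then assert that a node lacking one path of a group advertises zero devices for that group.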
Regarding special devices, do you have any ideas around doing feature discovery and writing feature specification in the configuration?
@Ruakij I just ran some tests and was able to track down the bug. It's a one word change :) I'm going to create a test case to avoid regressions.
Great! Thought it was just a minor thing :)
Regarding special devices, do you have any ideas around doing feature discovery and writing feature specification in the configuration?
I am not entirely sure.
I'd also like to know which devices the system has detected. Currently there is nothing in the logs for this.
Ideally this would also be machine-readable at a well-known endpoint, maybe as an annotation or just an API.
Detect special devices, read properties, present as sub-resources
This is the part that I'm most concerned about. I cannot imagine a way to do this generically for any random Linux device without building special logic into the plugin for each different kind of device. Maybe that's a future plugin system? Then we'd have plugins inside of plugins :p. Or can you imagine some standard syscall or file system discovery mechanism we could use to surface more information to the cluster? Ideally, the point of this all would be to surface more information that can be used for scheduling pods and allocating resources to them.
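As one possible file system discovery mechanism, sysfs exposes a few generic attributes for DRM nodes. The sketch below reads the PCI `vendor` and `device` IDs for a node such as `renderD128`. It is only an illustration under assumptions: the function name `read_drm_ids` is hypothetical, the plugin does nothing like this today, and non-PCI devices (for example on some ARM boards) may not have these files at all, in which case the value is `None`.

```python
from pathlib import Path
from typing import Dict, Optional


def read_drm_ids(
    node_name: str, sysfs_root: str = "/sys/class/drm"
) -> Dict[str, Optional[str]]:
    """Read the PCI vendor/device IDs that sysfs exposes for a DRM node.

    Returns None for an attribute that is missing or unreadable, which is
    expected for devices that do not sit on a PCI bus.
    """
    device_dir = Path(sysfs_root) / node_name / "device"
    ids: Dict[str, Optional[str]] = {}
    for attr in ("vendor", "device"):
        try:
            ids[attr] = (device_dir / attr).read_text().strip()
        except OSError:
            ids[attr] = None
    return ids


if __name__ == "__main__":
    # An Intel GPU would report vendor "0x8086".
    print(read_drm_ids("renderD128"))
```

This only identifies the hardware. It says nothing about whether the device actually works or which codecs it supports, so it would not replace device-specific logic.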
I'd also like to know which devices the system has detected
How is this different from looking at kubectl describe node X and reading the available devices that the plugin has found on that node?
Part of the point of Kubernetes device allocation is that any device with the same name should be fungible from the point of view of the pod, in other words, the exact underlying device should not matter to the pod, so surfacing more information about the concrete device shouldn't be relevant. Still, maybe we can log it for administrators to debug?
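For a machine-readable view of the same information, the node object's `status.allocatable` map can be filtered by resource domain. A rough sketch, assuming the node JSON comes from `kubectl get node X -o json` and that the resources are advertised under a domain such as `generic-device` (the domain here is an assumption taken from the example name in this issue, and `extended_resources` is a hypothetical helper):

```python
import json
import sys
from typing import Dict


def extended_resources(node: dict, domain: str) -> Dict[str, str]:
    """Return the allocatable resources of a node that live under `domain`.

    `node` is the parsed JSON of a Kubernetes Node object. Quantities are
    kept as the strings Kubernetes reports them in.
    """
    allocatable = node.get("status", {}).get("allocatable", {})
    prefix = domain.rstrip("/") + "/"
    return {
        name: qty for name, qty in allocatable.items() if name.startswith(prefix)
    }


if __name__ == "__main__":
    # Usage: kubectl get node X -o json | python this_script.py
    print(extended_resources(json.load(sys.stdin), "generic-device"))
```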
I cannot imagine a way to do this generically for any random Linux device
Yeah, this is only really possible using modules/plugins for these special devices to discover their capabilities.
How is this different from looking at kubectl describe node X [..]
Oh, I didn't know this already existed. But for debugging it might also be interesting to see on which device discovery stopped or failed.
the exact underlying device should not matter to the pod
This would be true if every device were actually the same and didn't just "look" the same. That is partially achievable in a homogeneous cluster where every node has the same device (or at least a few have the exact same one). In reality, many clusters are probably heterogeneous.
But you are correct; I think discovering the capabilities of the devices themselves might be out of scope and has to be managed some other way. E.g. we know node1-3 have GPUx, so we can advertise the renderD128 device as this specific gpu-dri-device on those nodes (this could be done using multiple daemonsets and affinity rules, or just node labels + kubemod).
KISS Principle
Hello, I wanted to use generic-device-plugin to schedule pods that need HW-accelerated video (Intel Quicksync).
This is my setup:
As you can see, I request /dev/dri/renderD128 as well as /dev/dri/card0.
But not all nodes actually have a /dev/dri/renderD128, so these pods crash:
I kinda expected the plugin to just ignore these then, but apparently not?
Expected result: When one device in a group doesn't exist, skip the group.
This also sparked another problem: I cannot distinguish between cards or the features they support. A more powerful card might be able to do more and offer certain feature sets, like encoders.
In fact, I actually only have 1 node with Intel Quicksync, but another node (one of the Oracle ARM systems) also has the /dev/dri/renderD128 device, and that one just doesn't work at all. I don't think the plugin could detect that.
We could, when discovering special devices, check these more deeply, discover their features, and then be able to request them.
e.g. generic-device/dri-VAProfileH264High to show we discovered the VAProfileH264High encoding profile.
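To illustrate the idea, here is a rough sketch of how such resource names could be derived from the text output of the `vainfo` tool, which lists one `VAProfile… : VAEntrypoint…` pair per line. This is purely hypothetical and not something the plugin does: the function name `encode_resources` and the `generic-device/dri-` prefix are assumptions, and only entrypoints starting with `VAEntrypointEnc` are treated as encoders.

```python
import re
from typing import List

# Matches lines such as "VAProfileH264High : VAEntrypointEncSlice".
_PROFILE_LINE = re.compile(r"^\s*(VAProfile\w+)\s*:\s*(VAEntrypoint\w+)\s*$")


def encode_resources(
    vainfo_output: str, prefix: str = "generic-device/dri-"
) -> List[str]:
    """Build one resource name per VA-API profile that has an encode entrypoint.

    Profiles are returned in first-seen order without duplicates. Decode-only
    profiles (e.g. VAEntrypointVLD) are ignored.
    """
    profiles: List[str] = []
    for line in vainfo_output.splitlines():
        match = _PROFILE_LINE.match(line)
        if not match:
            continue
        profile, entrypoint = match.groups()
        if entrypoint.startswith("VAEntrypointEnc") and profile not in profiles:
            profiles.append(profile)
    return [prefix + profile for profile in profiles]
```

A node where `vainfo` fails or lists no encode entrypoints would simply advertise none of these resources, which would also cover the ARM node whose renderD128 device does not work.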