squat / generic-device-plugin

A Kubernetes device plugin to schedule generic Linux devices
Apache License 2.0

Evict pod if the device is removed #61

Open tnyeanderson opened 11 months ago

tnyeanderson commented 11 months ago

I hope that I'm just missing something simple here. I have configured a USB device in the generic-device-plugin, and I'm able to ensure that a certain pod will only be scheduled on nodeA which has that USB device plugged in by setting resource limits. So far: AWESOME!

I can unplug the USB device from nodeA and plug it into nodeB, and each node's .status.capacity and .status.allocatable are updated to reflect which node has the device. PERFECT!

The problem is that if the pod is already scheduled and running before I move the USB device to nodeB, the pod remains on nodeA, which no longer has the device available. I was hoping that the scheduler would recognize that the node no longer has the resources to support the pod, evict it, and eventually reschedule it on nodeB once the device is available there. But this doesn't happen according to my testing.

I've thought of a few possible workarounds (involving labels and affinity rules), but I wanted to see if there are any existing ideas or solutions.

squat commented 11 months ago

Hi @tnyeanderson, eviction functionality is not part of the device plugin today! That would be a very cool feature to consider adding.

If you're interested in contributing such functionality, I would happily review and merge. This is probably the best place to look for inspiration: https://github.com/kubernetes-sigs/descheduler.

Designing this component could be somewhat tricky. The component needs to keep track of the identity of every device on every node and which pod it's been allocated to. If the pod tracking this information dies, it would lose track of which pods have been allocated which devices. We'd need to make this information persistent. Thinking quickly, one way to accomplish this would be for the plugin to annotate all pods that receive a device with the device's ID and then to evict pods matching an annotation for a device that's disappeared from a node.
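
To make the annotation idea concrete, here is a rough sketch of the selection step only: given the pods on a node and the device IDs still present there, pick the pods whose annotated devices have disappeared. It is an illustration under stated assumptions, not code from the plugin. The annotation key `example.com/allocated-devices`, the `Pod` stand-in class, and both function names are hypothetical, and the sketch uses plain Python data rather than a real Kubernetes client.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical annotation key; the plugin does not define one today.
DEVICE_ANNOTATION = "example.com/allocated-devices"


@dataclass
class Pod:
    """Minimal stand-in for the few pod fields this sketch needs."""
    name: str
    node: str
    annotations: dict[str, str] = field(default_factory=dict)


def allocated_devices(pod: Pod) -> set[str]:
    """Parse the comma-separated device IDs recorded on the pod."""
    raw = pod.annotations.get(DEVICE_ANNOTATION, "")
    return {dev.strip() for dev in raw.split(",") if dev.strip()}


def pods_to_evict(pods: list[Pod], node: str, present_devices: set[str]) -> list[Pod]:
    """Return pods on `node` holding at least one device that is no longer present."""
    return [
        pod
        for pod in pods
        if pod.node == node and allocated_devices(pod) - present_devices
    ]


if __name__ == "__main__":
    pods = [
        Pod("camera", "nodeA", {DEVICE_ANNOTATION: "usb-1"}),
        Pod("web", "nodeA"),  # no device allocated, never selected
        Pod("printer", "nodeB", {DEVICE_ANNOTATION: "usb-2"}),
    ]
    # usb-1 has been unplugged from nodeA, so only "camera" is selected.
    print([pod.name for pod in pods_to_evict(pods, "nodeA", set())])
```

Because the allocation is recorded on the pod itself, a restarted plugin could rebuild its view by listing pods instead of relying on in-memory state. Actually evicting the selected pods would go through the Kubernetes API and is left out here.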

tnyeanderson commented 11 months ago

> Thinking quickly, one way to accomplish this would be for the plugin to annotate all pods that receive a device with the device's ID and then to evict pods matching an annotation for a device that's disappeared from a node.

Sounds brilliant to me! I'll be away for the holidays, and the new year at work (and at home) tends to be a little busy, but I'll try to hit this sometime in January and get it to you for review.

Thanks!