k8snetworkplumbingwg / sriov-network-operator

Operator for provisioning and configuring SR-IOV CNI plugin and device plugin
Apache License 2.0

config-daemon: Restart all instances of device-plugin #783

zeeke closed 1 month ago

zeeke commented 1 month ago

When the operator changes the device-plugin Spec (e.g. .Spec.NodeSelector), a node may temporarily have two device plugin pods: one that is terminating and one that is initializing. If the config-daemon executes restartDevicePluginPod() at the same time, it may kill the terminating pod, while the initializing one keeps running with the old device plugin configuration. As a result, one or more resources may not be advertised until the device plugin is restarted manually.

Make the config-daemon restart all the device-plugin instances it finds for its own node.
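
The idea can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: `Pod`, `PodClient`, `fakeClient`, and `restartAllDevicePluginPods` are hypothetical names used only for illustration, the in-memory client stands in for the Kubernetes API, and the `app=sriov-device-plugin` label is an assumption. The point it shows is that every matching pod on the node is deleted, rather than stopping after the first one found.

```go
package main

import "fmt"

// Pod is a simplified stand-in for a Kubernetes pod (assumption for this sketch).
type Pod struct {
	Name        string
	NodeName    string
	Labels      map[string]string
	Terminating bool // true when the pod is already being deleted
}

// PodClient is a hypothetical abstraction over the Kubernetes API.
type PodClient interface {
	ListPods(nodeName, labelKey, labelValue string) ([]Pod, error)
	DeletePod(name string) error
}

// restartAllDevicePluginPods deletes every device plugin pod found on the
// given node, including a terminating one, so that no instance survives with
// a stale configuration. It returns the number of pods deleted.
func restartAllDevicePluginPods(c PodClient, nodeName string) (int, error) {
	// The label key and value are assumptions for this sketch.
	pods, err := c.ListPods(nodeName, "app", "sriov-device-plugin")
	if err != nil {
		return 0, fmt.Errorf("listing device plugin pods on %s: %w", nodeName, err)
	}

	deleted := 0
	var firstErr error
	for _, p := range pods {
		// Do not stop at the first pod: a terminating and an initializing
		// instance may coexist on the same node.
		if err := c.DeletePod(p.Name); err != nil {
			if firstErr == nil {
				firstErr = fmt.Errorf("deleting pod %s: %w", p.Name, err)
			}
			continue
		}
		deleted++
	}
	return deleted, firstErr
}

// fakeClient is an in-memory PodClient used to exercise the sketch.
type fakeClient struct {
	pods    []Pod
	deleted []string
}

func (f *fakeClient) ListPods(nodeName, labelKey, labelValue string) ([]Pod, error) {
	var out []Pod
	for _, p := range f.pods {
		if p.NodeName == nodeName && p.Labels[labelKey] == labelValue {
			out = append(out, p)
		}
	}
	return out, nil
}

func (f *fakeClient) DeletePod(name string) error {
	f.deleted = append(f.deleted, name)
	return nil
}
```

With a `fakeClient` holding one terminating and one initializing device plugin pod on the same node, calling `restartAllDevicePluginPods` deletes both, while pods on other nodes or with other labels are left alone.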

github-actions[bot] commented 1 month ago

Thanks for your PR. To run vendor CIs, maintainers can use one of:

coveralls commented 1 month ago

Pull Request Test Coverage Report for Build 11255003478

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| pkg/daemon/daemon.go | 11 | 25 | 44.0% |
| Total | 11 | 25 | 44.0% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 11254641885 | 0.006% |
| Covered Lines | 6663 |
| Relevant Lines | 14801 |

💛 - Coveralls

zeeke commented 1 month ago

Failing test

Summarizing 1 Failure:
  [FAIL] [sriov] Metrics Exporter When Prometheus operator is available [It] Metrics should have the correct labels
  /root/opr-ocp2-1/data/sriov-network-operator/sriov-network-operator/test/conformance/tests/test_exporter_metrics.go:162

is addressed in