okd-project / okd

The self-managing, auto-upgrading, Kubernetes distribution for everyone
https://okd.io
Apache License 2.0

Upgrade from 4.13.0-0.okd-2023-10-28-065448 to 4.14.0-0.okd-2024-01-06-084517 stuck on network CO because of ovnkube-node DS image required value and hostPort duplicate value #1866

MarkusLandau closed this issue 5 months ago

MarkusLandau commented 5 months ago

Describe the bug: The upgrade is stuck at the network cluster operator, which apparently cannot update the ovnkube-node DaemonSet. The error message is:

Error while updating operator configuration: could not apply (apps/v1, Kind=DaemonSet) openshift-ovn-kubernetes/ovnkube-node: failed to apply / update (apps/v1, Kind=DaemonSet) openshift-ovn-kubernetes/ovnkube-node: DaemonSet.apps "ovnkube-node" is invalid: [spec.template.spec.containers[2].image: Required value, spec.template.spec.containers[5].image: Required value, spec.template.spec.containers[3].ports[0].hostPort: Duplicate value: "TCP//9103", spec.template.spec.containers[9].ports[0].hostPort: Duplicate value: "TCP//29103"]
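To make the two kinds of complaint in this message concrete, here is a minimal sketch that runs the same two checks against a DaemonSet that has been dumped to a dict (for example from JSON output). It is not the API server's validation code. The function name and the sample container names are hypothetical, and the duplicate key is assumed to be protocol/hostIP/port, matching the "TCP//9103" form in the error above.

```python
def find_template_problems(daemonset: dict) -> list[str]:
    """Report containers without an image and hostPorts used more than once."""
    containers = daemonset["spec"]["template"]["spec"].get("containers", [])
    problems = []
    seen = set()  # (protocol, hostIP, hostPort) combinations already claimed
    for i, container in enumerate(containers):
        if not container.get("image"):
            problems.append(f"containers[{i}].image: Required value")
        for j, port in enumerate(container.get("ports", [])):
            host_port = port.get("hostPort")
            if not host_port:
                continue  # only ports bound on the host can collide
            key = (port.get("protocol", "TCP"), port.get("hostIP", ""), host_port)
            if key in seen:
                problems.append(
                    f'containers[{i}].ports[{j}].hostPort: '
                    f'Duplicate value: "{key[0]}/{key[1]}/{key[2]}"'
                )
            seen.add(key)
    return problems


# Hypothetical sample: one leftover container without an image,
# and two containers claiming the same host port.
sample = {"spec": {"template": {"spec": {"containers": [
    {"name": "metrics-a", "image": "example/image:1", "ports": [{"hostPort": 9103}]},
    {"name": "leftover", "ports": []},
    {"name": "metrics-b", "image": "example/image:2", "ports": [{"hostPort": 9103}]},
]}}}}

for problem in find_template_problems(sample):
    print(problem)
```

A container list that mixes entries from an older manifest with the current one would produce this pattern: leftover entries with no image, and ports claimed twice.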

Version: 4.13.0-0.okd-2023-10-28-065448 to 4.14.0-0.okd-2024-01-06-084517

How reproducible: The error first occurred when attempting to upgrade to 4.14.0-0.okd-2023-11-14-101924. I canceled that upgrade and waited for a new release. It has now occurred again when attempting to upgrade to 4.14.0-0.okd-2024-01-06-084517. Note: this is a long-running cluster that has already been through numerous upgrades.

Log bundle: Download page for log bundle

vrutkovs commented 5 months ago

I suppose it's a dupe of https://github.com/okd-project/okd/issues/1775, which is https://issues.redhat.com/browse/OCPBUGS-24691

MarkusLandau commented 5 months ago

Thank you for the tip, especially the second link. It made me understand that the problem lies in leftover managedFields entries. In particular, https://issues.redhat.com/browse/OCPBUGS-24036 explains which managers are the only ones allowed to exist. I therefore deleted a third, older manager from the managedFields of the ovnkube-node DS. All that remains is the following:

  managedFields:
    - manager: cluster-network-operator/operconfig
      [...]
    - manager: kube-controller-manager
      [...]

The upgrade then continued. At the moment it looks like it will go through successfully.
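For anyone hitting the same thing, here is a rough sketch of how the stale entries could be located. It takes the DaemonSet as a dict (for example from JSON output) and builds a JSON Patch that removes every managedFields entry whose manager is not one of the two listed above. The function name and the sample manager name "hypothetical-old-manager" are made up for illustration. The allowed set is just the two managers from this comment, so please check it against OCPBUGS-24036 before relying on it.

```python
import json

# Managers that remained on the DaemonSet in this thread (assumption: this
# is the complete allowed set for your cluster version; verify it first).
ALLOWED_MANAGERS = {"cluster-network-operator/operconfig", "kube-controller-manager"}


def stale_manager_patch(daemonset: dict, allowed=ALLOWED_MANAGERS) -> list[dict]:
    """Build JSON Patch 'remove' ops for managedFields entries not in `allowed`."""
    entries = daemonset.get("metadata", {}).get("managedFields", [])
    stale = [i for i, entry in enumerate(entries) if entry.get("manager") not in allowed]
    # Remove from the highest index down so earlier removals do not shift
    # the positions of entries that still have to be removed.
    return [
        {"op": "remove", "path": f"/metadata/managedFields/{i}"}
        for i in sorted(stale, reverse=True)
    ]


# Hypothetical sample object with one leftover manager.
example = {"metadata": {"managedFields": [
    {"manager": "cluster-network-operator/operconfig"},
    {"manager": "kube-controller-manager"},
    {"manager": "hypothetical-old-manager"},
]}}

print(json.dumps(stale_manager_patch(example)))
```

The printed list is in the form expected by `oc patch ... --type=json -p '<patch>'`. Whether you apply it that way or edit the object by hand, it is worth saving a copy of the DaemonSet first.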

vrutkovs commented 5 months ago

Right, it's definitely a dupe then. Let's continue there if the upgrade fails.