Closed ObieBent closed 3 months ago
Describe the bug
Unable to apply 4.15.0-0.okd-2024-03-10-010116: wait has exceeded 40 minutes for these operators: machine-config
Version
4.14.0-0.okd-2024-01-26-175629 (UPI install method), upgrading to 4.15.0-0.okd-2024-03-10-010116
How reproducible
The upgrade has been stuck for 2 days. All cluster operators have been updated except machine-config (of course).
NAME                                       VERSION                          AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.15.0-0.okd-2024-03-10-010116   True        False         False      11m
baremetal                                  4.15.0-0.okd-2024-03-10-010116   True        False         False      47d
cloud-controller-manager                   4.15.0-0.okd-2024-03-10-010116   True        False         False      47d
cloud-credential                           4.15.0-0.okd-2024-03-10-010116   True        False         False      47d
cluster-autoscaler                         4.15.0-0.okd-2024-03-10-010116   True        False         False      47d
config-operator                            4.15.0-0.okd-2024-03-10-010116   True        False         False      47d
console                                    4.15.0-0.okd-2024-03-10-010116   True        False         False      37m
control-plane-machine-set                  4.15.0-0.okd-2024-03-10-010116   True        False         False      47d
csi-snapshot-controller                    4.15.0-0.okd-2024-03-10-010116   True        False         False      40d
dns                                        4.15.0-0.okd-2024-03-10-010116   True        False         False      40d
etcd                                       4.15.0-0.okd-2024-03-10-010116   True        False         False      47d
image-registry                             4.15.0-0.okd-2024-03-10-010116   True        False         False      2d11h
ingress                                    4.15.0-0.okd-2024-03-10-010116   True        False         False      2d11h
insights                                   4.15.0-0.okd-2024-03-10-010116   True        False         False      40d
kube-apiserver                             4.15.0-0.okd-2024-03-10-010116   True        False         False      47d
kube-controller-manager                    4.15.0-0.okd-2024-03-10-010116   True        False         False      47d
kube-scheduler                             4.15.0-0.okd-2024-03-10-010116   True        False         False      47d
kube-storage-version-migrator              4.15.0-0.okd-2024-03-10-010116   True        False         False      27h
machine-api                                4.15.0-0.okd-2024-03-10-010116   True        False         False      47d
machine-approver                           4.15.0-0.okd-2024-03-10-010116   True        False         False      47d
machine-config                             4.14.0-0.okd-2024-01-26-175629   True        True          True       27h     Unable to apply 4.15.0-0.okd-2024-03-10-010116: error during syncRequiredMachineConfigPools: [context deadline exceeded, failed to update clusteroperator: [client rate limiter Wait returned an error: context deadline exceeded, error MachineConfigPool infra is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 0)]]
marketplace                                4.15.0-0.okd-2024-03-10-010116   True        False         False      47d
monitoring                                 4.15.0-0.okd-2024-03-10-010116   True        False         False      27h
network                                    4.15.0-0.okd-2024-03-10-010116   True        False         False      47d
node-tuning                                4.15.0-0.okd-2024-03-10-010116   True        False         False      27h
openshift-apiserver                        4.15.0-0.okd-2024-03-10-010116   True        False         False      37m
openshift-controller-manager               4.15.0-0.okd-2024-03-10-010116   True        False         False      2d11h
openshift-samples                          4.15.0-0.okd-2024-03-10-010116   True        False         False      2d11h
operator-lifecycle-manager                 4.15.0-0.okd-2024-03-10-010116   True        False         False      47d
operator-lifecycle-manager-catalog         4.15.0-0.okd-2024-03-10-010116   True        False         False      47d
operator-lifecycle-manager-packageserver   4.15.0-0.okd-2024-03-10-010116   True        False         False      40d
service-ca                                 4.15.0-0.okd-2024-03-10-010116   True        False         False      47d
storage                                    4.15.0-0.okd-2024-03-10-010116   True        False         False      47d
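The cluster operator listing shows exactly one operator still reporting the old version. As a minimal sketch of how that check could be scripted, the Python below walks a ClusterOperator list and flags any operator whose `operator` version differs from the target or whose `Degraded` condition is `True`. It assumes the list shape returned by `oc get clusteroperators -o json` (`items[].status.versions`, `items[].status.conditions`); the function names and the trimmed two-item sample are hypothetical illustrations, not part of OKD.

```python
import json
from typing import Optional

TARGET = "4.15.0-0.okd-2024-03-10-010116"


def operator_version(co: dict) -> Optional[str]:
    """Version listed under status.versions for the entry named "operator"."""
    for v in co.get("status", {}).get("versions", []):
        if v.get("name") == "operator":
            return v.get("version")
    return None


def condition_true(co: dict, cond_type: str) -> bool:
    """True if the object has a condition of `cond_type` whose status is "True"."""
    return any(
        c.get("type") == cond_type and c.get("status") == "True"
        for c in co.get("status", {}).get("conditions", [])
    )


def lagging_operators(co_list: dict, target: str) -> list:
    """Names of ClusterOperators that are not at `target` or are Degraded."""
    return [
        co["metadata"]["name"]
        for co in co_list.get("items", [])
        if operator_version(co) != target or condition_true(co, "Degraded")
    ]


# Hypothetical, trimmed-down sample in the shape of `oc get clusteroperators -o json`,
# filled in with two rows from the operator table.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "etcd"},
   "status": {"versions": [{"name": "operator", "version": "4.15.0-0.okd-2024-03-10-010116"}],
              "conditions": [{"type": "Degraded", "status": "False"}]}},
  {"metadata": {"name": "machine-config"},
   "status": {"versions": [{"name": "operator", "version": "4.14.0-0.okd-2024-01-26-175629"}],
              "conditions": [{"type": "Degraded", "status": "True"}]}}
]}
""")

print(lagging_operators(sample, TARGET))  # ['machine-config']
```

Against a live cluster the JSON would come from `oc get clusteroperators -o json` instead of the inline sample.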
Name:         machine-config
Namespace:
Labels:       <none>
Annotations:  exclude.release.openshift.io/internal-openshift-hosted: true
              include.release.openshift.io/self-managed-high-availability: true
              include.release.openshift.io/single-node-developer: true
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2024-07-01T00:12:11Z
  Generation:          1
  Owner References:
    API Version:     config.openshift.io/v1
    Controller:      true
    Kind:            ClusterVersion
    Name:            version
    UID:             0d23cbeb-28af-4e38-aae2-bb62d5a52858
  Resource Version:  27496717
  UID:               05ae4ecf-092c-43de-919d-66a13ab07d6f
Spec:
Status:
  Conditions:
    Last Transition Time:  2024-08-16T18:00:18Z
    Message:               Working towards 4.15.0-0.okd-2024-03-10-010116
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2024-08-16T18:33:49Z
    Message:               Unable to apply 4.15.0-0.okd-2024-03-10-010116: error during syncRequiredMachineConfigPools: [context deadline exceeded, failed to update clusteroperator: [client rate limiter Wait returned an error: context deadline exceeded, error MachineConfigPool infra is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 0)]]
    Reason:                RequiredPoolsFailed
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2024-08-16T17:32:11Z
    Message:               Cluster has deployed [{operator 4.14.0-0.okd-2024-01-26-175629}]
    Reason:                AsExpected
    Status:                True
    Type:                  Available
    Last Transition Time:  2024-08-16T18:03:50Z
    Message:               One or more machine config pools are degraded, please see `oc get mcp` for further details and resolve before upgrading
    Reason:                DegradedPool
    Status:                False
    Type:                  Upgradeable
  Extension:
    Infra:   pool is degraded because nodes fail with "1 nodes are reporting degraded status on sync": "Node worker0.bomar.bme.lab is reporting: \"command \\\"/usr/bin/rpm -qf /etc/audit/rules.d/mco-audit-quiet-containers.rules\\\" returned with unexpected error: error: file /etc/audit/rules.d/mco-audit-quiet-containers.rules: Permission denied\\n: exit status 1\""
    Master:  pool is degraded because nodes fail with "1 nodes are reporting degraded status on sync": "Node master0.bomar.bme.lab is reporting: \"command \\\"/usr/bin/rpm -qf /etc/audit/rules.d/mco-audit-quiet-containers.rules\\\" returned with unexpected error: error: file /etc/audit/rules.d/mco-audit-quiet-containers.rules: Permission denied\\n: exit status 1\""
    Worker:  pool is degraded because nodes fail with "1 nodes are reporting degraded status on sync": "Node worker3.bomar.bme.lab is reporting: \"command \\\"/usr/bin/rpm -qf /etc/audit/rules.d/mco-audit-quiet-containers.rules\\\" returned with unexpected error: error: file /etc/audit/rules.d/mco-audit-quiet-containers.rules: Permission denied\\n: exit status 1\""
  Related Objects:
    Group:
    Name:      openshift-machine-config-operator
    Resource:  namespaces
    Group:     machineconfiguration.openshift.io
    Name:
    Resource:  machineconfigpools
    Group:     machineconfiguration.openshift.io
    Name:
    Resource:  controllerconfigs
    Group:     machineconfiguration.openshift.io
    Name:
    Resource:  kubeletconfigs
    Group:     machineconfiguration.openshift.io
    Name:
    Resource:  containerruntimeconfigs
    Group:     machineconfiguration.openshift.io
    Name:
    Resource:  machineconfigs
    Group:
    Name:
    Resource:  nodes
    Group:
    Name:      openshift-kni-infra
    Resource:  namespaces
    Group:
    Name:      openshift-openstack-infra
    Resource:  namespaces
    Group:
    Name:      openshift-ovirt-infra
    Resource:  namespaces
    Group:
    Name:      openshift-vsphere-infra
    Resource:  namespaces
    Group:
    Name:      openshift-nutanix-infra
    Resource:  namespaces
    Group:
    Name:      openshift-cloud-platform-infra
    Resource:  namespaces
  Versions:
    Name:     operator
    Version:  4.14.0-0.okd-2024-01-26-175629
Events:  <none>
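The three `Extension` messages are heavily escaped, but they all report the same failure: the `rpm -qf` ownership check on `/etc/audit/rules.d/mco-audit-quiet-containers.rules` fails with `Permission denied`, and only the node differs per pool. A minimal Python sketch that pulls the node, command, file and error out of such messages with regular expressions is shown below. The `TEMPLATE` string, the regexes and `summarize` are hypothetical helpers written for this report; the template simply rebuilds the quoted messages with the node name substituted.

```python
import re

# Hypothetical helper: rebuilds the per-pool Extension messages exactly as printed
# by `oc describe`; only the node name differs between the three pools.
TEMPLATE = (
    r'pool is degraded because nodes fail with "1 nodes are reporting degraded status on sync": '
    r'"Node {node} is reporting: \"command \\\"/usr/bin/rpm -qf {path}\\\" returned with unexpected error: '
    r'error: file {path}: Permission denied\\n: exit status 1\""'
)
PATH = "/etc/audit/rules.d/mco-audit-quiet-containers.rules"

extension = {
    "Infra": TEMPLATE.format(node="worker0.bomar.bme.lab", path=PATH),
    "Master": TEMPLATE.format(node="master0.bomar.bme.lab", path=PATH),
    "Worker": TEMPLATE.format(node="worker3.bomar.bme.lab", path=PATH),
}

NODE_RE = re.compile(r"Node (\S+) is reporting")
# The command is wrapped in escaped quotes: one or more backslashes, then a quote.
CMD_RE = re.compile(r'command \\+"([^\\]+)\\+"')
# The error text runs until the escaped newline (first backslash).
ERR_RE = re.compile(r"error: file (\S+): ([^\\]+)")


def summarize(ext: dict) -> dict:
    """Map each pool (lower-cased) to the failing node, command, file and error."""
    out = {}
    for pool, msg in ext.items():
        node, cmd, err = NODE_RE.search(msg), CMD_RE.search(msg), ERR_RE.search(msg)
        out[pool.lower()] = {
            "node": node.group(1) if node else "?",
            "command": cmd.group(1) if cmd else "?",
            "file": err.group(1) if err else "?",
            "error": err.group(2) if err else "?",
        }
    return out


for pool, info in summarize(extension).items():
    print(f"{pool}: {info['node']} -> {info['command']!r} failed: {info['error']}")
```

Grouping the messages this way makes it clear that the failure is not pool-specific: one node in each of the three pools trips over the same file.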
All machine config pools (MCPs) report Degraded status.
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
infra    rendered-infra-8461b7c208fd913262052c7193981d72    False     True       True       3              0                   0                     1                      28d
master   rendered-master-f43c37629059b8c27126b806bffb01cd   False     True       True       3              0                   0                     1                      47d
worker   rendered-worker-8461b7c208fd913262052c7193981d72   False     True       True       1              0                   0                     1                      47d
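The MCP columns correspond to counters in each pool's status (`machineCount`, `readyMachineCount`, `updatedMachineCount`, `degradedMachineCount`, `unavailableMachineCount`) plus its `Degraded` condition. As a rough sketch, the Python below formats those fields in the same shape as the `(pool degraded: ... unavailable: ...)` fragment of the machine-config error message. Treating the message as built from exactly these fields is an assumption made for illustration; the helpers and the sample are hypothetical, and counts the report does not show are printed as `?` rather than guessed.

```python
import json


def is_degraded(pool: dict) -> bool:
    """True if the pool has a Degraded condition with status "True"."""
    return any(
        c.get("type") == "Degraded" and c.get("status") == "True"
        for c in pool.get("status", {}).get("conditions", [])
    )


def pool_status_line(pool: dict) -> str:
    """Format MachineConfigPool status counts like the MCO error message.

    Missing counts are shown as "?" rather than guessed.
    """
    st = pool.get("status", {})
    return (
        f"(pool degraded: {str(is_degraded(pool)).lower()} "
        f"total: {st.get('machineCount', '?')}, "
        f"ready {st.get('readyMachineCount', '?')}, "
        f"updated: {st.get('updatedMachineCount', '?')}, "
        f"unavailable: {st.get('unavailableMachineCount', '?')})"
    )


# Hypothetical sample in the shape of `oc get machineconfigpools -o json`, filled
# in from the `oc get mcp` table. unavailableMachineCount is only known for infra
# (from the MCO error message), so it is omitted for the other pools.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "infra"},
   "status": {"machineCount": 3, "readyMachineCount": 0, "updatedMachineCount": 0,
              "degradedMachineCount": 1, "unavailableMachineCount": 0,
              "conditions": [{"type": "Degraded", "status": "True"}]}},
  {"metadata": {"name": "master"},
   "status": {"machineCount": 3, "readyMachineCount": 0, "updatedMachineCount": 0,
              "degradedMachineCount": 1,
              "conditions": [{"type": "Degraded", "status": "True"}]}},
  {"metadata": {"name": "worker"},
   "status": {"machineCount": 1, "readyMachineCount": 0, "updatedMachineCount": 0,
              "degradedMachineCount": 1,
              "conditions": [{"type": "Degraded", "status": "True"}]}}
]}
""")

pools = {p["metadata"]["name"]: p for p in sample["items"]}
for name, pool in pools.items():
    print(name, pool_status_line(pool))
```

For the infra pool this reproduces the fragment quoted in the machine-config error, `(pool degraded: true total: 3, ready 0, updated: 0, unavailable: 0)`.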
NAME                    STATUS   ROLES                  AGE   VERSION
master0.bomar.bme.lab   Ready    control-plane,master   47d   v1.27.9+e36e183
master1.bomar.bme.lab   Ready    control-plane,master   47d   v1.27.9+e36e183
master2.bomar.bme.lab   Ready    control-plane,master   47d   v1.27.9+e36e183
worker0.bomar.bme.lab   Ready    infra                  40d   v1.27.9+e36e183
worker1.bomar.bme.lab   Ready    infra                  40d   v1.27.9+e36e183
worker2.bomar.bme.lab   Ready    infra                  40d   v1.27.9+e36e183
worker3.bomar.bme.lab   Ready    worker                 40d   v1.27.9+e36e183
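The ROLES column is derived from `node-role.kubernetes.io/<role>` labels, and the node counts per role line up with each pool's MACHINECOUNT (3 master, 3 infra, 1 worker). A small sketch of that cross-check follows. It assumes each node carries only the role labels shown in ROLES; real infra nodes may also carry a worker label, and pool membership is decided by each pool's node selector rather than by this count. The node objects below are hypothetical reconstructions from the table.

```python
from collections import Counter

ROLE_PREFIX = "node-role.kubernetes.io/"


def roles_from_labels(labels: dict) -> list:
    """Roles as shown in ROLES: one per node-role.kubernetes.io/<role> label."""
    return sorted(k[len(ROLE_PREFIX):] for k in labels if k.startswith(ROLE_PREFIX))


def count_by_role(nodes: list) -> Counter:
    """Count nodes per role; a node with two roles is counted under both."""
    counts = Counter()
    for node in nodes:
        counts.update(roles_from_labels(node["metadata"].get("labels", {})))
    return counts


# Hypothetical label sets reconstructed from the ROLES column of `oc get nodes`.
nodes = (
    [{"metadata": {"name": f"master{i}.bomar.bme.lab",
                   "labels": {ROLE_PREFIX + "control-plane": "", ROLE_PREFIX + "master": ""}}}
     for i in range(3)]
    + [{"metadata": {"name": f"worker{i}.bomar.bme.lab",
                     "labels": {ROLE_PREFIX + "infra": ""}}} for i in range(3)]
    + [{"metadata": {"name": "worker3.bomar.bme.lab",
                     "labels": {ROLE_PREFIX + "worker": ""}}}]
)

# MACHINECOUNT per pool, copied from the `oc get mcp` output.
pool_machine_count = {"master": 3, "infra": 3, "worker": 1}

counts = count_by_role(nodes)
for pool, expected in pool_machine_count.items():
    print(f"{pool}: nodes with role={counts[pool]}, pool MACHINECOUNT={expected}")
```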
Log bundle
The must-gather is too large (~40 MB) to attach.
Can you check #1928? This seems to be a duplicate of that discussion.
Great, I've applied the workaround, and MCO doesn't complain anymore. Thanks :)