okd-project / okd

The self-managing, auto-upgrading, Kubernetes distribution for everyone
https://okd.io
Apache License 2.0

Update to 4.6.0-0.okd-2021-01-23-132511 fails with "spec.servingCerts.namedCertificates[...].servingCertificate.name "cloud1-api.xxx.xxx" is used by other indexes: [0,0]" #487

Closed schulh closed 3 years ago

schulh commented 3 years ago

Describe the bug: There seems to be a bug in the update process that prevents the kube-apiserver from updating. The kube-apiserver-operator prints the following error message:

I0127 12:15:14.928504       1 request.go:645] Throttling request took 1.950880247s, request: GET:https://10.1.128.1:443/api/v1/namespaces/openshift-kube-apiserver/pods?labelSelector=apiserver%3Dtrue
I0127 12:15:15.346392       1 termination_observer.go:236] Observed event "TerminationStart" for API server pod "kube-apiserver-cloud1-control-2" (last termination at 2021-01-19 15:55:17 +0000 UTC) at 2021-01-27 12:15:15 +0000 UTC
I0127 12:15:15.438774       1 termination_observer.go:236] Observed event "TerminationPreShutdownHooksFinished" for API server pod "kube-apiserver-cloud1-control-2" (last termination at 2021-01-19 15:55:17 +0000 UTC) at 2021-01-27 12:15:15 +0000 UTC
I0127 12:15:15.614958       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"a2dea499-7df2-4187-ad7c-c25569d36f69", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ConfigMapUpdated' Updated ConfigMap/kube-apiserver-client-ca -n openshift-config-managed:
cause by changes in data.ca-bundle.crt
W0127 12:15:15.636932       1 observe_apiserver.go:216] errors during apiservers.config.openshift.io/cluster processing: [spec.servingCerts.namedCertificates[...].servingCertificate.name "cloud1-api.one.xxx.com" is used by other indexes: [0,0]]
E0127 12:15:15.637510       1 base_controller.go:250] "ConfigObserver" controller failed to sync "key", err: spec.servingCerts.namedCertificates[...].servingCertificate.name "cloud1-api.one.xxx.com" is used by other indexes: [0,0]
W0127 12:15:15.637920       1 observe_apiserver.go:216] errors during apiservers.config.openshift.io/cluster processing: [spec.servingCerts.namedCertificates[...].servingCertificate.name "cloud1-api.one.xxx.com" is used by other indexes: [0,0]]
E0127 12:15:15.638366       1 base_controller.go:250] "ConfigObserver" controller failed to sync "key", err: spec.servingCerts.namedCertificates[...].servingCertificate.name "cloud1-api.one.xxx.com" is used by other indexes: [0,0]
I0127 12:15:15.928572       1 request.go:645] Throttling request took 2.389906974s, request: GET:https://10.1.128.1:443/api/v1/namespaces/openshift-kube-apiserver/pods?labelSelector=apiserver%3Dtrue

We were not able to find anything about this specific error message.

Version: 4.5.0-0.okd-2020-10-15-235428 on bare metal with UPI

How reproducible: Restore the snapshot that was taken before starting the update process and start the update again. The installer gives the same error message.


vrutkovs commented 3 years ago

Please attach a must-gather archive (or upload it to a public file-sharing service).

schulh commented 3 years ago

I will try; at the moment it exposes too much information.

schulh commented 3 years ago

Ok, we found the problem: oc get apiserver cluster -o yaml

...
  servingCerts:
    namedCertificates:
    - names:
      - cloud1-api.one.xxx.com
      - api.cloud1.xxx.de
      - openshift.cloud1.xxx.de
      - openshift
      - api
      - cloud1-api.one.xxx.com
      servingCertificate:
        name: api-xxx

The name "cloud1-api.one.xxx.com" was listed twice under names. As soon as we removed the second entry, the update made progress again.
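
For anyone hitting the same message, here is a minimal sketch of a check for repeated host names across namedCertificates entries. It is not the operator's own validation code. It assumes the resource has been exported as JSON (for example with oc get apiserver cluster -o json) and parsed into a dict; the function name find_duplicate_names and the inline sample are hypothetical and only mirror the YAML excerpt above.

```python
import json
from collections import defaultdict


def find_duplicate_names(apiserver):
    """Return {name: [entry indexes]} for every name listed more than once.

    `apiserver` is the parsed JSON of the apiservers.config.openshift.io/cluster
    resource. The indexes are positions in spec.servingCerts.namedCertificates.
    """
    spec = apiserver.get("spec") or {}
    serving_certs = spec.get("servingCerts") or {}
    named = serving_certs.get("namedCertificates") or []

    seen = defaultdict(list)
    for index, entry in enumerate(named):
        for name in entry.get("names") or []:
            seen[name].append(index)

    # Keep only names that occur more than once, in one entry or across entries.
    return {name: idxs for name, idxs in seen.items() if len(idxs) > 1}


# Hypothetical sample mirroring the excerpt above (anonymized names kept as-is).
sample = json.loads("""
{
  "spec": {
    "servingCerts": {
      "namedCertificates": [
        {
          "names": [
            "cloud1-api.one.xxx.com",
            "api.cloud1.xxx.de",
            "openshift.cloud1.xxx.de",
            "openshift",
            "api",
            "cloud1-api.one.xxx.com"
          ],
          "servingCertificate": {"name": "api-xxx"}
        }
      ]
    }
  }
}
""")

if __name__ == "__main__":
    print(find_duplicate_names(sample))
```

An empty result means no name is repeated. A non-empty result lists each repeated name with the indexes of the entries that contain it, and these are the entries to clean up, for example with oc edit apiserver cluster.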