openshift / installer

Install an OpenShift 4.x cluster
https://try.openshift.com
Apache License 2.0
ignition /config/master tls: failed to verify certificate x509 #8475

Open UriZafrir opened 1 month ago

UriZafrir commented 1 month ago

Version

$ openshift-install version
./openshift-install 4.15.14
built from commit 147d2421af88084cbfbe287140e63949830e5593
release image registry.local:5000/ocp4@sha256:234ccdfa4adabcfa7490785bad7108a3c7d622f19cd5b8f4b241dfba96c09be0
release architecture amd64

Platform:

Please specify the platform type: aws, libvirt, openstack or baremetal

baremetal

What happened?

During an OpenShift install on vSphere, all of the master nodes report an Ignition error when fetching /config/master ("tls: failed to verify certificate: x509"), and the install fails.

$ oc --kubeconfig=auth/kubeconfig get clusterversion -oyaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    creationTimestamp: "2024-05-25T16:07:28Z"
    generation: 1
    name: version
    resourceVersion: "6081"
    uid: 21a86f67-b3c0-4493-8e27-98748b363898
  spec:
    channel: stable-4.15
    clusterID: 3467d915-ff7b-47f2-bbe8-cb11f6f9e29f
    overrides:
    - group: ""
      kind: ConfigMap
      name: cloud-provider-config
      namespace: openshift-config
      unmanaged: true
    - group: ""
      kind: ConfigMap
      name: cluster-config-v1
      namespace: kube-system
      unmanaged: true
    - group: config.openshift.io
      kind: DNS
      name: cluster
      namespace: ""
      unmanaged: true
    - group: config.openshift.io
      kind: Infrastructure
      name: cluster
      namespace: ""
      unmanaged: true
    - group: config.openshift.io
      kind: Ingress
      name: cluster
      namespace: ""
      unmanaged: true
    - group: config.openshift.io
      kind: Network
      name: cluster
      namespace: ""
      unmanaged: true
    - group: config.openshift.io
      kind: Proxy
      name: cluster
      namespace: ""
      unmanaged: true
    - group: config.openshift.io
      kind: Scheduler
      name: cluster
      namespace: ""
      unmanaged: true
    - group: operator.openshift.io
      kind: ImageContentSourcePolicy
      name: image-policy
      namespace: ""
      unmanaged: true
    - group: ""
      kind: Secret
      name: kube-cloud-cfg
      namespace: kube-system
      unmanaged: true
    - group: ""
      kind: ConfigMap
      name: root-ca
      namespace: kube-system
      unmanaged: true
    - group: ""
      kind: Secret
      name: machine-config-server-tls
      namespace: openshift-machine-config-operator
      unmanaged: true
    - group: ""
      kind: Secret
      name: pull-secret
      namespace: openshift-config
      unmanaged: true
    - group: ""
      kind: ConfigMap
      name: user-ca-bundle
      namespace: openshift-config
      unmanaged: true
    - group: ""
      kind: Secret
      name: vsphere-creds
      namespace: kube-system
      unmanaged: true
    - group: config.openshift.io
      kind: FeatureGate
      name: cluster
      namespace: ""
      unmanaged: true
    - group: ""
      kind: Secret
      name: kubeadmin
      namespace: kube-system
      unmanaged: true
    - group: rbac.authorization.k8s.io
      kind: Role
      name: vsphere-creds-secret-reader
      namespace: kube-system
      unmanaged: true
    - group: ""
      kind: ConfigMap
      name: openshift-install-manifests
      namespace: openshift-config
      unmanaged: true
  status:
    availableUpdates: null
    capabilities:
      enabledCapabilities:
      - Build
      - CSISnapshot
      - CloudCredential
      - Console
      - DeploymentConfig
      - ImageRegistry
      - Insights
      - MachineAPI
      - NodeTuning
      - OperatorLifecycleManager
      - Storage
      - baremetal
      - marketplace
      - openshift-samples
      knownCapabilities:
      - Build
      - CSISnapshot
      - CloudCredential
      - Console
      - DeploymentConfig
      - ImageRegistry
      - Insights
      - MachineAPI
      - NodeTuning
      - OperatorLifecycleManager
      - Storage
      - baremetal
      - marketplace
      - openshift-samples
    conditions:
    - lastTransitionTime: "2024-05-25T16:07:30Z"
      message: 'Unable to retrieve available updates: currently reconciling cluster
        version 4.15.14 not found in the "stable-4.15" channel'
      reason: VersionNotFound
      status: "False"
      type: RetrievedUpdates
    - lastTransitionTime: "2024-05-25T16:07:30Z"
      message: Disabling ownership via cluster version overrides prevents upgrades.
        Please remove overrides before continuing.
      reason: ClusterVersionOverridesSet
      status: "False"
      type: Upgradeable
    - lastTransitionTime: "2024-05-25T16:07:30Z"
      message: Capabilities match configured spec
      reason: AsExpected
      status: "False"
      type: ImplicitlyEnabledCapabilities
    - lastTransitionTime: "2024-05-25T16:07:30Z"
      message: Payload loaded version="4.15.14" image="registry.local:5000/ocp4@sha256:234ccdfa4adabcfa7490785bad7108a3c7d622f19cd5b8f4b241dfba96c09be0"
        architecture="amd64"
      reason: PayloadLoaded
      status: "True"
      type: ReleaseAccepted
    - lastTransitionTime: "2024-05-25T16:07:30Z"
      status: "False"
      type: Available
    - lastTransitionTime: "2024-05-25T16:36:10Z"
      message: |-
        Multiple errors are preventing progress:
        * Cluster operators authentication, baremetal, cloud-controller-manager, cluster-autoscaler, config-operator, control-plane-machine-set, csi-snapshot-controller, dns, etcd, image-registry, ingress, insights, kube-apiserver, kube-controller-manager, kube-scheduler, kube-storage-version-migrator, machine-api, machine-approver, machine-config, marketplace, monitoring, network, node-tuning, openshift-apiserver, openshift-controller-manager, service-ca, storage are not available
        * Could not update imagestream "openshift/driver-toolkit" (607 of 873): resource may have been deleted
        * Could not update oauthclient "console" (546 of 873): the server does not recognize this resource, check extension API servers
        * Could not update role "openshift-apiserver/prometheus-k8s" (857 of 873): resource may have been deleted
        * Could not update role "openshift-authentication/prometheus-k8s" (753 of 873): resource may have been deleted
        * Could not update role "openshift-console-operator/prometheus-k8s" (791 of 873): resource may have been deleted
        * Could not update role "openshift-console/prometheus-k8s" (795 of 873): resource may have been deleted
        * Could not update role "openshift-controller-manager/prometheus-k8s" (865 of 873): resource may have been deleted
        * Could not update role "openshift/copied-csv-viewer" (675 of 873): resource may have been deleted
        * Could not update rolebinding "openshift/cluster-samples-operator-openshift-edit" (484 of 873): resource may have been deleted
      reason: MultipleErrors
      status: "True"
      type: Failing
    - lastTransitionTime: "2024-05-25T16:07:30Z"
      message: 'Unable to apply 4.15.14: an unknown error has occurred: MultipleErrors'
      reason: MultipleErrors
      status: "True"
      type: Progressing
    desired:
      image: registry.local:5000/ocp4@sha256:234ccdfa4adabcfa7490785bad7108a3c7d622f19cd5b8f4b241dfba96c09be0
      url: https://access.redhat.com/errata/RHSA-2024:2865
      version: 4.15.14
    history:
    - completionTime: null
      image: registry.local:5000/ocp4@sha256:234ccdfa4adabcfa7490785bad7108a3c7d622f19cd5b8f4b241dfba96c09be0
      startedTime: "2024-05-25T16:07:30Z"
      state: Partial
      verified: false
      version: 4.15.14
    observedGeneration: 1
    versionHash: PE2-EaXpK0k=
kind: List
metadata:
  resourceVersion: ""

$ oc --kubeconfig=auth/kubeconfig get clusteroperator
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication
baremetal
cloud-controller-manager
cloud-credential                                     True        False         False      35m
cluster-autoscaler
config-operator
console
control-plane-machine-set
csi-snapshot-controller
dns
etcd
image-registry
ingress
insights
kube-apiserver
kube-controller-manager
kube-scheduler
kube-storage-version-migrator
machine-api
machine-approver
machine-config
marketplace
monitoring
network
node-tuning
openshift-apiserver
openshift-controller-manager
openshift-samples
operator-lifecycle-manager
operator-lifecycle-manager-catalog
operator-lifecycle-manager-packageserver
service-ca
storage

What you expected to happen?

The installer to succeed.

How to reproduce it (as minimally and precisely as possible)?

$ ./openshift-install create cluster

Anything else we need to know?

References

This is the closest I got to a reference: https://access.redhat.com/solutions/4271572

patrickdillon commented 1 month ago

Ignition certificates are only valid for a short period (24 hours, IIRC). A common cause of this error is generating the Ignition configs well in advance of the install.

For further debugging we would need to inspect the certs. It would be good for us to update the troubleshooting docs on how to do this.
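As a rough starting point for inspecting the certs (a sketch, not an official troubleshooting procedure), something like the following `openssl` helpers could be used. The function names are made up for illustration. The host passed to `fetch_server_cert` is a placeholder for whatever endpoint the masters fetch `/config/master` from, and the default port 22623 assumes the standard machine config server port.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking certificate validity with openssl.

# Print the subject, issuer, and validity window of a PEM certificate file.
cert_summary() {
  openssl x509 -in "$1" -noout -subject -issuer -dates
}

# Exit 0 if the certificate has not expired yet.
# Note: -checkend only tests expiry; it does not catch a certificate that is
# "not yet valid" because of clock skew, so compare notBefore with `date -u`.
cert_not_expired() {
  openssl x509 -in "$1" -noout -checkend 0 >/dev/null
}

# Fetch the certificate presented by a TLS endpoint and print it as PEM.
# Assumption: the machine config server listens on port 22623.
fetch_server_cert() {
  local host="$1" port="${2:-22623}"
  openssl s_client -connect "${host}:${port}" -servername "${host}" \
    </dev/null 2>/dev/null | openssl x509 -outform PEM
}
```

For example, run `fetch_server_cert <mcs-host> > mcs.pem` from a machine that can reach the endpoint, then `cert_summary mcs.pem` and `date -u` on both the installer host and a failing node. If the node's clock falls outside the notBefore/notAfter window, that would point to expired certs or clock skew rather than a wrong CA.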

UriZafrir commented 1 month ago

Hi, I didn't generate the Ignition configs in advance. How can I debug the certificates?