openshift / installer

Install an OpenShift 4.x cluster
https://try.openshift.com
Apache License 2.0
1.42k stars 1.38k forks source link

openshift-install create cluster | Master Node Error Message: get .../config/master tls: failed to verify certificate: x509 | proxy issue ? #8304

Open n00bsi opened 4 months ago

n00bsi commented 4 months ago

Version

4.15.8

$ openshift-install version
built from commit f4f5d0ee0f7591fd9ddf03ac337c804608102919
release image quay.io/openshift-release-dev/ocp-release@sha256:5c82cea5931996af509231c7a5a1611bcfd927dca4e362e6443d1f8a77a517c2
release architecture amd64

Platform: vSphere

What happened?

Tryed to install OpenShift Cluster on vSphere 7 The VMs are created but write error message on the Console of MasterNodes.

The VMs are behing a HTTP Proxy

Screenshot from 2024-04-23 09-11-35

Bootstrap Node show:

Screenshot from 2024-04-23 10-07-42

$ curl -kv https://quay.io/openshift-release-dev/ocp-release@sha256:aba54b293dc151f5c0fd96d4353ced6ced3e7da6620c1c10714ab32d0577486f* Could not resolve host: quay.io
* Closing connection 0
curl: (6) Could not resolve host: quay.io

when add manual the proxy parameter curl downloads

When open the URL got the config file but have to accept the unknown SSL Cert

See the troubleshooting documentation for ideas about what information to collect. For example, if the installer fails to create resources, attach the relevant portions of your .openshift_install.log.

openshift_install.log

What you expected to happen?

that the installer runs well and the Cluster come up

How to reproduce it (as minimally and precisely as possible)?

wget --no-check-certificate vCenter.yourdomain.tld/certs/download.zip

unzip download.zip

su - root

cp certs/lin/* /etc/pki/ca-trust/source/anchors

update-ca-trust extract

export no_proxy=.ourdomain.tld
export https_proxy=http://username:password@proxy.ourdomain.tld:3128

./openshift-install create cluster

[osadmin@osdemo ~]$ ./openshift-install create cluster
? SSH Public Key /home/osadmin/.ssh/id_rsa.pub
? Platform vsphere
? vCenter vc01.ourdomain.tld
? Username username
? Password [? for help] **************
INFO Connecting to vCenter  vc01.ourdomain.tld   
? Datacenter Datacenter
? Cluster /Datacenter/host/Cluster7
? Default Datastore /Datacenter/datastore/a300_lun45
? Network VLAN_0
? Virtual IP Address for API     192.168.1.227 ( api.osctest.ourdomain.tld )
? Virtual IP Address for Ingress 192.168.1.228  ( console-openshift-console.apps.osctest.ourdomain.tld )   
? Base Domain ourdomain.tld
? Cluster Name osctest
? Pull Secret [? for help] *****************************************************************************************************************************************

NFO Creating infrastructure resources...         
INFO Waiting up to 20m0s (until 5:21AM EDT) for the Kubernetes API at https://api.osctest.ourdomain.tld:6443... 
WARNING Failed to extract host addresses: could not extract IP with bootstrap MOID:  
INFO Skipping VM console logs gather: no gather methods registered for "vsphere" 
INFO Pulling debug logs from the bootstrap machine 
WARNING Unable to stat /home/osadmin/serial-log-bundle-20240423052123.tar.gz, skipping 
ERROR Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get "https://api.osctest.ourdomain.tld:6443/apis/config.openshift.io/v1/clusteroperators": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kube-apiserver-lb-signer") 
ERROR Bootstrap failed to complete: Get "https://api.osctest.ourdomain.tld:6443/version": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kube-apiserver-lb-signer") 
ERROR Failed waiting for Kubernetes API. This error usually happens when there is a problem on the bootstrap host that prevents creating a temporary control plane. 
ERROR The bootstrap machine failed to download the release image 
INFO Pull failed. Retrying quay.io/openshift-release-dev/ocp-release@sha256:5c82cea5931996af509231c7a5a1611bcfd927dca4e362e6443d1f8a77a517c2... 
INFO Error: initializing source docker://quay.io/openshift-release-dev/ocp-release@sha256:5c82cea5931996af509231c7a5a1611bcfd927dca4e362e6443d1f8a77a517c2: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on 192.168.21.32:53: no such host 
INFO 2024-04-23 09:21:24.081569673 +0000 UTC m=+0.018239684 image pull  quay.io/openshift-release-dev/ocp-release@sha256:5c82cea5931996af509231c7a5a1611bcfd927dca4e362e6443d1f8a77a517c2 
INFO Bootstrap gather logs captured here "/home/osadmin/log-bundle-20240423052123.tar.gz" 

How to setup CIDR and CNI Hostnames, NTP, ..... ?

n00bsi commented 4 months ago

Found a solution for proxy:

platform: ... ... proxy: httpsProxy: http://username:password@proxy.ourdomain.tld:3128 httpProxy: http://username:password@proxy.ourdomain.tld:3128 noProxy: .ourdomain.tld,10.199.0.0/20,10.199.16.0/24

n00bsi commented 4 months ago

but the verify error is still there

W0424 01:39:49.282000   26286 reflector.go:535] k8s.io/client-go/tools/watch/informerwatcher.go:146: failed to list *v1.ConfigMap: Get "https://api.osctest.ourdomain.tld:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dbootstrap&resourceVersion=3229": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kube-apiserver-lb-signer")
E0424 01:39:49.282124   26286 reflector.go:147] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: Get "https://api.osctest.ourdomain.tld:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dbootstrap&resourceVersion=3229": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kube-apiserver-lb-signer")
W0424 01:40:40.034563   26286 reflector.go:535] k8s.io/client-go/tools/watch/informerwatcher.go:146: failed to list *v1.ConfigMap: Get "https://api.osctest.ourdomain.tld:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dbootstrap&resourceVersion=3229": dial tcp 192.168.1.227:6443: connect: connection refused
E0424 01:40:40.034672   26286 reflector.go:147] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: Get "https://api.osctest.ourdomain.tld:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dbootstrap&resourceVersion=3229": dial tcp 192.168.1.227:6443: connect: connection refused
openshift-bot commented 1 month ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 2 weeks ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale