openshift / installer

Install an OpenShift 4.x cluster
https://try.openshift.com
Apache License 2.0

DEBUG Still waiting for the Kubernetes API: Get https://mydomain.kz:6443/version?timeout=32s: EOF #2615

Closed Nurlan199206 closed 4 years ago

Nurlan199206 commented 4 years ago

I want to build an OpenShift Container Platform cluster on bare metal. I am using GCP Compute Engine for this.

RHEL 7 on the VM instances.

I have: 1 bootstrap, 3 masters, 2 workers, and 1 LB for the API (HAProxy).

Version

4.2

$ openshift-install version
openshift-install v4.2.0
built from commit 90ccb37ac1f85ae811c50a29f9bb7e779c5045fb
release image quay.io/openshift-release-dev/ocp-release@sha256:c5337afd85b94c93ec513f21c8545e3f9e36a227f55d41bc1dfb8fcc3f2be129

Platform:

What happened?

DEBUG OpenShift Installer v4.2.0                   
DEBUG Built from commit 90ccb37ac1f85ae811c50a29f9bb7e779c5045fb 
INFO Waiting up to 30m0s for the Kubernetes API at https://api.ocp.sysadm.kz:6443... 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF


What you expected to happen?

OpenShift can't find the API.

How to reproduce it (as minimally and precisely as possible)?

./openshift-install wait-for bootstrap-complete --log-level debug


Anything else we need to know?

My DNS (screenshots not preserved).

my LB config


listen stats
    bind :9000
    mode http
    stats enable
    stats uri /
    monitor-uri /healthz
frontend openshift-api-server
    bind 10.172.0.3:6443
    default_backend openshift-api-server
    mode tcp
    option tcplog
backend openshift-api-server
    balance source
    mode tcp
    server bootstrap 10.132.0.2:6443 check
    server master0 10.166.0.2:6443 check
    server master1 10.164.0.23:6443 check
    server master2 10.166.0.6:6443 check

frontend machine-config-server
    bind 10.172.0.3:22623
    default_backend machine-config-server
    mode tcp
    option tcplog
backend machine-config-server
    balance source
    mode tcp
    server bootstrap 10.132.0.2:22623 check
    server master0 10.166.0.2:22623 check
    server master1 10.164.0.23:22623 check
    server master2 10.166.0.6:22623 check

frontend ingress-http
    bind 10.172.0.3:80
    default_backend ingress-http
    mode tcp
    option tcplog
backend ingress-http
    balance source
    mode tcp
    server worker0 10.166.0.4:80 check
    server worker1 10.166.0.5:80 check

frontend ingress-https
    bind 10.172.0.3:443
    default_backend ingress-https
    mode tcp
    option tcplog
backend ingress-https
    balance source
    mode tcp
    server worker0 10.166.0.4:443 check
    server worker1 10.166.0.5:443 check


Nurlan199206 commented 4 years ago

ANY HELP????

abhinavdahiya commented 4 years ago

Make sure you have the DNS, LB, and connectivity set up correctly based on:

https://docs.openshift.com/container-platform/4.2/installing/installing_bare_metal/installing-bare-metal.html#installation-network-user-infra_installing-bare-metal
https://docs.openshift.com/container-platform/4.2/installing/installing_bare_metal/installing-bare-metal.html#installation-dns-user-infra_installing-bare-metal

Also, you can capture the failure logs by using

openshift-install gather bootstrap --bootstrap <bootstrap-host-ip> --master <control-plane-host-ip> [--master <control-plane-host-ip> ...]

which will provide us the necessary logs to debug the failure.
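
A minimal pre-flight sketch of the DNS and connectivity checks suggested above, assuming `dig`, `timeout`, and `bash` are available on the machine that runs the installer. The function names are hypothetical, and the record list only mirrors the names that appear in this thread (api, api-int, a name under the apps wildcard, etcd-N, and the etcd SRV record); check it against the documentation for your version.

```shell
# Hypothetical helper: print the DNS names this kind of install relies on.
# Arguments: cluster name, base domain, number of control-plane nodes.
required_dns_names() (
  cluster="$1"; domain="$2"; masters="${3:-3}"; i=0
  echo "api.${cluster}.${domain}"
  echo "api-int.${cluster}.${domain}"
  # Any name under *.apps should resolve through the wildcard record.
  echo "test.apps.${cluster}.${domain}"
  while [ "$i" -lt "$masters" ]; do
    echo "etcd-${i}.${cluster}.${domain}"
    i=$((i + 1))
  done
)

# Resolve every name and report the ones that return nothing.
check_dns() {
  for name in $(required_dns_names "$@"); do
    [ -n "$(dig +short "$name")" ] || echo "UNRESOLVED ${name}"
  done
  # SRV record used for etcd discovery.
  dig +short SRV "_etcd-server-ssl._tcp.$1.$2"
}

# Check that a TCP port accepts connections (bash /dev/tcp, 5 second limit).
check_port() {
  if timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo "open $1:$2"
  else
    echo "closed $1:$2"
  fi
}
```

For the cluster in this issue, that would be `check_dns ocp sysadm.kz 3`, followed by `check_port api.ocp.sysadm.kz 6443` and `check_port api.ocp.sysadm.kz 22623`.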

Nurlan199206 commented 4 years ago

@abhinavdahiya Do I need to buy something from here? https://cloud.redhat.com/openshift/install/metal/user-provisioned For example, a pull secret?

abhinavdahiya commented 4 years ago

> @abhinavdahiya i need buy something from here? https://cloud.redhat.com/openshift/install/metal/user-provisioned for example pull secret?

I'm not sure what you mean by "buy something from here". You need the pull secret so that you can pull the container images for the Red Hat components.

redmark-redhat commented 4 years ago

I'm seeing the same error here. Any solution?

fatal: [192.168.79.2]: FAILED! => {"changed": true, "cmd": "openshift-install --dir=pwd wait-for bootstrap-complete --log-level debug", "delta": "0:30:00.132730", "end": "2019-11-15 10:11:17.169260", "msg": "non-zero return code", "rc": 1, "start": "2019-11-15 09:41:17.036530", "stderr": "level=debug msg=\"OpenShift Installer unreleased-master-1805-g425e4ff0037487e32571258640b39f56d5ee5572\"\nlevel=debug msg=\"Built from commit 425e4ff0037487e32571258640b39f56d5ee5572\"\nlevel=info msg=\"Waiting up to 30m0s for the Kubernetes API at https://api.ocp-ppc64le-test-099bdc.redhat.com:6443...\"\nlevel=debug msg=\"Still waiting for the Kubernetes API: Get https://api.ocp-ppc64le-test-099bdc.redhat.com:6443/version?timeout=32s: x509: certificate signed by unknown authority (possibly because of \\"crypto/rsa: verification error\\" while trying to verify candidate authority certificate \\"kube-apiserver-lb-signer\\")\"

Also tried wget

wget https://api.ocp-ppc64le-test-099bdc.redhat.com:6443
--2019-11-15 10:27:53-- https://api.ocp-ppc64le-test-099bdc.redhat.com:6443/
Resolving api.ocp-ppc64le-test-099bdc.redhat.com (api.ocp-ppc64le-test-099bdc.redhat.com)... 192.168.122.168
Connecting to api.ocp-ppc64le-test-099bdc.redhat.com (api.ocp-ppc64le-test-099bdc.redhat.com)|192.168.122.168|:6443... connected.
ERROR: The certificate of ‘api.ocp-ppc64le-test-099bdc.redhat.com’ is not trusted.
ERROR: The certificate of ‘api.ocp-ppc64le-test-099bdc.redhat.com’ hasn't got a known issuer.

abhinavdahiya commented 4 years ago

@redmark-alt

I'm seeing the same error here, an solution?

it isn't the same error..

DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 

vs yours

Still waiting for the Kubernetes API: Get https://api.ocp-ppc64le-test-099bdc.redhat.com:6443/version?timeout=32s: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kube-apiserver-lb-signer\")

1) Is this the same platform as above, i.e. GCP?
2) How are you creating the cluster?

Also, are you using a layer-4 LB? Hopefully your LB is not doing the TLS termination.
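
The two failures point in different directions, so it helps to tell them apart before debugging. A small sketch follows; the function names and hint wording are hypothetical, and the hints only restate what is said in this thread. The second function assumes `openssl` is installed and shows who issued the certificate that the API endpoint actually presents.

```shell
# Hypothetical helper: map a "Still waiting for the Kubernetes API" log line
# to the kind of problem discussed in this thread.
classify_api_wait_error() {
  case "$1" in
    *"x509: certificate signed by unknown authority"*)
      echo "tls: check that the LB is layer 4 and does not terminate TLS" ;;
    *": EOF"*)
      echo "eof: connection closed without a response; check the LB and gather the bootstrap logs" ;;
    *)
      echo "other: not one of the two errors compared here" ;;
  esac
}

# Print issuer and subject of the certificate served on <host>:6443.
show_api_cert_issuer() {
  openssl s_client -connect "$1:6443" -showcerts </dev/null 2>/dev/null |
    openssl x509 -noout -issuer -subject
}
```

Pass the failing log line to `classify_api_wait_error`, and run `show_api_cert_issuer api.ocp.sysadm.kz` (or your own API hostname) to see the issuer.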

redmark-redhat commented 4 years ago

No, the platform is RHEL 8 with the OpenShift cluster configured in a KVM environment. We have a set of Ansible playbooks configuring the cluster. This is the command that fails:

- name: wait for bootstrap complete
  tags: config
  shell: openshift-install --dir=`pwd` wait-for bootstrap-complete --log-level debug
  args:
    chdir: "{{ workdir }}"
  retries: 1
  delay: 0

Yesterday the error message was a little different as seen here.

Still waiting for the Kubernetes API: Get https://api.ocp-ppc64le-test-099bdc.redhat.com:6443/version?timeout=32s: EOF\"\nlevel=debug msg=\"Still waiting for the Kubernetes API: Get https://api.ocp-ppc64le-test-099bdc.redhat.com:6443/version?timeout=32s: EOF\"\nlevel=debug

I don't remember making a change to any of the install playbooks. Let me run it again.

Nurlan199206 commented 4 years ago

@abhinandan13jan

./openshift-install gather bootstrap --bootstrap 10.132.0.2 --master ocp-master01.sysadm.kz
INFO Pulling debug logs from the bootstrap machine 
FATAL failed to create SSH client, ensure the proper ssh key is in your keyring or specify with --key: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain

Nurlan199206 commented 4 years ago

But SSH via ssh root@ocp-master01.sysadm.kz works between the bootstrap and master01 nodes.

Nurlan199206 commented 4 years ago

Still endless :6443/version?timeout=32s: EOF HELP!!!! LB,DNS settings correct!!!

Nurlan199206 commented 4 years ago

Does OpenShift 4.x support only Red Hat CoreOS? I ask because I'm using RHEL 7 for the cluster.

ChrystianDuarte commented 4 years ago

Still endless :6443/version?timeout=32s: EOF HELP!!!! LB,DNS settings correct!!!

I have the same problem. Any ideas?

jomeier commented 4 years ago

I had the same problem yesterday. I often create / delete VMs for tests.

Restart the load Balancer. In my case that helped.
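
A sketch of the restart plus a quick look at which backends HAProxy considers down afterwards. It assumes a systemd-managed `haproxy` service and a stats listener like the one in the original config (`bind :9000`, `stats uri /`), whose CSV export is reachable at `/;csv`. The function names are hypothetical.

```shell
# List backend servers that HAProxy does not report as UP.
# Reads the stats CSV on stdin: column 1 is the proxy, 2 the server, 18 the status.
servers_not_up() {
  awk -F, '!/^#/ && NF > 17 && $2 != "FRONTEND" && $2 != "BACKEND" && $18 !~ /^UP/ {
    print $1 "/" $2 " " $18
  }'
}

# Restart HAProxy on this host, then query its stats page.
# Argument: address of the load balancer's stats listener.
restart_and_check_lb() {
  sudo systemctl restart haproxy
  curl -s "http://$1:9000/;csv" | servers_not_up
}
```

During bootstrap the master entries are expected to show as down until the control plane comes up; if the bootstrap entry is down as well, the EOF error is coming from the bootstrap node rather than from the load balancer.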

abhinavdahiya commented 4 years ago

but SSH via ssh root@ocp-master01.sysadm.kz it works between bootstrap and master01 nodes..

Make sure you are using RHCOS for the control plane; that's the only supported OS. Also, the user that the installer's gather command logs in as is core, not root.

If you specified the public SSH key during installation, the machines should already have it.

As for the error, the only way we can help debug is if you provide the log bundle using openshift-install gather bootstrap --bootstrap <bootstrap-host-ip> --master <control-plane-0-ip> [--master <control-plane-$idx-ip>]

You can run openshift-install gather bootstrap --help for information on how to specify the SSH key; otherwise it tries to use an already running SSH agent.
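
A sketch of that sequence, assuming the private key matching the public key given at install time is stored at a path you pass in (the path below is a placeholder). The function names are hypothetical; `--key` is the flag named in the installer's own error message earlier in this thread.

```shell
# Confirm that the key works for user "core" before running gather.
# Arguments: private key path, bootstrap address.
verify_core_ssh() {
  ssh -i "$1" -o BatchMode=yes "core@$2" hostname
}

# Build the argument list for "openshift-install gather bootstrap".
# Arguments: private key path, bootstrap address, one or more master addresses.
gather_bootstrap_args() (
  key="$1"; bootstrap="$2"; shift 2
  printf 'gather bootstrap --key %s --bootstrap %s' "$key" "$bootstrap"
  for master in "$@"; do
    printf ' --master %s' "$master"
  done
  printf '\n'
)
```

With the addresses from the load balancer config in this issue, that would look like `verify_core_ssh "$HOME/.ssh/id_rsa" 10.132.0.2` followed by `./openshift-install $(gather_bootstrap_args "$HOME/.ssh/id_rsa" 10.132.0.2 10.166.0.2 10.164.0.23 10.166.0.6)`.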

whls commented 4 years ago

@abhinavdahiya I have the same error:

[root@api ocp4]# ./openshift-install wait-for bootstrap-complete --log-level debug
DEBUG OpenShift Installer v4.2.1
DEBUG Built from commit e349157f325dba2d06666987603da39965be5319
INFO Waiting up to 30m0s for the Kubernetes API at https://api.ocp4.whls.com:6443...
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp4.whls.com:6443/version?timeout=32s: EOF
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp4.whls.com:6443/version?timeout=32s: EOF

I already collected the logs with this command:

[root@api log]# /root/ocp4/openshift-install gather bootstrap --bootstrap bootstrap.ocp4.whls.com --master master0.ocp4.whls.com
INFO Pulling debug logs from the bootstrap machine
INFO Bootstrap gather logs captured here "log-bundle-20191219151525.tar.gz"

Could you please help debug this problem? log-bundle-20191219151525.tar.gz

jomeier commented 4 years ago

Do you have a load balancer (HAProxy) before your Bootstrap and Master servers?

I also had this problem. A restart of the load balancer solved it.


whls commented 4 years ago

@jomeier Thanks for your reply. Yes, I have an HAProxy server for the LB. Here is my HAProxy server configuration:

[root@api ocp4]# cat /etc/haproxy/haproxy.cfg

global
log         127.0.0.1 local2

chroot      /var/lib/haproxy
pidfile     /var/run/haproxy.pid
maxconn     4000
user        haproxy
group       haproxy
daemon

stats socket /var/lib/haproxy/stats

defaults
mode                    http
log                     global
option                  httplog
option                  dontlognull
option http-server-close

option                  redispatch
retries                 3
timeout http-request    10s
timeout queue           1m
timeout connect         10s
timeout client          1m
timeout server          1m
timeout http-keep-alive 10s
timeout check           10s
maxconn                 3000

listen stats
bind :9000
mode http
stats enable
stats uri /
monitor-uri /healthz

frontend openshift-api-server
bind *:6443
default_backend openshift-api-server
mode tcp
option tcplog

backend openshift-api-server
balance source
mode tcp
server bootstrap 9.98.30.45:6443 check
server master0 9.98.30.46:6443 check
server master1 9.98.30.47:6443 check
server master2 9.98.30.48:6443 check

frontend machine-config-server
bind *:22623
default_backend machine-config-server
mode tcp
option tcplog

backend machine-config-server
balance source
mode tcp
server bootstrap 9.98.30.45:22623 check
server master0 9.98.30.46:22623 check
server master1 9.98.30.47:22623 check
server master2 9.98.30.48:22623 check

frontend ingress-http
bind *:80
default_backend ingress-http
mode tcp
option tcplog

backend ingress-http
balance source
mode tcp
server worker0 9.98.30.54:80 check
server worker1 9.98.30.55:80 check
server worker2 9.98.30.56:80 check

frontend ingress-https
bind *:443
default_backend ingress-https
mode tcp
option tcplog

backend ingress-https
balance source
mode tcp
server worker0 9.98.30.54:443 check
server worker1 9.98.30.55:443 check
server worker2 9.98.30.56:443 check

The HAProxy service is running, and the ports are open:

[root@api ocp4]# netstat -tunlp |grep 80
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      5294/haproxy
udp        0      0 0.0.0.0:67              0.0.0.0:*                           7780/dnsmasq
[root@api ocp4]# netstat -tunlp |grep 443
tcp        0      0 0.0.0.0:6443            0.0.0.0:*               LISTEN      5294/haproxy
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      5294/haproxy
[root@api ocp4]# netstat -tunlp |grep 22623
tcp        0      0 0.0.0.0:22623           0.0.0.0:*               LISTEN      5294/haproxy

whls commented 4 years ago

Here is my DNS configuration

[root@ns1 ignition]# cat /var/named/data/whls.com.zone
$TTL 1W
@       IN      SOA     ns1.whls.com.   root (
                        2019070700      ; serial
                        3H              ; refresh (3 hours)
                        30M             ; retry (30 minutes)
                        2W              ; expiry (2 weeks)
                        1W )            ; minimum (1 week)
        IN      NS      ns1.whls.com.
        IN      MX 10   smtp.whls.com.
;
;
ns1     IN      A       9.98.30.44
smtp    IN      A       9.98.30.44
;
; The api points to the IP of your load balancer
api.ocp4                IN      A       9.98.30.59
api-int.ocp4            IN      A       9.98.30.59
;
; The wildcard also points to the load balancer
*.apps.ocp4             IN      A       9.98.30.59
;
; Create entry for the bootstrap host
bootstrap.ocp4  IN      A       9.98.30.45
;
; Create entries for the master hosts
master0.ocp4            IN      A       9.98.30.46
master1.ocp4            IN      A       9.98.30.47
master2.ocp4            IN      A       9.98.30.48
;
; Create entries for the worker hosts
worker0.ocp4            IN      A       9.98.30.54
worker1.ocp4            IN      A       9.98.30.55
worker2.ocp4            IN      A       9.98.30.56
;
; The ETCd cluster lives on the masters...so point these to the IP of the masters
etcd-0.ocp4     IN      A       9.98.30.46
etcd-1.ocp4     IN      A       9.98.30.47
etcd-2.ocp4     IN      A       9.98.30.48
;
; The SRV records are IMPORTANT....make sure you get these right...note the trailing dot at the end...
_etcd-server-ssl._tcp.ocp4.whls.com.    IN      SRV     0 10 2380 etcd-0.ocp4.whls.com.
_etcd-server-ssl._tcp.ocp4.whls.com.    IN      SRV     0 10 2380 etcd-1.ocp4.whls.com.
_etcd-server-ssl._tcp.ocp4.whls.com.    IN      SRV     0 10 2380 etcd-2.ocp4.whls.com.
;
;EOF

[root@ns1 ignition]# cat /var/named/data/named.whls.zone
$TTL 1W
@       IN      SOA     ns1.whls.com.   root (
                        2019070700      ; serial
                        3H              ; refresh (3 hours)
                        30M             ; retry (30 minutes)
                        2W              ; expiry (2 weeks)
                        1W )            ; minimum (1 week)
        IN      NS      ns1.whls.com.
;
; syntax is "last octet" and the host must have fqdn with trailing dot
46      IN      PTR     master0.ocp4.whls.com.
47      IN      PTR     master1.ocp4.whls.com.
48      IN      PTR     master2.ocp4.whls.com.
;
45      IN      PTR     bootstrap.ocp4.whls.com.
;
59      IN      PTR     api.ocp4.whls.com.
59      IN      PTR     api-int.ocp4.whls.com.
;
54      IN      PTR     worker0.ocp4.whls.com.
55      IN      PTR     worker1.ocp4.whls.com.
56      IN      PTR     worker2.ocp4.whls.com.
;
;EOF

jomeier commented 4 years ago

Have you restarted HAProxy right after the bootstrap server has finished / after the control plane with the masters was ready?

abhinavdahiya commented 4 years ago


from bootstrap/journals/release-image.service

Dec 19 05:58:46 bootstrap.ocp4.whls.com release-image-download.sh[1602]: Error: error pulling image "quay.io/openshift-release-dev/ocp-release@sha256:dc782b44cac3d59101904cc5da2b9d8bdb90e55a07814df50ea7a13071b0f5f0": unable to pull quay.io/openshift-release-dev/ocp-release@sha256:dc782b44cac3d59101904cc5da2b9d8bdb90e55a07814df50ea7a13071b0f5f0: unable to pull image: Error initializing source docker://quay.io/openshift-release-dev/ocp-release@sha256:dc782b44cac3d59101904cc5da2b9d8bdb90e55a07814df50ea7a13071b0f5f0: pinging docker registry returned: Get https://quay.io/v2/: dial tcp: lookup quay.io on 9.98.30.44:53: server misbehaving

The bootstrap-host cannot connect to quay.io to download the release-image. That seems to be the cause for failure..
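
To confirm this from the node, the failing lookup can be repeated against the same resolver. A minimal sketch, assuming `dig` is installed; the function names are hypothetical and the parsing only handles Go-style "lookup NAME on IP:53" errors like the one above.

```shell
# Extract "<name> <resolver-ip>" from a Go "lookup NAME on IP:53" error line.
failed_lookup() {
  printf '%s\n' "$1" | sed -n 's/.*lookup \([^ ]*\) on \([0-9.]*\):53.*/\1 \2/p'
}

# Ask that same resolver again; empty output means it still cannot resolve the name.
retry_lookup() {
  set -- $(failed_lookup "$1")
  [ "$#" -eq 2 ] || return 1
  dig +short "$1" "@$2"
}
```

For the journal line above, `failed_lookup` yields `quay.io 9.98.30.44`, so `retry_lookup` runs `dig +short quay.io @9.98.30.44`.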

whls commented 4 years ago

Thanks for your help. Yes, I checked my DNS server: it can't resolve quay.io. Must all nodes be able to access quay.io, including the bootstrap, master, and worker nodes?

jomeier commented 4 years ago

Yes


whls commented 4 years ago

@abhinavdahiya @jomeier Thanks for all your help! After setting up DNS forwarding to public DNS, I completed the cluster installation. :) Another question:
I configured 3 worker nodes for the cluster, but after installation only 2 worker nodes had joined. Do only two worker nodes join automatically by default, so that any additional worker nodes must be joined to the cluster manually?
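
The thread never shows the actual change. As a hedged sketch only: assuming a BIND `named` server like the one whose zone files appear above, forwarding is usually enabled with a `forwarders` statement inside the existing `options` block of named.conf. The helper below (hypothetical name) just prints such a fragment; the upstream resolver addresses are placeholders to replace with your own.

```shell
# Print a named.conf "options" fragment that forwards unknown names upstream.
# Arguments: one or more upstream resolver IPs (placeholders, choose your own).
bind_forwarders_fragment() (
  printf 'options {\n'
  printf '    recursion yes;\n'
  printf '    forwarders {'
  for ip in "$@"; do
    printf ' %s;' "$ip"
  done
  printf ' };\n'
  printf '};\n'
)
```

Merge the two inner statements into the `options` block you already have rather than adding a second one, validate with `named-checkconf`, restart `named`, and then check from a cluster node with `dig +short quay.io @9.98.30.44`.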

Nurlan199206 commented 4 years ago

openshift-install gather bootstrap --bootstrap 10.166.0.2 --master 10.132.0.2
INFO Pulling debug logs from the bootstrap machine
FATAL failed to run remote command: Process exited with status 127

Screenshot 2020-01-26 at 00 58 41

abhinavdahiya commented 4 years ago

What Image are you using to boot your bootstrap, control plane and compute?

Dennys503 commented 4 years ago

I have the same problem:

openshift-install wait-for bootstrap-complete --log-level debug
2020-01-22T17:22:24-06:00" level=debug msg="OpenShift Installer v4.2.13"
level=debug msg="Built from commit 46f909e4ccb4f7a4f82bf1ee28b32fa011a6bd1f"
level=info msg="Waiting up to 30m0s for the Kubernetes API at https://api.openshift.empresa.com:6443..."
level=debug msg="Still waiting for the Kubernetes API: Get https://api.openshift.empresa.com:6443/version?timeout=32s: EOF"
level=debug msg="Still waiting for the Kubernetes API: Get https://api.openshift.empresa.com:6443/version?timeout=32s: EOF"

openshift-install gather bootstrap --bootstrap bootstrap.openshift.empresa.com --master master.openshift.empresa.com
INFO Pulling debug logs from the bootstrap machine
FATAL failed to create SSH client, ensure the proper ssh key is in your keyring or specify with --key: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain

Dennys503 commented 4 years ago

> Thanks for your help. Yes, I checked my DNS server. It can't be resolved quay.io. Must all nodes be able to access quay.io? include bootstrap, master and worker?

How did you test your DNS connectivity with quay.io?

Nurlan199206 commented 4 years ago

How do I get past this? I'm stuck on endless "unable to get REST mapping" errors. log-bundle-20200201134119.tar.gz

Feb 01 18:41:21 localhost bootkube.sh[6878]: "99_openshift-machineconfig_99-worker-ssh.yaml": unable to get REST mapping for "99_openshift-machineconfig_99-worker-ssh.yaml": no matches for kind "MachineConfig" in version "machineconfiguration.openshift.io/v1"
Feb 01 18:41:21 localhost bootkube.sh[6878]: [#2652] failed to create some manifests:
Feb 01 18:41:21 localhost bootkube.sh[6878]: "99_openshift-machineconfig_99-master-ssh.yaml": unable to get REST mapping for "99_openshift-machineconfig_99-master-ssh.yaml": no matches for kind "MachineConfig" in version "machineconfiguration.openshift.io/v1"
Feb 01 18:41:21 localhost bootkube.sh[6878]: "99_openshift-machineconfig_99-worker-ssh.yaml": unable to get REST mapping for "99_openshift-machineconfig_99-worker-ssh.yaml": no matches for kind "MachineConfig" in version "machineconfiguration.openshift.io/v1"
Feb 01 18:41:21 localhost bootkube.sh[6878]: [#2653] failed to create some manifests:
Feb 01 18:41:21 localhost bootkube.sh[6878]: "99_openshift-machineconfig_99-master-ssh.yaml": unable to get REST mapping for "99_openshift-machineconfig_99-master-ssh.yaml": no matches for kind "MachineConfig" in version "machineconfiguration.openshift.io/v1"
Feb 01 18:41:21 localhost bootkube.sh[6878]: "99_openshift-machineconfig_99-worker-ssh.yaml": unable to get REST mapping for "99_openshift-machineconfig_99-worker-ssh.yaml": no matches for kind "MachineConfig" in version "machineconfiguration.openshift.io/v1"
Feb 01 18:41:21 localhost bootkube.sh[6878]: [#2654] failed to create some manifests:
Feb 01 18:41:21 localhost bootkube.sh[6878]: "99_openshift-machineconfig_99-master-ssh.yaml": unable to get REST mapping for "99_openshift-machineconfig_99-master-ssh.yaml": no matches for kind "MachineConfig" in version "machineconfiguration.openshift.io/v1"
Feb 01 18:41:21 localhost bootkube.sh[6878]: "99_openshift-machineconfig_99-worker-ssh.yaml": unable to get REST mapping for "99_openshift-machineconfig_99-worker-ssh.yaml": no matches for kind "MachineConfig" in version "machineconfiguration.openshift.io/v1"
Feb 01 18:41:22 localhost bootkube.sh[6878]: [#2655] failed to create some manifests:
Feb 01 18:41:22 localhost bootkube.sh[6878]: "99_openshift-machineconfig_99-master-ssh.yaml": unable to get REST mapping for "99_openshift-machineconfig_99-master-ssh.yaml": no matches for kind "MachineConfig" in version "machineconfiguration.openshift.io/v1"
Feb 01 18:41:22 localhost bootkube.sh[6878]: "99_openshift-machineconfig_99-worker-ssh.yaml": unable to get REST mapping for "99_openshift-machineconfig_99-worker-ssh.yaml": no matches for kind "MachineConfig" in version "machineconfiguration.openshift.io/v1"

(Three screenshots attached: photo_2020-02-02 00 46 14, photo_2020-02-02 00 46 20, photo_2020-02-02 00 46 25)

vrutkovs commented 4 years ago

CVO doesn't have a place to run:

I0201 18:41:21.143569       1 apps.go:115] Deployment cluster-version-operator is not ready. status: (replicas: 1, updated: 1, ready: 0, unavailable: 1, reason: MinimumReplicasUnavailable, message: Deployment does not have minimum availability.)

The log bundle contains only one master, which is not sufficient for an install. You'd need 3 masters + 2 workers; see https://docs.openshift.com/container-platform/4.2/installing/installing_bare_metal/installing-bare-metal.html#machine-requirements_installing-bare-metal

/close

openshift-ci-robot commented 4 years ago

@vrutkovs: Closing this issue.

milan-dikkumburage commented 4 years ago

Hi @Nurlan199206, were you able to fix the issue? What steps did you take to resolve it?

I'm getting a similar error (screenshot not preserved):

[core@okd4-services ~]$ openshift-install gather bootstrap --dir=install_dir/ --bootstrap xxx.xxx.xxx.xxx --master xxx.xxx.xxx.xxx
INFO Pulling debug logs from the bootstrap machine
FATAL failed to run remote command: Process exited with status 127

josephsadek commented 4 years ago

@abhinavdahiya @jomeier Thanks for all your help! After setup DNS forward to public, I have completed the cluster installation. :) Another question: I configuration 3 worker nodes for cluster, but after installation, only 2 worker nodes joined cluster, So whether only two work nodes can join automatically by default, If you want more work nodes, you need to join the cluster manually?

Can you show me how to configure DNS forwarding to public DNS?

sheetalp304 commented 3 years ago

@abhinavdahiya @jomeier Thanks for all your help! After setup DNS forward to public, I have completed the cluster installation. :) Another question: I configuration 3 worker nodes for cluster, but after installation, only 2 worker nodes joined cluster, So whether only two work nodes can join automatically by default, If you want more work nodes, you need to join the cluster manually?

I am facing the same issue: I'm not able to resolve quay.io. Can you provide the steps to set up DNS forwarding to public DNS that worked in your case?

ablaabiyad commented 3 years ago

Still endless :6443/version?timeout=32s: EOF HELP!!!! LB,DNS settings correct!!!

I have the same issue on Virtualbox, if you managed to correct this, would you please share a hint?

Nurlan199206 commented 3 years ago

@ablaabiyad check this:

https://github.com/Nurlan199206/okd4/blob/master/local

https://github.com/Nurlan199206/okd4/blob/master/haproxy.cfg

ablaabiyad commented 3 years ago

@ablaabiyad check this:

https://github.com/Nurlan199206/okd4/blob/master/local

https://github.com/Nurlan199206/okd4/blob/master/haproxy.cfg

I still have the same issue using your haproxy.cfg, and I cannot even retrieve logs, even though I can SSH to the bootstrap machine as both root and core. FATAL failed to create SSH client: failed to use the provided keys for authentication: ssh: handshake failed: ssh: unable to authenticate,