redhat-cop / ocp4-helpernode

This playbook helps set up an "all-in-one" node, that has all the infrastructure/services in order to install OpenShift 4.

Master can't connect to bootstrap #228

Closed YiannisGkoufas closed 2 years ago

YiannisGkoufas commented 3 years ago

Hi, I followed the guide for static IPs. I installed the bootstrap node without a problem, and I could see on the dashboard that it is working. However, when I try to install the first master it fails, and I see this in the logs:

[    **] A start job is running for Ignition (fetch) (29min 1s / no limit)[ 1748.123077] ignition[2523]: GET https://api-int.ocp4.foc.ibm:22623/config/master: attempt #347
[ 1748.123391] ignition[2523]: GET error: Get "https://api-int.ocp4.foc.ibm:22623/config/master": dial tcp: lookup api-int.ocp4.foc.ibm on [::1]:53: read udp [::1]:45552->[::1]:53: read: connection refused
[***   ] A start job is running for Ignition (fetch) (29min 6s / no limit)[ 1753.123652] ignition[2523]: GET https://api-int.ocp4.foc.ibm:22623/config/master: attempt #348
[ 1753.124012] ignition[2523]: GET error: Get "https://api-int.ocp4.foc.ibm:22623/config/master": dial tcp: lookup api-int.ocp4.foc.ibm on [::1]:53: read udp [::1]:38200->[::1]:53: read: connection refused
[***   ] A start job is running for Ignition (fetch) (29min 11s / no limit)[ 1758.124276] ignition[2523]: GET https://api-int.ocp4.foc.ibm:22623/config/master: attempt #349
[ 1758.124665] ignition[2523]: GET error: Get "https://api-int.ocp4.foc.ibm:22623/config/master": dial tcp: lookup api-int.ocp4.foc.ibm on [::1]:53: read udp [::1]:37298->[::1]:53: read: connection refused

Clearly I have done something wrong in the configuration, but can you give me any tips on how to troubleshoot it? If I understand correctly, the DNS server should point api-int.ocp4.foc.ibm to the helper node, which has HAProxy installed and would then forward the request to the bootstrap node. Is this correct?
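For reference, the error lines quoted above can be split into the name being looked up, the resolver that was queried, and the failure reason. The following is a minimal sketch, not part of the playbook: the regular expression and the `parse_lookup_error` helper are illustrative assumptions written against the exact log format shown in this issue. In these lines the resolver is `[::1]:53`, which suggests the node was querying a DNS server on localhost rather than the helper node.

```python
import re

# Hypothetical pattern matching the Go resolver error in the Ignition
# log lines quoted in this issue; other log formats may need changes.
LOOKUP_ERROR = re.compile(
    r"lookup (?P<name>\S+) on (?P<resolver>\S+): .*?read: (?P<reason>.+)$"
)


def parse_lookup_error(line):
    """Return the looked-up name, resolver and reason, or None if no match."""
    match = LOOKUP_ERROR.search(line)
    if match is None:
        return None
    return {
        "name": match["name"],
        "resolver": match["resolver"],
        "reason": match["reason"].strip(),
    }


# One of the error lines from the log above, copied verbatim.
sample = (
    '[ 1748.123391] ignition[2523]: GET error: Get '
    '"https://api-int.ocp4.foc.ibm:22623/config/master": dial tcp: '
    'lookup api-int.ocp4.foc.ibm on [::1]:53: read udp '
    '[::1]:45552->[::1]:53: read: connection refused'
)
info = parse_lookup_error(sample)
print(info)
```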

christianh814 commented 3 years ago

On startup, the masters will connect to that HAProxy instance to grab their ignition configuration. When the bootstrap finishes setting itself up, it'll become an ignition server.

What may be happening is that the bootstrap server was either still setting itself up (check for slow disks or a slow internet connection) or failed for whatever reason.

SSH into the bootstrap node and run some of the commands (journalctl) it displays on the MOTD. Looking at bootstrap may give you more info.
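Alongside the bootstrap logs, it can help to confirm from a machine that uses the same DNS settings as the master whether the name resolves and whether the machine config port answers. The following is a rough sketch using only the Python standard library; `check_endpoint` and its `resolve`/`connect` parameters are hypothetical names introduced here so the two steps can be told apart (and swapped out for testing), and they are not part of ocp4-helpernode.

```python
import socket


def check_endpoint(host, port,
                   resolve=socket.getaddrinfo,
                   connect=socket.create_connection,
                   timeout=3.0):
    """Report whether host resolves and whether a TCP connect to port succeeds."""
    # Step 1: name resolution, using whatever DNS this machine is configured with.
    try:
        infos = resolve(host, port, type=socket.SOCK_STREAM)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return f"DNS lookup failed: {exc}"

    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the IP address.
    address = infos[0][4][0]

    # Step 2: plain TCP connect to the resolved address (no TLS, no HTTP).
    try:
        conn = connect((address, port), timeout=timeout)
    except OSError as exc:
        return f"resolved to {address}, but TCP connect failed: {exc}"
    conn.close()
    return f"resolved to {address}, port {port} reachable"
```

Calling `check_endpoint("api-int.ocp4.foc.ibm", 22623)` with the hostname and port from the log above would then distinguish a DNS failure (as in the quoted errors) from a case where the name resolves but nothing is listening yet, for example because the bootstrap is still starting.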

YiannisGkoufas commented 3 years ago

Hi @christianh814, thanks for the answer! I figured out what the problem was. When installing the master, I was adding 8.8.8.8 as a DNS server in addition to the helper node, and that was causing the problem. I no longer see that error.

There is one thing I am seeing that I would like your input on. I have installed bootstrap, master0 and master1, and I am about to install master2.

The config servers are all up:

image

However the api-servers come up for a bit:

image

And then they die:

image

Is this expected? Or should I be seeing the masters become green and stay green?

christianh814 commented 3 years ago

They'll come in and out. All nodes in the cluster will perform an OS upgrade (if one is found) during installation, so they may "flap" for a bit.

They should eventually (around 15 min or so) stay green.