kxr / ocp4_setup_upi_kvm

Script to Setup an OpenShift 4 UPI Cluster on KVM. Based on this guide: https://kxr.me/2019/08/17/openshift-4-upi-install-libvirt-kvm/

script not working Rhel 8.4 #28

Closed jma1975 closed 3 years ago

jma1975 commented 3 years ago

script not working Rhel 8.4

`[root@localhost ocp4_setup_upi_kvm]# ./ocp4_setup_upi_kvm.sh --ocp-version 4.2.latest --cluster-name labocp --cluster-domain onesait.platform.com --pull-secret /home/pull-secret

####################################

DEPENDENCIES & SANITY CHECKS

####################################

====> Checking if we have all the dependencies: ok
====> Checking if the script/working directory already exists: ok
====> Checking for pull-secret (/home/pull-secret): ok
====> Checking if libvirt is running or enabled: ok
====> Checking if we have any existing leftover VMs: ok
====> Checking if DNS service (dnsmasq or NetworkManager) is active: NetworkManager
====> Checking if dnsmasq is enabled in NetworkManager: ok
====> Testing dnsmasq reload (systemctl reload NetworkManager): ok
====> Testing libvirtd restart (systemctl restart libvirtd): ok
====> Checking for any leftover dnsmasq config: ok
====> Checking for any leftover hosts file: ok
====> Checking for any leftover/conflicting dns records: ok

#######################

LIBVIRT NETWORK

#######################

====> Checking libvirt network: using default

##################

DNS CHECK

##################

====> Checking if first entry in /etc/resolv.conf is pointing locally: ok
====> Creating a test host file for dnsmasq /etc/hosts.dnstest: ok
====> Creating a test dnsmasq config file /etc/NetworkManager/dnsmasq.d/dnstest.conf: ok
====> Reloading libvirt and dnsmasq: .. ok

====> Testing forward dns via @127.0.0.1: ok
====> Testing reverse dns via @127.0.0.1: ok
====> Testing wildcard record via @127.0.0.1: ok

====> Testing forward dns via @127.0.0.1: ok
====> Testing reverse dns via @127.0.0.1: ok
====> Testing wildcard record via @127.0.0.1: ok

====> Testing forward dns via @192.168.1.103:
[root@localhost ocp4_setup_upi_kvm]# `

kxr commented 3 years ago

Hello, thank you for reporting this issue. I see two problems here.

First, there is a bug in how the script exits abruptly instead of showing a descriptive message. That is on me; I will fix it. The script should have errored out with "One or more DNS tests failed".

However, the reason the script failed is that it found DNS on the host not to be set up correctly. The expectation is that when we add a DNS record to dnsmasq on the host, that record is visible in the libvirt network. If this doesn't work, the cluster installation will fail painfully later. Hence the script is designed to fail early using these tests.
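To illustrate the fail-early idea with a descriptive error, here is a minimal sketch. It is not the script's actual code: the names `lookup`, `check_record`, `report_dns_result` and `FAILED` are hypothetical, and it assumes `dig` is installed on the host.

```shell
# Sketch only: all function and variable names here are hypothetical,
# not taken from ocp4_setup_upi_kvm.sh.

# lookup <server> <name>: resolve <name> against one DNS server.
# Prints the answer, or nothing if the query fails or times out.
lookup() { dig +short +time=2 +tries=1 "@$1" "$2" 2>/dev/null; }

FAILED=0

# check_record <server> <name> <expected-ip>: one forward-DNS test.
check_record() {
    printf '====> Testing forward dns via @%s: ' "$1"
    if [ "$(lookup "$1" "$2")" = "$3" ]; then
        echo ok
    else
        echo failed
        FAILED=1
    fi
}

# report_dns_result: return non-zero with a clear message
# instead of letting the script stop silently.
report_dns_result() {
    if [ "$FAILED" -ne 0 ]; then
        echo "[ERROR] One or more DNS tests failed" >&2
        return 1
    fi
}
```

A caller would run `check_record` once per DNS server and then `report_dns_result || exit 1`, so every test result is printed before the script stops.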

I am not sure why the script is testing DNS via 192.168.1.103; it doesn't look like a libvirt IP. I would expect 192.168.122.1. Maybe you are on an older commit (I remember testing various other IPs in older revisions). Can you make sure you have the latest commit (git pull) and try again?

jma1975 commented 3 years ago

Hello, the branch is up to date. The IP is not from libvirt, it is from my network interface.

[root@localhost ~]# ifconfig
eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.1.103 netmask 255.255.255.0 broadcast 192.168.1.255
        inet6 fe80::9af2:b3ff:feef:efc prefixlen 64 scopeid 0x20
        ether 98:f2:b3:ef:0e:fc txqueuelen 1000 (Ethernet)
        RX packets 84971 bytes 16403939 (15.6 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 23953 bytes 20199036 (19.2 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device interrupt 16

jma1975 commented 3 years ago

[root@localhost ocp4_setup_upi_kvm]# cat /etc/resolv.conf
# Generated by NetworkManager
nameserver 127.0.0.1
options edns0 trust-ad

kxr commented 3 years ago

Ok, interesting. So the DNS test loop should iterate over ${first_ns} and ${LIBVIRT_GWIP}. In your output it iterated twice over 127.0.0.1 and then failed on 192.168.1.103. That doesn't add up.

Can you try running these two lines manually in the shell and see what values they pick up? (Replace ${VIR_NET} with default.)

jma1975 commented 3 years ago

This is the result:

[root@localhost ocp4_setup_upi_kvm]# ./ocp4_setup_upi_kvm.sh --ocp-version 4.6.latest --cluster-name labocp --pull-secret /home/pull-secret --vm-dir /var/lib/libvirt/images/

####################################

DEPENDENCIES & SANITY CHECKS

####################################

====> Checking if we have all the dependencies: ok
====> Checking if the script/working directory already exists: ok
====> Checking for pull-secret (/home/pull-secret): ok
====> Checking if libvirt is running or enabled: ok
====> Checking if we have any existing leftover VMs: ok
====> Checking if DNS service (dnsmasq or NetworkManager) is active: NetworkManager
====> Checking if dnsmasq is enabled in NetworkManager:

[ERROR] DNS Directory is set to NetworkManager but dnsmasq is not enabled in NetworkManager

See: https://github.com/kxr/ocp4_setup_upi_kvm/wiki/Setting-Up-DNS

kxr commented 3 years ago

What changed? Previous output showed:

====> Checking if DNS service (dnsmasq or NetworkManager) is active: NetworkManager
====> Checking if dnsmasq is enabled in NetworkManager: ok

And now it's failing. Were you on an old commit?

jma1975 commented 3 years ago

I just ran:

export LIBVIRT_BRIDGE=$(virsh net-info default | grep "^Bridge:" | awk '{print $2}')
export LIBVIRT_GWIP=$(ip -f inet addr show ${LIBVIRT_BRIDGE} | awk '/inet / {print $2}' | cut -d '/' -f1)

jma1975 commented 3 years ago

I deleted the repository and cleared the cache. Now I get the same error as at the beginning:

[root@localhost ocp4_setup_upi_kvm]# ./ocp4_setup_upi_kvm.sh --ocp-version 4.6.latest --cluster-name lab --pull-secret /home/pull-secret --vm-dir /var/lib/libvirt/images/

####################################

DEPENDENCIES & SANITY CHECKS

####################################

====> Checking if we have all the dependencies: ok
====> Checking if the script/working directory already exists: ok
====> Checking for pull-secret (/home/pull-secret): ok
====> Checking if libvirt is running or enabled: ok
====> Checking if we have any existing leftover VMs: ok
====> Checking if DNS service (dnsmasq or NetworkManager) is active: NetworkManager
====> Checking if dnsmasq is enabled in NetworkManager: ok
====> Testing dnsmasq reload (systemctl reload NetworkManager): ok
====> Testing libvirtd restart (systemctl restart libvirtd): ok
====> Checking for any leftover dnsmasq config: ok
====> Checking for any leftover hosts file: ok
====> Checking for any leftover/conflicting dns records: ok

#######################

LIBVIRT NETWORK

#######################

====> Checking libvirt network: using default

##################

DNS CHECK

##################

====> Checking if first entry in /etc/resolv.conf is pointing locally: ok
====> Creating a test host file for dnsmasq /etc/hosts.dnstest: ok
====> Creating a test dnsmasq config file /etc/NetworkManager/dnsmasq.d/dnstest.conf: ok
====> Reloading libvirt and dnsmasq: .. ok

====> Testing forward dns via @127.0.0.1: ok
====> Testing reverse dns via @127.0.0.1: ok
====> Testing wildcard record via @127.0.0.1: ok

====> Testing forward dns via @127.0.0.1: ok
====> Testing reverse dns via @127.0.0.1: ok
====> Testing wildcard record via @127.0.0.1: ok

====> Testing forward dns via @192.168.1.103:
[root@localhost ocp4_setup_upi_kvm]#

kxr commented 3 years ago

Run:

export LIBVIRT_BRIDGE=$(virsh net-info default | grep "^Bridge:" | awk '{print $2}')
export LIBVIRT_GWIP=$(ip -f inet addr show ${LIBVIRT_BRIDGE} | awk '/inet / {print $2}' | cut -d '/' -f1)

And then see what values it picks:

echo $LIBVIRT_BRIDGE
echo $LIBVIRT_GWIP

jma1975 commented 3 years ago

Out:

[root@localhost /]# export LIBVIRT_BRIDGE=$(virsh net-info default | grep "^Bridge:" | awk '{print $2}')
[root@localhost /]# export LIBVIRT_GWIP=$(ip -f inet addr show ${LIBVIRT_BRIDGE} | awk '/inet / {print $2}' | cut -d '/' -f1)
[root@localhost /]# echo $LIBVIRT_BRIDGE

[root@localhost /]# echo $LIBVIRT_GWIP
127.0.0.1 192.168.1.103
[root@localhost /]#

kxr commented 3 years ago

Ah ok, this explains a lot. So basically LIBVIRT_BRIDGE is not being picked up correctly. Can you show me the output of virsh net-info default?
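To spell out why an empty LIBVIRT_BRIDGE produces exactly the output seen earlier: with no device name, `ip -f inet addr show` lists every interface, so LIBVIRT_GWIP ends up holding several addresses and the loop runs once per address. The sketch below reproduces this without touching the network. The sample text is made up to resemble `ip` output, and the loop form is an assumption about the script, not a copy of it.

```shell
# Same pipeline as in the thread, reading from stdin instead of `ip`.
parse_inet() { awk '/inet / {print $2}' | cut -d '/' -f1; }

# Made-up sample of `ip -f inet addr show` with NO device argument,
# which is what the command becomes when ${LIBVIRT_BRIDGE} is empty.
sample_all_ifaces='1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536
    inet 127.0.0.1/8 scope host lo
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet 192.168.1.103/24 brd 192.168.1.255 scope global eno1'

first_ns=127.0.0.1
LIBVIRT_GWIP=$(printf '%s\n' "$sample_all_ifaces" | parse_inet)

# Unquoted expansion splits LIBVIRT_GWIP on whitespace,
# so the loop sees three targets instead of two.
for dns_host in ${first_ns} ${LIBVIRT_GWIP}; do
    echo "Testing via @${dns_host}"
done
# Prints:
#   Testing via @127.0.0.1
#   Testing via @127.0.0.1
#   Testing via @192.168.1.103
```

This matches the run above: two passes over 127.0.0.1, then a failure on the host's own LAN address.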

jma1975 commented 3 years ago

[root@localhost /]# virsh net-info default
Nombre:         default
UUID:           f1766c90-d3cc-4c68-8842-bf7d55079d24
Activar:        no
Persistente:    si
Autoinicio:     no
Puente:         virbr0

kxr commented 3 years ago

Oh, so it is printing output in a different language. The script expects English: it looks for "Bridge:", but in your locale the output says "Puente:". Is it possible for you to switch to English?
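A common way to make this kind of parsing independent of the system language is to force the C locale for that one command, rather than switching the whole host to English. The sketch below assumes virsh honours `LC_ALL`, as gettext-based tools generally do. The helper `parse_bridge` is hypothetical, and this is not necessarily how the script handles it.

```shell
# Hypothetical helper: read `virsh net-info` output on stdin
# and print the bridge name, matching the English label only.
parse_bridge() { awk '/^Bridge:/ {print $2}'; }

# Intended call on a real host (needs libvirt, so left commented out):
#   LIBVIRT_BRIDGE=$(LC_ALL=C virsh net-info default | parse_bridge)

# Shortened samples based on the two outputs shown in this thread.
english='Name:           default
Bridge:         virbr0'
spanish='Nombre:         default
Puente:         virbr0'

printf '%s\n' "$english" | parse_bridge   # prints: virbr0
printf '%s\n' "$spanish" | parse_bridge   # prints nothing
```

The second call shows the failure from this issue: the Spanish label never matches, so the variable stays empty.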

kxr commented 3 years ago

For example:

#> virsh net-info default
Name:           default
UUID:           ab74e469-e3c5-4f9e-a95e-0a9aa5b16c65
Active:         yes
Persistent:     yes
Autostart:      yes
Bridge:         virbr0

jma1975 commented 3 years ago

Changed language to English

[root@localhost ~]# export LIBVIRT_BRIDGE = $ (virsh net-info default | grep "^ Bridge:" | awk '{print $ 2}')
-bash: syntax error near unexpected token `('
[root@localhost ~]# export LIBVIRT_BRIDGE=$(virsh net-info default | grep "^Bridge:" | awk '{print $2}')
[root@localhost ~]# export LIBVIRT_GWIP=$(ip -f inet addr show ${LIBVIRT_BRIDGE} | awk '/inet / {print $2}' | cut -d '/' -f1)
[root@localhost ~]# echo $LIBVIRT_BRIDGE
virbr0
[root@localhost ~]# echo $LIBVIRT_GWIP
192.168.122.1
[root@localhost ~]# virsh net-info default
Name:           default
UUID:           f1766c90-d3cc-4c68-8842-bf7d55079d24
Active:         yes
Persistent:     yes
Autostart:      no
Bridge:         virbr0

kxr commented 3 years ago

Looks good. Run the script now.

jma1975 commented 3 years ago

When changing the language to English, the script works perfectly. Thank you very much for this great contribution to the community. Excellent work.

kxr commented 3 years ago

Thank you for the feedback. I am glad you found it useful :)