Closed kai-uwe-rommel closed 3 years ago
Dupe of #536
@vrutkovs, I am pretty sure we are talking about two different issues!
Let's figure out #536 in any case.
But ... so far I have been able to work around the #536 problem, so I could deploy an OKD 4.7 cluster if someone requested it from me, although I would currently rather stay with OKD 4.6.
However, the broken static IP config is a showstopper. I have not found the reason yet and have no workaround.
/reopen
@kai-uwe-rommel: Reopened this issue.
Just for the record, again: the static IP configuration works fine initially, when the VMs are created, ignited, and come up for the first time. But it breaks immediately after the rebase+reboot that happens when the OKD deployment starts.
Hi,
your issue sounds interesting, could you explain it in more detail, please?
I was able to set up 4.7 2021-02-25 with static network configuration without any problems, and could upgrade to 03-07 seamlessly. I am installing on VMware UPI in bare-metal mode using prepared Fedora CoreOS ISOs or PXE.
I also use the more stable OVNKubernetes CNI and not OpenShiftSDN...
To configure static networking I am appending kernel parameters to each machine in the bootloader during the initial provisioning boot of the machines:
rd.neednet=1 ip=NODE_IP::GATEWAY_IP:NETMASK:NODE_FQDN_HOSTNAME:ens192:none nameserver=DNS_SERVER_1 nameserver=DNS_SERVER_2
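For illustration, with hypothetical values filled in (the addresses and hostname below are made-up examples, not from a real cluster), the parameter line would look like this:

```
rd.neednet=1 ip=10.0.0.11::10.0.0.1:255.255.255.0:master-01.example.com:ens192:none nameserver=10.0.0.53 nameserver=10.0.0.54
```

The colon-separated fields follow the dracut ip= syntax: client IP, (empty) peer, gateway, netmask, hostname, interface, and autoconf method (`none` = static).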
Would be nice to keep your problems in mind for our automated cluster deployments. Do you have any logs, or can you say what exactly was breaking and how your setup differs from ours?
Why are you doing it in such a complicated way when you could just append kernel params on initial boot for static network configuration: "That means, I used AfterBurn for the initial boot IP configuration through vSphere VM advanced config settings and through ignition created a NetworkManager config file as well as /etc/hostname and /etc/hosts files."
Thanks for reporting your problems!
Hello @devzeronull, thanks for your reply. Actually, it is not "so complicated". Afterburn is also not doing anything other than appending kernel parameters. I was also using modified boot images before, until I discovered that it is much easier to use Afterburn, which relieves me from modifying the boot images: https://docs.openshift.com/container-platform/4.6/release_notes/ocp-4-6-release-notes.html#ocp-4-6-static-ip-config-with-ova That link came from here, where it is described in more detail: https://www.openshift.com/blog/how-to-install-openshift-4.6-on-vmware-with-upi This works fine in RHCOS with 4.6 and newer, as well as in FCOS since the later version 32 builds.
Now what I am not sure about yet: does the kernel parameter string for IP config only work during the initial (Ignition) boot, or does it keep working afterwards, too? I guess it will continue to work later, but it is a bit limited. For example, you cannot specify DNS search suffixes. That's why, during Ignition, I simply create a config file in /etc/NetworkManager/system-connections for the ens192 interface (this is very easy). Also, I set the hostname statically in /etc/hostname because a previous FCOS release had broken this. That's all there is to it. It is easy to do and easily integrated (automated) into Ignition and deployment scripts.
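For reference, a minimal NetworkManager keyfile of the kind described above might look like the following. All values are hypothetical examples; the file would be dropped by Ignition into /etc/NetworkManager/system-connections/ens192.nmconnection with mode 0600:

```ini
# Hypothetical example of a static configuration for ens192 (NM keyfile format).
[connection]
id=ens192
type=ethernet
interface-name=ens192

[ipv4]
method=manual
addresses=10.0.0.11/24
gateway=10.0.0.1
dns=10.0.0.53;10.0.0.54;
dns-search=example.com;

[ipv6]
method=disabled
```

Unlike the kernel-args approach, this form also covers DNS search suffixes.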
The problem I now have with OKD 4.7 on FCOS 33, at least with the 2021-03-07 build of OKD: after the FCOS rebase and reboot, which is the first step of the OKD deployment on the FCOS nodes, my static IPs are simply ignored. That is, the VM comes up but has no network connectivity. I can log in via the VM console, but no IP config is set. I was able to set up OKD 4.7 with my way of configuring static IPs with the 2021-02-25 release.
It may have something to do with a change in OKD 4.7's overlay networking (of course I use OVNKubernetes). I have looked at an OKD cluster which is configured with DHCP and which I upgraded to OKD 4.7. There I saw that the DHCP-assigned IP is no longer set on "ens192" but magically moved to a bridge interface "br-ex". I have not seen this change documented anywhere yet. Do you know more about this?
Your sample kernel param string also configures ens192. Are you sure this still works with your 03-07 cluster?
I did another deployment attempt just now: I removed my code that generates a .nmconnection file for ens192 via Ignition and only added the static IP config via kernel args (i.e., Afterburn). The result: this (due to the absence of any .nmconnection file) apparently automatically generates a default_connection.nmconnection file with the data from the kernel args. But it does not change the problem. After the FCOS rebase step and the reboot, two of my three master nodes have no network configuration. I'm puzzled that one has it (all three are created exactly identically) and have no explanation why.
I can only report the problem here and hope that someone from the developer team can pick up or give me hints, ask for checks etc. ...
BTW, even on the master node where the IP config survived, I get this (on the other two masters as well): [systemd] Failed Units: 1 ovs-configuration.service
And even on that one master node that has IP connectivity, the deployment does not progress but hangs, too.
The journalctl log for this service shows:
-- Reboot --
Mar 11 10:53:59 master-03.kur-test.ars.de systemd[1]: Starting Configures OVS with proper host networking configuration...
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[920]: + rpm -qa
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[921]: + grep -q openvswitch
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: + '[' OVNKubernetes == OVNKubernetes ']'
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: + '[' -d /etc/NetworkManager/system-connections-merged ']'
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: + NM_CONN_PATH=/etc/NetworkManager/system-connections-merged
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: + iface=
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: + counter=0
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: + '[' 0 -lt 12 ']'
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[950]: ++ ip route show default
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[951]: ++ awk '{ if ($4 == "dev") { print $5; exit } }'
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: + iface=ens192
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: + [[ -n ens192 ]]
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: + echo 'IPv4 Default gateway interface found: ens192'
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: IPv4 Default gateway interface found: ens192
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: + break
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: + '[' ens192 = br-ex ']'
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: + '[' -z ens192 ']'
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: + iface_mac=00:50:56:a5:ec:1d
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: + echo 'MAC address found for iface: ens192: 00:50:56:a5:ec:1d'
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: MAC address found for iface: ens192: 00:50:56:a5:ec:1d
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[954]: ++ ip link show ens192
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[955]: ++ awk '{print $5; exit}'
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: + iface_mtu=1500
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: + [[ -z 1500 ]]
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: + echo 'MTU found for iface: ens192: 1500'
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: MTU found for iface: ens192: 1500
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[957]: ++ nmcli --fields UUID,DEVICE conn show --active
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[958]: ++ awk '/\sens192\s*$/ {print $1}'
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: + old_conn=445bc266-9f75-422d-bb47-f094f65c5d8d
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: + extra_brex_args=
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[962]: ++ nmcli --get-values ipv4.dhcp-client-id conn show 445bc266-9f75-422d-bb47-f094f65c5d8d
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: + dhcp_client_id=
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[918]: + '[' -n '' ']'
Mar 11 10:53:59 master-03.kur-test.ars.de configure-ovs.sh[966]: ++ nmcli --get-values ipv6.dhcp-duid conn show 445bc266-9f75-422d-bb47-f094f65c5d8d
Mar 11 10:54:00 master-03.kur-test.ars.de configure-ovs.sh[918]: + dhcp6_client_id=
Mar 11 10:54:00 master-03.kur-test.ars.de configure-ovs.sh[918]: + '[' -n '' ']'
Mar 11 10:54:00 master-03.kur-test.ars.de configure-ovs.sh[918]: + nmcli connection show br-ex
Mar 11 10:54:00 master-03.kur-test.ars.de configure-ovs.sh[918]: + nmcli c add type ovs-bridge con-name br-ex conn.interface br-ex 802-3-ethernet.mtu 1500 802-3-eth>
Mar 11 10:54:00 master-03.kur-test.ars.de configure-ovs.sh[974]: Connection 'br-ex' (1f37631e-796d-46d1-9314-9ddce069b91f) successfully added.
Mar 11 10:54:00 master-03.kur-test.ars.de configure-ovs.sh[918]: + nmcli connection show ovs-port-phys0
Mar 11 10:54:00 master-03.kur-test.ars.de configure-ovs.sh[918]: + nmcli c add type ovs-port conn.interface ens192 master br-ex con-name ovs-port-phys0
...skipping...
Mar 11 10:54:25 master-03.kur-test.ars.de configure-ovs.sh[918]: + counter=5
Mar 11 10:54:25 master-03.kur-test.ars.de configure-ovs.sh[918]: + '[' 5 -lt 5 ']'
Mar 11 10:54:25 master-03.kur-test.ars.de configure-ovs.sh[918]: + echo 'WARN: OVS did not succesfully activate NM connection. Attempting to bring up connections'
Mar 11 10:54:25 master-03.kur-test.ars.de configure-ovs.sh[918]: WARN: OVS did not succesfully activate NM connection. Attempting to bring up connections
Mar 11 10:54:25 master-03.kur-test.ars.de configure-ovs.sh[918]: + counter=0
Mar 11 10:54:25 master-03.kur-test.ars.de configure-ovs.sh[918]: + '[' 0 -lt 5 ']'
Mar 11 10:54:25 master-03.kur-test.ars.de configure-ovs.sh[918]: + nmcli conn up ovs-if-br-ex
Mar 11 10:54:25 master-03.kur-test.ars.de configure-ovs.sh[1185]: Error: unknown connection 'ovs-if-br-ex'.
Mar 11 10:54:25 master-03.kur-test.ars.de configure-ovs.sh[918]: + sleep 5
Mar 11 10:54:30 master-03.kur-test.ars.de configure-ovs.sh[918]: + counter=1
Mar 11 10:54:30 master-03.kur-test.ars.de configure-ovs.sh[918]: + '[' 1 -lt 5 ']'
Mar 11 10:54:30 master-03.kur-test.ars.de configure-ovs.sh[918]: + nmcli conn up ovs-if-br-ex
Mar 11 10:54:30 master-03.kur-test.ars.de configure-ovs.sh[1193]: Error: unknown connection 'ovs-if-br-ex'.
Mar 11 10:54:30 master-03.kur-test.ars.de configure-ovs.sh[918]: + sleep 5
Mar 11 10:54:35 master-03.kur-test.ars.de configure-ovs.sh[918]: + counter=2
Mar 11 10:54:35 master-03.kur-test.ars.de configure-ovs.sh[918]: + '[' 2 -lt 5 ']'
Mar 11 10:54:35 master-03.kur-test.ars.de configure-ovs.sh[918]: + nmcli conn up ovs-if-br-ex
Mar 11 10:54:35 master-03.kur-test.ars.de configure-ovs.sh[1200]: Error: unknown connection 'ovs-if-br-ex'.
Mar 11 10:54:35 master-03.kur-test.ars.de configure-ovs.sh[918]: + sleep 5
Mar 11 10:54:40 master-03.kur-test.ars.de configure-ovs.sh[918]: + counter=3
Mar 11 10:54:40 master-03.kur-test.ars.de configure-ovs.sh[918]: + '[' 3 -lt 5 ']'
Mar 11 10:54:40 master-03.kur-test.ars.de configure-ovs.sh[918]: + nmcli conn up ovs-if-br-ex
Mar 11 10:54:40 master-03.kur-test.ars.de configure-ovs.sh[1205]: Error: unknown connection 'ovs-if-br-ex'.
Mar 11 10:54:40 master-03.kur-test.ars.de configure-ovs.sh[918]: + sleep 5
Mar 11 10:54:45 master-03.kur-test.ars.de configure-ovs.sh[918]: + counter=4
Mar 11 10:54:45 master-03.kur-test.ars.de configure-ovs.sh[918]: + '[' 4 -lt 5 ']'
Mar 11 10:54:45 master-03.kur-test.ars.de configure-ovs.sh[918]: + nmcli conn up ovs-if-br-ex
Mar 11 10:54:45 master-03.kur-test.ars.de configure-ovs.sh[1210]: Error: unknown connection 'ovs-if-br-ex'.
Mar 11 10:54:45 master-03.kur-test.ars.de configure-ovs.sh[918]: + sleep 5
Mar 11 10:54:50 master-03.kur-test.ars.de configure-ovs.sh[918]: + counter=5
Mar 11 10:54:50 master-03.kur-test.ars.de configure-ovs.sh[918]: + '[' 5 -lt 5 ']'
Mar 11 10:54:50 master-03.kur-test.ars.de configure-ovs.sh[918]: + echo 'ERROR: Failed to activate ovs-if-br-ex NM connection'
Mar 11 10:54:50 master-03.kur-test.ars.de configure-ovs.sh[918]: ERROR: Failed to activate ovs-if-br-ex NM connection
Mar 11 10:54:50 master-03.kur-test.ars.de configure-ovs.sh[918]: + set +e
Mar 11 10:54:50 master-03.kur-test.ars.de configure-ovs.sh[918]: + nmcli conn down ovs-if-br-ex
Mar 11 10:54:50 master-03.kur-test.ars.de configure-ovs.sh[1217]: Error: 'ovs-if-br-ex' is not an active connection.
Mar 11 10:54:50 master-03.kur-test.ars.de configure-ovs.sh[1217]: Error: no active connection provided.
Mar 11 10:54:50 master-03.kur-test.ars.de configure-ovs.sh[918]: + nmcli conn down ovs-if-phys0
Mar 11 10:54:50 master-03.kur-test.ars.de configure-ovs.sh[1221]: Connection 'ovs-if-phys0' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkMan>
Mar 11 10:54:50 master-03.kur-test.ars.de configure-ovs.sh[918]: + nmcli conn up 445bc266-9f75-422d-bb47-f094f65c5d8d
Mar 11 10:54:50 master-03.kur-test.ars.de configure-ovs.sh[1236]: Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnec>
Mar 11 10:54:50 master-03.kur-test.ars.de configure-ovs.sh[918]: + exit 1
Mar 11 10:54:50 master-03.kur-test.ars.de systemd[1]: ovs-configuration.service: Main process exited, code=exited, status=1/FAILURE
Mar 11 10:54:50 master-03.kur-test.ars.de systemd[1]: ovs-configuration.service: Failed with result 'exit-code'.
Mar 11 10:54:50 master-03.kur-test.ars.de systemd[1]: Failed to start Configures OVS with proper host networking configuration.
So it says
unknown connection 'ovs-if-br-ex'
When I look into /etc/NetworkManager/system-connections, I see these files there:
-rw-------. 1 root root 345 Mar 11 10:54 br-ex.nmconnection
-rw-------. 1 root root 435 Mar 11 10:46 default_connection.nmconnection
-rw-------. 1 root root 282 Mar 11 10:54 ovs-if-phys0.nmconnection
-rw-------. 1 root root 168 Mar 11 10:54 ovs-port-br-ex.nmconnection
-rw-------. 1 root root 169 Mar 11 10:54 ovs-port-phys0.nmconnection
So, a bug due to a name mismatch? A missing file?
When I look on the master where I happen to have IP connectivity, I see:
[root@master-03 system-connections]# nmcli conn
NAME              UUID                                   TYPE        DEVICE
Wired Connection  445bc266-9f75-422d-bb47-f094f65c5d8d   ethernet    ens192
br-ex             1f37631e-796d-46d1-9314-9ddce069b91f   ovs-bridge  br-ex
ovs-port-br-ex    21cf18e8-f9de-479d-b68c-27fb54f7e581   ovs-port    br-ex
ovs-port-phys0    680aedf8-2265-42ac-95b7-2e909fdd48dd   ovs-port    ens192
ovs-if-phys0      a3248f34-01fb-4d9c-98fd-2066cc9a1919   ethernet    --
On one of the other masters where I don't have that, I see:
So there is the difference: here, ens192 is not bound to the "Wired Connection", for whatever reason. The reason might be the different UUIDs and their alphabetical order. Where I have IP connectivity, the UUID of "Wired Connection" sorts lower; on the broken masters, the UUID of ovs-if-phys0 sorts lower. So pure luck.
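The suspected tie-break can be illustrated in isolation. This is only the ordering hypothesis from above, not confirmed NetworkManager behavior: if two candidate profiles for ens192 were otherwise equal, the one whose UUID sorts first byte-wise would win. Comparing the two UUIDs from the listing:

```shell
# Byte-wise sort of the two competing UUIDs (LC_ALL=C for determinism).
# '4...' sorts before 'a...', matching the node where "Wired Connection" won.
printf '%s\n' \
  '445bc266-9f75-422d-bb47-f094f65c5d8d Wired-Connection' \
  'a3248f34-01fb-4d9c-98fd-2066cc9a1919 ovs-if-phys0' \
  | LC_ALL=C sort | head -n 1
# prints: 445bc266-9f75-422d-bb47-f094f65c5d8d Wired-Connection
```

Since UUIDs are random, which profile "wins" under such an ordering would differ per node, which would explain one of three identically created masters keeping connectivity.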
But that seems to be wrong either way?
This is how it looks on a master node of a cluster with OKD 4.6 2021-02-14 that was installed with DHCP:
[core@master-01 ~]$ nmcli conn
NAME              UUID                                   TYPE           DEVICE
ovs-if-br-ex      4ac8cd31-699b-445b-b28b-cb4ea0ca6295   ovs-interface  br-ex
br-ex             a73cb2f7-1459-473b-be0b-accf4bb2a0b5   ovs-bridge     br-ex
ovs-if-phys0      640ae954-bb3a-4af9-ad87-c3503429a8ed   ethernet       ens192
ovs-port-br-ex    b5be2d0d-cafd-47c3-a320-e4d25ccb9392   ovs-port       br-ex
ovs-port-phys0    9ce62b13-f4a2-4696-855d-53414d2bb42e   ovs-port       ens192
Wired Connection  7c2b48f2-3587-4e5e-8f29-3aee250340d5   ethernet       --
This one actually has the ovs-if-br-ex interface that the 4.7 cluster's master node complains about in the journalctl log.
I have just deployed a cluster with 4.7 2021-02-25 successfully (except for the https://github.com/openshift/okd/issues/536 issue). My static IP assignment worked fine. So it is definitely broken in the 4.7 2021-03-07 release; nothing is wrong with my environment.
I have not tried the 03-06 release again. That the 03-07 release came out only one day later somehow implies to me that the 03-06 release is poisoned and should be avoided?
I have since upgraded the 4.7 2021-02-25 cluster to 2021-03-07, and the static IP assignment (via kernel args) still works. So the 2021-03-07 release of OKD 4.7 can work correctly with static IP assignment. But the initial installation process of OKD 4.7 2021-03-07 seems to have a bug in the sense that it fails as described above when the node VMs are started with static IP assignment via kernel args.
Here is a sample journalctl log from one of the nodes, from the rpm-ostree rebase before the reboot. It shows error messages that may or may not be related to this interface issue. journalctl-no-ip.txt
This is how these error messages look. I'm not convinced they are related to the IP assignment / missing interface problem, but just for the record:
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5540]: (2021-03-16 19:33:52:015549): [sss_cache] [ldb] (0x0020): Unable to open tdb '/var/lib/sss/db/config.ldb': No such file or directory
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5540]: (2021-03-16 19:33:52:016155): [sss_cache] [ldb] (0x0020): Failed to connect to '/var/lib/sss/db/config.ldb' with backend 'tdb': Unable to open tdb '/var/lib/sss/db/config.ldb': No such file or directory
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5540]: (2021-03-16 19:33:52:016719): [sss_cache] [confdb_init] (0x0010): Unable to open config database [/var/lib/sss/db/config.ldb]
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5540]: (2021-03-16 19:33:52:016999): [sss_cache] [init_domains] (0x0020): Could not initialize connection to the confdb
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5540]: Could not open available domains
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5540]: (2021-03-16 19:33:52:017055): [sss_cache] [init_context] (0x0040): Initialization of sysdb connections failed
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5540]: (2021-03-16 19:33:52:017202): [sss_cache] [main] (0x0020): Error initializing context for the application
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5536]: groupadd: sss_cache exited with status 5
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5536]: groupadd: Failed to flush the sssd cache.
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5542]: (2021-03-16 19:33:52:064778): [sss_cache] [ldb] (0x0020): Unable to open tdb '/var/lib/sss/db/config.ldb': No such file or directory
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5542]: (2021-03-16 19:33:52:065274): [sss_cache] [ldb] (0x0020): Failed to connect to '/var/lib/sss/db/config.ldb' with backend 'tdb': Unable to open tdb '/var/lib/sss/db/config.ldb': No such file or directory
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5542]: (2021-03-16 19:33:52:066013): [sss_cache] [confdb_init] (0x0010): Unable to open config database [/var/lib/sss/db/config.ldb]
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5542]: (2021-03-16 19:33:52:066266): [sss_cache] [init_domains] (0x0020): Could not initialize connection to the confdb
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5542]: Could not open available domains
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5542]: (2021-03-16 19:33:52:066345): [sss_cache] [init_context] (0x0040): Initialization of sysdb connections failed
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5542]: (2021-03-16 19:33:52:066424): [sss_cache] [main] (0x0020): Error initializing context for the application
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5536]: groupadd: sss_cache exited with status 5
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5536]: groupadd: Failed to flush the sssd cache.
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5547]: (2021-03-16 19:33:52:152882): [sss_cache] [ldb] (0x0020): Unable to open tdb '/var/lib/sss/db/config.ldb': No such file or directory
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5547]: (2021-03-16 19:33:52:153295): [sss_cache] [ldb] (0x0020): Failed to connect to '/var/lib/sss/db/config.ldb' with backend 'tdb': Unable to open tdb '/var/lib/sss/db/config.ldb': No such file or directory
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5547]: (2021-03-16 19:33:52:153811): [sss_cache] [confdb_init] (0x0010): Unable to open config database [/var/lib/sss/db/config.ldb]
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5547]: (2021-03-16 19:33:52:154118): [sss_cache] [init_domains] (0x0020): Could not initialize connection to the confdb
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5547]: Could not open available domains
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5547]: (2021-03-16 19:33:52:154198): [sss_cache] [init_context] (0x0040): Initialization of sysdb connections failed
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5547]: (2021-03-16 19:33:52:154261): [sss_cache] [main] (0x0020): Error initializing context for the application
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5544]: useradd: sss_cache exited with status 5
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5544]: useradd: Failed to flush the sssd cache.
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5551]: (2021-03-16 19:33:52:194511): [sss_cache] [ldb] (0x0020): Unable to open tdb '/var/lib/sss/db/config.ldb': No such file or directory
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5551]: (2021-03-16 19:33:52:194848): [sss_cache] [ldb] (0x0020): Failed to connect to '/var/lib/sss/db/config.ldb' with backend 'tdb': Unable to open tdb '/var/lib/sss/db/config.ldb': No such file or directory
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5551]: (2021-03-16 19:33:52:194957): [sss_cache] [confdb_init] (0x0010): Unable to open config database [/var/lib/sss/db/config.ldb]
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5551]: (2021-03-16 19:33:52:195815): [sss_cache] [init_domains] (0x0020): Could not initialize connection to the confdb
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5551]: Could not open available domains
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5551]: (2021-03-16 19:33:52:195882): [sss_cache] [init_context] (0x0040): Initialization of sysdb connections failed
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5551]: (2021-03-16 19:33:52:195934): [sss_cache] [main] (0x0020): Error initializing context for the application
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5544]: useradd: sss_cache exited with status 5
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5544]: useradd: Failed to flush the sssd cache.
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5557]: (2021-03-16 19:33:52:284207): [sss_cache] [ldb] (0x0020): Unable to open tdb '/var/lib/sss/db/config.ldb': No such file or directory
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5557]: (2021-03-16 19:33:52:284630): [sss_cache] [ldb] (0x0020): Failed to connect to '/var/lib/sss/db/config.ldb' with backend 'tdb': Unable to open tdb '/var/lib/sss/db/config.ldb': No such file or directory
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5557]: (2021-03-16 19:33:52:285396): [sss_cache] [confdb_init] (0x0010): Unable to open config database [/var/lib/sss/db/config.ldb]
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5557]: (2021-03-16 19:33:52:285926): [sss_cache] [init_domains] (0x0020): Could not initialize connection to the confdb
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5557]: Could not open available domains
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5557]: (2021-03-16 19:33:52:285999): [sss_cache] [init_context] (0x0040): Initialization of sysdb connections failed
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5557]: (2021-03-16 19:33:52:286069): [sss_cache] [main] (0x0020): Error initializing context for the application
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5554]: groupadd: sss_cache exited with status 5
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5554]: groupadd: Failed to flush the sssd cache.
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5559]: (2021-03-16 19:33:52:327728): [sss_cache] [ldb] (0x0020): Unable to open tdb '/var/lib/sss/db/config.ldb': No such file or directory
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5559]: (2021-03-16 19:33:52:328147): [sss_cache] [ldb] (0x0020): Failed to connect to '/var/lib/sss/db/config.ldb' with backend 'tdb': Unable to open tdb '/var/lib/sss/db/config.ldb': No such file or directory
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5559]: (2021-03-16 19:33:52:328675): [sss_cache] [confdb_init] (0x0010): Unable to open config database [/var/lib/sss/db/config.ldb]
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5559]: (2021-03-16 19:33:52:328928): [sss_cache] [init_domains] (0x0020): Could not initialize connection to the confdb
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5559]: Could not open available domains
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5559]: (2021-03-16 19:33:52:329049): [sss_cache] [init_context] (0x0040): Initialization of sysdb connections failed
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5559]: (2021-03-16 19:33:52:329123): [sss_cache] [main] (0x0020): Error initializing context for the application
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5554]: groupadd: sss_cache exited with status 5
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5554]: groupadd: Failed to flush the sssd cache.
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5563]: (2021-03-16 19:33:52:400409): [sss_cache] [ldb] (0x0020): Unable to open tdb '/var/lib/sss/db/config.ldb': No such file or directory
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5563]: (2021-03-16 19:33:52:400836): [sss_cache] [ldb] (0x0020): Failed to connect to '/var/lib/sss/db/config.ldb' with backend 'tdb': Unable to open tdb '/var/lib/sss/db/config.ldb': No such file or directory
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5563]: (2021-03-16 19:33:52:401392): [sss_cache] [confdb_init] (0x0010): Unable to open config database [/var/lib/sss/db/config.ldb]
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5563]: (2021-03-16 19:33:52:401685): [sss_cache] [init_domains] (0x0020): Could not initialize connection to the confdb
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5563]: Could not open available domains
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5563]: (2021-03-16 19:33:52:401738): [sss_cache] [init_context] (0x0040): Initialization of sysdb connections failed
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5563]: (2021-03-16 19:33:52:401955): [sss_cache] [main] (0x0020): Error initializing context for the application
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5560]: usermod: sss_cache exited with status 5
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5560]: usermod: Failed to flush the sssd cache.
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5566]: (2021-03-16 19:33:52:443040): [sss_cache] [ldb] (0x0020): Unable to open tdb '/var/lib/sss/db/config.ldb': No such file or directory
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5566]: (2021-03-16 19:33:52:443404): [sss_cache] [ldb] (0x0020): Failed to connect to '/var/lib/sss/db/config.ldb' with backend 'tdb': Unable to open tdb '/var/lib/sss/db/config.ldb': No such file or directory
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5566]: (2021-03-16 19:33:52:443955): [sss_cache] [confdb_init] (0x0010): Unable to open config database [/var/lib/sss/db/config.ldb]
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5566]: (2021-03-16 19:33:52:444184): [sss_cache] [init_domains] (0x0020): Could not initialize connection to the confdb
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5566]: Could not open available domains
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5566]: (2021-03-16 19:33:52:444270): [sss_cache] [init_context] (0x0040): Initialization of sysdb connections failed
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5566]: (2021-03-16 19:33:52:444418): [sss_cache] [main] (0x0020): Error initializing context for the application
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5560]: usermod: sss_cache exited with status 5
Mar 16 19:33:52 master-01.kur-test2.ars.de rpm-ostree(openvswitch.prein)[5560]: usermod: Failed to flush the sssd cache.
Hi kai-uwe-rommel, thank you very much for describing your problems and experiences. Alas, I am currently very busy, so I didn't find the time to answer appropriately, but I have been reading everything so far :) Using VMware + Afterburn sounds like an interesting approach, I will take a closer look at it. Thanks!
I can also confirm that with the current stable 4.7.0 2021-03-07 release, the static configuration via kernel params works without problems, also in an initial installation using that release and the latest CoreOS; I tested that yesterday...
Best regards
@devzeronull, how exactly do you set the kernel arguments? Previously (before switching to Afterburn), I modified the VM images and wrote the IP config into the ignition.firstboot file (e.g., with set ignition_network_kcmdline=...). Is that also what you do (your previous message sounds like it)? Or yet another approach? Do you then also add a .nmconnection file to /etc/NetworkManager/system-connections?
If you can tell me how exactly you do it, I would like to test that out if it makes a difference for my setup process. Although it would really be a pity to drop Afterburn, which is really a very elegant solution (and has worked fine so far).
I tried falling back to configuring static IPs by modifying the ignition.firstboot file instead of using Afterburn, but the result is the same: failure. Something really weird is going on here. I would really like to know how our setups differ.
Hi,
I append the following directly to the CoreOS bootloader command line to parametrize the Dracut/CoreOS installer's network configuration:
rd.neednet=1 ip=NODE_IP::GATEWAY_IP:NETMASK:NODE_FQDN_HOSTNAME:ens192:none nameserver=DNS_SERVER_1 nameserver=DNS_SERVER_2
which you can do by hand, or in the isolinux.cfg of the boot image, generating boot ISOs with tools like genisoimage/mkisofs on Linux.
The steps to generate a customized ISO, e.g. for a bootstrap node, are:
bsdtar -xvf fedora-coreos-33.20210201.3.0-live.x86_64.iso -C extracted
sed -i 's/\(.*append.*\)/\1 coreos.inst.install_dev=\/dev\/sda coreos.inst.image_url=http:\/\/IMAGE_HOST_IP:8080\/fedora-coreos.raw.xz coreos.inst.ignition_url=http:\/\/IGNITION_HOST_IP:8080\/bootstrap.ign rd.neednet=1 ip=NODE_IP::GATEWAY_IP:NETMASK:NODE_FQDN_HOSTNAME:ens192:none nameserver=DNS_SERVER_1 nameserver=DNS_SERVER_2/g' extracted/isolinux/isolinux.cfg
mkisofs -o fedora-coreos-33.20210201.3.0-live.x86_64_bootstrap.iso -b isolinux.bin -c boot.cat -no-emul-boot -V 'fedora-coreos-33.20210201.3.0' -boot-load-size 4 -boot-info-table -R -J -v -T extracted/isolinux extracted
Using this method you can then easily automate the provisioning process with a working static IP setup, if you generate the ISO on-the-fly and boot it via PXE/TFTP.
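The extract / sed / regenerate steps above can be sketched as a small script. Only the sed substitution is exercised here against a stand-in isolinux.cfg (the bsdtar and mkisofs steps need a real live ISO); all addresses, hostnames, and paths are placeholder values, not the real ones from this thread:

```shell
#!/usr/bin/env bash
# Sketch of the isolinux.cfg customization step; values below are illustrative.
set -euo pipefail

KARGS='coreos.inst.install_dev=/dev/sda rd.neednet=1 ip=10.0.0.10::10.0.0.1:255.255.255.0:master-01.example.com:ens192:none nameserver=10.0.0.2'

# Simulate the extracted ISO layout with a minimal "append" line,
# standing in for the real isolinux.cfg from the live ISO.
mkdir -p extracted/isolinux
printf '  append initrd=/images/initramfs.img\n' > extracted/isolinux/isolinux.cfg

# Same substitution as in the steps above: append the kernel args
# to every "append" line of the bootloader config.
sed -i "s|\(.*append.*\)|\1 ${KARGS}|g" extracted/isolinux/isolinux.cfg

cat extracted/isolinux/isolinux.cfg
```

After this, the real workflow would run mkisofs over the `extracted/` tree as shown above.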
In addition to that you could of course combine it with individual Ignition configs and machine sets...
I understand your approach. I think it is more suitable for bare-metal installations. It would also work for vSphere installations, but it is more complicated and sacrifices the benefits of vSphere. I can check whether I find time to try it out.

However, meanwhile I firmly believe that the problem is not in my environment or process. I see this failure ONLY with the 4.7 2021-03-07 release. Every other release, even including 4.7 2021-03-06 (!), works fine with everything else unchanged, i.e. in the same environment and with the same installation process. Only 4.7 2021-03-07 fails, so some new problem was introduced with it.

I could of course simply give up, wait, and see whether the next release works correctly in this respect again. But that may or may not happen. I think it is better to report the problem here so that someone from the team can pick it up and fix it, in case it is not just a glitch that disappears with the next release. The question is whether someone from the team will a) believe me and b) invest the time to fix it.
I can report that the rpm-ostree error messages I posted above yesterday do not seem to be related to the static IP problem. These messages also appear when installing the 03-06 release instead of the 03-07 release, but with the 03-06 release the static IP assignment works fine.
Yes, I concede that this is not the optimal provisioning process, but in the end the result is the same. Since we are running in a security-aware environment, we sadly cannot allow an "application" to control our infrastructure/hypervisor - so there is no alternative to bare metal for us :(
Anyway, I will keep an eye on the issue you raised and am interested in the solution!
@devzeronull, which version of FCOS are you using for your initial VM creation?
It looks like I can reproduce the same problem in the same configuration with OCP 4.7.2 (this has worked before in OCP 4.7.0).
as discussed in slack, this is hopefully fixed for OKD in releases: https://origin-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.7.0-0.okd/release/4.7.0-0.okd-2021-03-22-172926 and later
Yes, I can confirm that I was able to install a new OKD cluster with this release and static IP configuration. Now we just need the next stable build with these fixes (and no other new bugs ...). :-)
Describe the bug
I stumbled upon this new bug while working on another issue: https://github.com/openshift/okd/issues/536
I had done my previous installations of 4.7.0 all with static IP configuration (because that is what all my teams need). That means, I used AfterBurn for the initial boot IP configuration through vSphere VM advanced config settings and through ignition created a NetworkManager config file as well as /etc/hostname and /etc/hosts files. This worked correctly with the OKD 4.7 releases of 02-25 and 03-06.
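For context, Afterburn picks its network kernel arguments up from a vSphere guestinfo property set in the VM's advanced configuration. Such an entry looks roughly like the following; all addresses and the hostname are placeholder values, not the real ones from my clusters:

```
guestinfo.afterburn.initrd.network-kargs = "ip=10.0.0.10::10.0.0.1:255.255.255.0:master-01.example.com:ens192:none nameserver=10.0.0.2"
```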
I did installation attempts with the 03-07 release of OKD 4.7 today. In the first one, the bootstrap and all three master nodes initially came up fine with their Ignition config. But after the master nodes had rebased their FCOS and rebooted, only one of them had a working network config. The other two master nodes had not configured their ens192 and were not accessible via network; I could log in through the VM console, though. I first suspected some external problem and redid the complete installation. The second time, the same happened, but now all three masters had no networking after the rebase+reboot. The config file in /etc/NetworkManager/system-connections was there, though - no idea why it was not applied. I had no time to waste on this (as I was actually working on another issue), so I did a third installation attempt, this time with DHCP for IP configuration. This worked insofar as the nodes at least always had a working network.
Version
4.7.0-0.okd-2021-03-07-090821
How reproducible
Apparently always, see above.
Log bundle
Difficult, because at this point no log gathering is possible... But just let me know what I should gather and I will try again. Of course, it will be difficult to get the logs out of machines with no working networking ...