okd-project / okd

The self-managing, auto-upgrading, Kubernetes distribution for everyone
https://okd.io
Apache License 2.0

Bootstrap could not run on fresh UPI installation on stable FCOS #440

Closed kcchu closed 3 years ago

kcchu commented 3 years ago

Describe the bug: Bootstrap could not run on a fresh UPI installation on stable FCOS 33.20201201.3.0 because of this error:

Dec 20 09:03:42 bootstrap release-image-download.sh[931]: Pull failed. Retrying quay.io/openshift/okd@sha256:01948f4c6bdd85cdd212eb40d96527a53d6382c4489d7da57522864178620a2c...
Dec 20 09:03:42 bootstrap release-image-download.sh[435268]: Error: Error initializing source docker://quay.io/openshift/okd@sha256:01948f4c6bdd85cdd212eb40d96527a53d6382c4489d7da57522864178620a2c: error pinging docker registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on [::1]:53: read udp [::1]:60125->[::1]:53: read: connection refused

Apparently the OS files were not provisioned correctly: /etc/resolv.conf is symlinked to a nonexistent file (stub-resolv.conf is missing from /run/systemd/resolve/):

[core@bootstrap resolve]$ cat /etc/resolv.conf 
cat: /etc/resolv.conf: No such file or directory
[core@bootstrap resolve]$ ls -l /etc/resolv.conf 
lrwxrwxrwx. 1 root root 39 Dec 20 08:55 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf
[core@bootstrap resolve]$ ls /run/systemd/resolve/
netif  resolv.conf
[core@bootstrap ~]$ rpm-ostree  status
State: idle
Deployments:
● pivot://quay.io/openshift/okd-content@sha256:95034a94e28949af41a53b9efb2fbb0651454a7c37bab002b0646e73c4721829
              CustomOrigin: Managed by machine-config-operator
                 Timestamp: 2020-12-12T05:05:37Z

  ostree://fedora:fedora/x86_64/coreos/stable
                   Version: 33.20201201.3.0 (2020-12-16T23:40:37Z)
                    Commit: cad80088392fe43bd3cadf0481c3267f199afa7d9f83bc03937ffdbf5ebbc6da
              GPGSignature: Valid signature by 963A2BEB02009608FE67EA4249FD77499570FF31

Version openshift-install-linux-4.6.0-0.okd-2020-12-12-135354

How reproducible

  1. Create ignition files using openshift-install command
  2. Start FCOS 33.20201201.3.0 and install it with coreos-installer, using the bootstrap.ign generated in step 1
  3. Wait for the firstboot to finish and the machine to reboot
  4. SSH to the bootstrap host and run journalctl -b -f -u release-image.service -u bootkube.service

Log bundle Bootstrap node not starting on fresh installation

klzsysy commented 3 years ago

same here

Can only upgrade from 4.5 to 4.6

vrutkovs commented 3 years ago

Workaround: use previous stable (F32-based) for now

marco-wrk commented 3 years ago

issue replicated using the F32-based AMI.

Version openshift-install-linux-4.6.0-0.okd-2020-12-12-135354

AMI fedora-coreos-32.20201104.3.0

bnevis-i commented 3 years ago

FYI for anyone else running across this: the ISO download link for the previous stable (F32-based) release is https://builds.coreos.fedoraproject.org/prod/streams/stable/builds/32.20201104.3.0/x86_64/fedora-coreos-32.20201104.3.0-live.x86_64.iso

kai-uwe-rommel commented 3 years ago

For me, this statement in storage.links in the initial ignition file (necessary for all node types) worked around the problem:

      {
        "group": {},
        "path": "/etc/resolv.conf",
        "user": {},
        "target": "../run/systemd/resolve/resolv.conf"
      }

Then a cluster deployment based on FCOS33-20201201 just completed successfully.

marco-wrk commented 3 years ago

The workaround suggested by @kai-uwe-rommel also works fine in my use case and allowed me to deploy OKD. Thanks for the hint.

Version openshift-install-linux-4.6.0-0.okd-2020-12-12-135354

AMI Fedora CoreOS 33.20201209

arcdigital commented 3 years ago

This solved it for me too

garethhk commented 3 years ago

@kai-uwe-rommel, I followed https://medium.com/@craig_robinson/openshift-4-4-okd-bare-metal-install-on-vmware-home-lab-6841ce2d37eb to install 4.6 without success (4.4 works fine). I figured out that the problem is an SELinux issue when creating stub-resolv.conf. It looks like your workaround may help me. Can you please suggest where I should put the storage.links information? Thank you!

kai-uwe-rommel commented 3 years ago

Well, as I wrote, in your ignition file in the "storage" section under "links"... So either you use (like I do) short stub ignition files where you merge the ones from openshift-install and then you put this into these small files. Or you need to modify the real ignition files (I would avoid this).
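
A stub file of that kind can also be generated with a few lines of Python. This is a rough sketch under assumptions, not a tested recipe: the function name make_stub_ignition and the URL 192.0.2.10 are placeholders, and the spec version 3.1.0 is simply one of the versions that appear in this thread. The link entry mirrors the workaround above.

```python
import json


def make_stub_ignition(merge_url: str, version: str = "3.1.0") -> dict:
    """Build a small ignition config that merges a remote config and
    adds the /etc/resolv.conf symlink workaround from this thread."""
    return {
        "ignition": {
            "version": version,
            # Pull in the config generated by openshift-install.
            "config": {"merge": [{"source": merge_url}]},
        },
        "storage": {
            "links": [
                {
                    "group": {},
                    "path": "/etc/resolv.conf",
                    "user": {},
                    "target": "../run/systemd/resolve/resolv.conf",
                }
            ]
        },
    }


if __name__ == "__main__":
    # Placeholder URL; point this at wherever bootstrap.ign is served.
    stub = make_stub_ignition("http://192.0.2.10/ignition/bootstrap.ign")
    print(json.dumps(stub, indent=2))
```

The same function could be called once per node type (bootstrap, master, worker) with the matching URL, so the generated files from openshift-install stay unmodified.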

garethhk commented 3 years ago

@kai-uwe-rommel, I cannot find a "links" section in the bootstrap.ign, master.ign, etc. files generated by openshift-install. Do you mean that I can put it in the storage section of an append-bootstrap.ign file like the one below, which points to the generated bootstrap.ign file?

============append-bootstrap.ign=============

      {
        "ignition": {
          "config": {
            "merge": [
              {
                "source": "http:///ignition/bootstrap.ign"
              }
            ]
          },
          "timeouts": {},
          "version": "3.1.0"
        },
        "networkd": {},
        "passwd": {},
        "storage": {},
        "systemd": {}
      }

kai-uwe-rommel commented 3 years ago

I personally would suggest that you move to "indirect" ignition files that are merged remotely from an HTTP server into very small initial ignition files, for all three node types and not just for the bootstrap node. That's what I do.

And then an initial ignition file can look like this (I added more stuff, this is just a sample):

      {
        "ignition": {
          "config": {
            "merge": [
              {
                "source": "http://1.2.3.4/master.ign",
                "verification": {}
              }
            ]
          },
          "security": {
            "tls": {}
          },
          "timeouts": {},
          "version": "3.0.0"
        },
        "passwd": {
          "users": [
            {
              "name": "sysadmin",
              "passwordHash": "....",
              "groups": [ "sudo", "docker" ]
            }
          ]
        },
        "storage": {
          "files": [
            {
              "group": {},
              "overwrite": true,
              "path": "/etc/hostname",
              "user": {},
              "contents": {
                "source": "data:,master-03.kur-test.ars.de",
                "verification": {}
              },
              "mode": 420
            },
            {
              "group": {},
              "overwrite": true,
              "path": "/etc/hosts",
              "user": {},
              "contents": {
                "source": "data:text/plain;base64,.....",
                "verification": {}
              },
              "mode": 420
            },
            {
              "group": {},
              "overwrite": true,
              "path": "/etc/NetworkManager/system-connections/ens192.nmconnection",
              "user": {},
              "contents": {
                "source": "data:text/plain;base64,...",
                "verification": {}
              },
              "mode": 384
            },
            {
              "group": {},
              "overwrite": true,
              "path": "/etc/chrony.conf",
              "user": {},
              "contents": {
                "source": "data:text/plain;base64,...",
                "verification": {}
              },
              "mode": 384
            },
            {
              "group": {},
              "overwrite": true,
              "path": "/etc/pki/ca-trust/source/anchors/ca-chain.crt",
              "user": {},
              "contents": {
                "source": "data:text/plain;base64,....",
                "verification": {}
              },
              "mode": 384
            }
          ],
          "links": [
            {
              "group": {},
              "path": "/etc/localtime",
              "user": {},
              "target": "../usr/share/zoneinfo/UTC"
            },
            {
              "group": {},
              "path": "/etc/resolv.conf",
              "user": {},
              "target": "../run/systemd/resolve/resolv.conf"
            }
          ]
        },
        "systemd": {}
      }

m-yosefpor commented 3 years ago

Thanks, @bnevis-i.

For other platform images, you can take the latest v33 download link from the FCOS download page and substitute 32.20201104.3.0 for the version in both places where it appears in the URL (builds/VERSION/x86_64/ and fedora-coreos-VERSION-...), and it should work. (I've tested the OpenStack qcow2 image.)
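
The substitution can be sketched as a small helper. The function name pin_fcos_version is hypothetical, and the v33 ISO URL in the example is an assumption: it is simply the F32 ISO link posted earlier in this thread with the version swapped, so treat it as an illustration of the pattern rather than a verified link.

```python
def pin_fcos_version(url: str, current: str, wanted: str = "32.20201104.3.0") -> str:
    """Replace the FCOS version in a download URL.

    The version is expected twice: once in builds/VERSION/x86_64/ and
    once in the fedora-coreos-VERSION-... file name.
    """
    if url.count(current) != 2:
        raise ValueError(f"expected {current!r} to appear twice in {url!r}")
    return url.replace(current, wanted)


if __name__ == "__main__":
    # Assumed v33 link, following the pattern of the F32 ISO link above.
    v33_url = (
        "https://builds.coreos.fedoraproject.org/prod/streams/stable/builds/"
        "33.20201201.3.0/x86_64/fedora-coreos-33.20201201.3.0-live.x86_64.iso"
    )
    print(pin_fcos_version(v33_url, "33.20201201.3.0"))
```

The check on the number of occurrences is there so that a URL with an unexpected layout fails loudly instead of being rewritten only halfway.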

m-yosefpor commented 3 years ago

Duplicate of https://github.com/openshift/okd/issues/477

This is resolved in the new release.

openshift-install 4.6.0-0.okd-2021-02-14-205305
FCOS 33.20210117.3.2 stable