okd-project / okd

The self-managing, auto-upgrading, Kubernetes distribution for everyone
https://okd.io
Apache License 2.0

Baremetal on OCI: #1969

Closed fcolomas closed 1 month ago

fcolomas commented 2 months ago

When we create the images to install OKD as bare metal on OCI (we embed the Ignition file in the ISO and then use that as a custom image), the bootstrap node works fine, but the master nodes keep looping with this error:

Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com epic_khorana[1710]: W0711 06:26:55.745940 1712 firstboot_complete_machineconfig.go:65] error: failed to remove pending deployment: error running rpm-ostree cleanup -p: error: cleanup: Invoking cleanup: GDBus.Error:org.gtk.GDBus.UnmappedGError.Quark._g_2dio_2derror_2dquark.Code14: Remounting /sysroot read-write: Permission denied
Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com epic_khorana[1710]: : exit status 1
Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com epic_khorana[1710]: I0711 06:26:55.745945 1712 firstboot_complete_machineconfig.go:66] Sleeping 1 minute for retry

Version

4.15.0-0.okd-2024-03-10-010116

How reproducible

100%. It happens every time on the OCI platform. The Agent-based Installer has a very similar issue.

Log bundle

Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com podman[1697]: I0711 06:26:55.745934 1712 update.go:1618] Deleting stale data
Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com podman[1697]: I0711 06:26:55.745936 1712 update.go:2371] Removing SIGTERM protection
Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com podman[1697]: W0711 06:26:55.745940 1712 firstboot_complete_machineconfig.go:65] error: failed to remove pending deployment: error running rpm-ostree cleanup -p: error: cleanup: Invoking cleanup: GDBus.Error:org.gtk.GDBus.UnmappedGError.Quark._g_2dio_2derror_2dquark.Code14: Remounting /sysroot read-write: Permission denied
Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com podman[1697]: : exit status 1
Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com podman[1697]: I0711 06:26:55.745945 1712 firstboot_complete_machineconfig.go:66] Sleeping 1 minute for retry
Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com epic_khorana[1710]: I0711 06:26:55.745934 1712 update.go:1618] Deleting stale data
Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com epic_khorana[1710]: I0711 06:26:55.745936 1712 update.go:2371] Removing SIGTERM protection
Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com epic_khorana[1710]: W0711 06:26:55.745940 1712 firstboot_complete_machineconfig.go:65] error: failed to remove pending deployment: error running rpm-ostree cleanup -p: error: cleanup: Invoking cleanup: GDBus.Error:org.gtk.GDBus.UnmappedGError.Quark._g_2dio_2derror_2dquark.Code14: Remounting /sysroot read-write: Permission denied
Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com epic_khorana[1710]: : exit status 1
Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com epic_khorana[1710]: I0711 06:26:55.745945 1712 firstboot_complete_machineconfig.go:66] Sleeping 1 minute for retry

DoodlesOnMyFood commented 2 months ago

I had a similar issue, although on OpenStack.

The main issue was that /sysroot was mounted on a loopback device that was limited to read-only.

I realized that the FCOS live ISO has this setup baked into the image, and that I had used the wrong FCOS image.

With the OpenStack release of FCOS, I could tell the mounts were set up differently.

Just my 2 cents: maybe you're not using the metal release of FCOS?
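
For anyone who wants to check this on their own node, here is a minimal sketch of the test described in the comment above. It is a heuristic only: it assumes that a /dev/loop* source for /sysroot indicates the live image's read-only loopback setup, and that any other source is an installed disk. The function name classify_sysroot_source is hypothetical.

```shell
# Classify the device that backs /sysroot.
# Assumption: a /dev/loop* source means the node booted from the live
# image (read-only loopback); any other non-empty source is treated as
# an installed disk. This is a heuristic, not an official check.
classify_sysroot_source() {
    case "$1" in
        /dev/loop*) echo "loopback" ;;
        "")         echo "unknown" ;;
        *)          echo "disk" ;;
    esac
}
```

On an affected node it could be fed the real mount source, for example with classify_sysroot_source "$(findmnt -n -o SOURCE /sysroot)". A result of "loopback" would be consistent with having booted the live image rather than an installed system.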

LamNguy commented 2 months ago

Hi, I did it the same way as you and got a similar error. I'm not sure, but it may be related to the destination you install to. In my case, I installed an OpenShift cluster on OCP Virtualization VMs. I fixed it by attaching the base ISO (without the injected Ignition file) to the VM. When you first boot the OS, you can run the install command with the Ignition file. Assuming I want to install on /dev/vda:

sudo coreos-installer install /dev/vda --ignition-url http://192.168.30.17/openshift4//ignitions/bootstrap.ign --insecure-ignition

Do this on each node and reboot. After that, the cluster installed successfully.
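
Since the same command has to be repeated on every node, a small helper that prints it per role may save some typing. This is only a sketch: the function name ignition_install_cmd is hypothetical, the default base URL is taken from the command above, and it assumes master.ign and worker.ign are served from the same directory as bootstrap.ign.

```shell
# Print (do not run) the coreos-installer command for one node.
#   $1 role:   bootstrap, master, or worker
#   $2 device: target disk, e.g. /dev/vda
#   $3 base:   URL of the directory serving the .ign files
#              (assumption: all three files live in the same place)
ignition_install_cmd() {
    role=$1
    device=$2
    base=${3:-http://192.168.30.17/openshift4//ignitions}
    case "$role" in
        bootstrap|master|worker) ;;
        *) echo "unknown role: $role" >&2; return 1 ;;
    esac
    echo "sudo coreos-installer install $device --ignition-url $base/$role.ign --insecure-ignition"
}
```

For example, ignition_install_cmd master /dev/vda prints the command for a master node. Printing instead of executing leaves room to review the target disk before anything is written to it.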

fcolomas commented 1 month ago

@LamNguy thanks, I'll try going bare metal. @DoodlesOnMyFood, you made me notice that I was also using the live image. I'm going to try the openshift-installer images command to find the right one.

JaimeMagiera commented 1 month ago

Hi,

We are not working on FCOS builds of OKD any more. Please see these documents...

https://okd.io/blog/2024/06/01/okd-future-statement
https://okd.io/blog/2024/07/30/okd-pre-release-testing

We will be providing documentation on upgrading clusters from 4.15 FCOS to 4.16 SCOS. In the meantime, you may be able to get help from community members. I'll convert this to a discussion to facilitate that.

Many thanks,

Jaime