ostreedev / ostree

Operating system and container binary deployment and upgrades
https://ostreedev.github.io/ostree/

PXE live boot of ostree system no longer works #1989

Open ibikestl opened 4 years ago

ibikestl commented 4 years ago

It appears that commit a4a49724d6f898fd5e76bd6de49d36f7ed8d237e introduced a regression where CentOS 7.7 hosts will no longer boot a live PXE image.

Symptoms: Loop at boot with the following message (The XXXXXXXXX redacts potentially sensitive information):

[    3.351010] systemd[1]: Starting OSTree Prepare OS/...
         Starting OSTree Prepare OS/...
[    3.352397] systemd[1]: Reached target Slices.
[  OK  ] Reached target Slices.
[    3.354132] systemd[1]: Starting Journal Service...
         Starting Journal Service...
[    3.356856] systemd[1]: Starting Apply Kernel Variables...
         Starting Apply Kernel Variables...
[    3.361903] systemd[1]: Starting Create list of required static device nodes for the current kernel...
         Starting Create list of required st... nodes for the current kernel...
[    3.371061] systemd[1]: Starting dracut cmdline hook...
         Starting dracut cmdline hook...
[    3.376633] systemd[1]: ostree-prepare-root.service: main process exited, code=exited, status=1/FAILURE
[    3.380507] systemd[1]: Failed to start OSTree Prepare OS/.
[FAILED] Failed to start OSTree Prepare OS/.
See 'systemctl status ostree-prepare-root.service' for details.
[    3.386792] systemd[1]: Unit ostree-prepare-root.service entered failed state.
[    3.388370] systemd[1]: Triggering OnFailure= dependencies of ostree-prepare-root.service.
[    2.905918] ostree-prepare-root[77]: ostree-prepare-root: Couldn't find specified OSTree root '/sysroot//ostree/boot.0/XXXXXXXXX/95d48fbe6fe2375467ae4869c850957854e5f8a0f8ea2aa06086f692c4ff41b1/0': No such file or directory
[    3.393462] systemd[1]: ostree-prepare-root.service failed.
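
For anyone triaging a similar console capture, the missing path can be pulled out of the log mechanically. This is only a small Python sketch: the regular expression is written against the single error line quoted above, and the helper name `find_missing_roots` is made up for illustration.

```python
import re
from typing import List

# Matches the error text that ostree-prepare-root printed in the log above.
MISSING_ROOT = re.compile(r"Couldn't find specified OSTree root '([^']+)'")

# The failing line from the report, copied as-is (XXXXXXXXX is the redaction).
SAMPLE = (
    "[    2.905918] ostree-prepare-root[77]: ostree-prepare-root: "
    "Couldn't find specified OSTree root "
    "'/sysroot//ostree/boot.0/XXXXXXXXX/"
    "95d48fbe6fe2375467ae4869c850957854e5f8a0f8ea2aa06086f692c4ff41b1/0'"
    ": No such file or directory"
)


def find_missing_roots(console_log: str) -> List[str]:
    """Return every OSTree root path that the log reports as missing."""
    return MISSING_ROOT.findall(console_log)


if __name__ == "__main__":
    for path in find_missing_roots(SAMPLE):
        print(path)
```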

Root cause: ostree-prepare-root runs before the live squashfs image has been downloaded, so the deployment path it looks for under /sysroot does not exist yet.
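
The ordering problem can be modelled roughly in Python. This is not the real ostree-prepare-root, which is written in C. It assumes the tool takes the `ostree=` kernel argument and joins it onto the sysroot, which would explain the double slash in the error message. The kernel command line and directory names below are invented for the demonstration.

```python
import os
import tempfile


def resolve_deploy_path(sysroot, cmdline):
    """Join the sysroot with the ostree= kernel argument, or return None."""
    for arg in cmdline.split():
        if arg.startswith("ostree="):
            target = arg[len("ostree="):]
            # Plain string join: a target with a leading "/" produces the
            # "/sysroot//ostree/..." form seen in the error message.
            return sysroot + "/" + target
    return None


def deployment_present(sysroot, cmdline):
    """True if the deployment named on the command line exists right now."""
    path = resolve_deploy_path(sysroot, cmdline)
    return path is not None and os.path.lexists(path)


def demo():
    """Check before and after the (simulated) live image shows up."""
    cmdline = "ostree=/ostree/boot.0/example/abc123/0 quiet"  # hypothetical
    with tempfile.TemporaryDirectory() as sysroot:
        before = deployment_present(sysroot, cmdline)
        # Stand-in for the squashfs being downloaded and mounted.
        os.makedirs(sysroot + "/ostree/boot.0/example/abc123/0")
        after = deployment_present(sysroot, cmdline)
    return before, after


if __name__ == "__main__":
    print(demo())
```

The first check corresponds to the failing boot: the lookup happens while the sysroot is still empty.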

Steps to reproduce:

  1. Create an rpm-ostree tree that installs ostree-2019.1-2
  2. Create a live PXE boot using rpm-ostree-toolbox liveimage
  3. PXE boot (VM or metal) using the produced PXE live image

Reverting to ostree-2018.5-1 which does not contain this update results in a successful boot.
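
If the build is scripted, a guard along these lines could fail the compose early when a newer ostree slips in. It is a sketch only: the pin value comes from this report, the function names are made up, and the comparison is a simplified numeric one rather than real RPM version ordering.

```python
import re

# Newest ostree version reported to boot the live PXE image in this issue.
PIN = (2018, 5)


def ostree_version(nvr):
    """Extract (year, release) from a string such as 'ostree-2019.1-2'."""
    match = re.fullmatch(r"ostree-(\d+)\.(\d+)-\S+", nvr)
    if match is None:
        raise ValueError(f"not an ostree name-version-release: {nvr!r}")
    return int(match.group(1)), int(match.group(2))


def within_pin(nvr, pin=PIN):
    """True if the package is no newer than the pinned version."""
    return ostree_version(nvr) <= pin


if __name__ == "__main__":
    for nvr in ("ostree-2018.5-1", "ostree-2019.1-2"):
        print(nvr, "ok" if within_pin(nvr) else "newer than pin")
```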

jlebon commented 4 years ago

Thanks for the report! Unfortunately, we no longer develop for either rpm-ostree-toolbox or CentOS 7 (see https://github.com/coreos/rpm-ostree/releases/tag/v2019.3).

For now, your best bet is probably to stay on 2018.5. If you'd like to try something new, there is Fedora CoreOS, which also has live ISO/PXE capabilities: https://getfedora.org/en/coreos/. It's built the same way RHEL CoreOS is. For more information on that, see https://github.com/coreos/coreos-assembler.

cgwalters commented 4 years ago

Hmm. Is it a race or are we consistently failing?

I take regressions seriously, but...this one is likely to be very messy to fix without impacting other use cases. Agree with jlebon's comment; if you're doing custom rpm-ostree runs you should be able to pin to the earlier ostree package. Another alternative is to carry a revert of that patch in RHEL/CentOS 7.

To really look at this I'd need to dive in and analyze the startup order for the -toolbox produced live PXE - if you're doing this type of thing though as jlebon mentioned you very likely want to be switching to FCOS now as it's a top-level use case for us (and our toolchain is much better).

ibikestl commented 4 years ago

From my testing, it was a consistent failure until I rolled back to 2018.5. I am doing a custom build, so no issues pinning the version at 2018.5, which is what I've done to work around the issue.

My use case may be somewhat unique. I have a need to stay on CentOS 7 (there will eventually be a migration to CentOS 8) for systems that run on metal: no virtualization or containers. The benefit of an ostree-based install is that the OS can now essentially be treated much more like an A/B firmware flash; package updates can be rolled out, and if the updated packages are unacceptable, rollback is a quick reboot.

My understanding of CoreOS is that it is intended to be a very lightweight OS install that is focused on getting just enough OS down on disk to be able to run containers. What I need is a full OS where users run applications on the OS without relying on a container or VM. I understand that the use case I described is certainly a small niche compared to the vast majority of use cases for computing platforms.

jlebon commented 4 years ago

If you're already composing your own systems today, the analogous path is to have a source config repo like https://github.com/coreos/fedora-coreos-config and use https://github.com/coreos/coreos-assembler to assemble it. Then creating live media is as easy as buildextend-live. Just like before, you can specify in the treefile the list of packages you'd like included in the compose.
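
As a rough illustration of that flow, the sequence of coreos-assembler commands can be written down as a dry-run plan. The sketch below only prints the commands and runs nothing. It assumes `cosa` is available as the usual wrapper around the coreos-assembler container, and the helper name `live_media_plan` is invented here.

```python
import shlex


def live_media_plan(config_repo):
    """Return the cosa commands, in order, as argv lists (nothing is run)."""
    return [
        ["cosa", "init", config_repo],   # clone the source config repo
        ["cosa", "fetch"],               # download the packages
        ["cosa", "build"],               # compose the ostree commit and image
        ["cosa", "buildextend-live"],    # produce the live ISO/PXE media
    ]


if __name__ == "__main__":
    repo = "https://github.com/coreos/fedora-coreos-config"
    for argv in live_media_plan(repo):
        print(shlex.join(argv))
```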

That said, until you're ready to move to CentOS 8 or Fedora binaries, it's going to be very hard getting the coreos-assembler model working on top of CentOS 7.

ibikestl commented 4 years ago

Using rpm-ostree and friends for ostree installs on metal and live media is good by me for CentOS 7 family distros. It's a pretty well worn path for me at this point. I saw that rpm-ostree had dropped support for CentOS 7 in newer releases, but I didn't know that ostree had dropped support as well; that's why I created the issue to report a regression.

From a quick glance through the coreos-assembler project it looks like rpm-ostree --treecompose is still used to generate the ostree commits and then coreos-assembler takes the commits as input and outputs various VM images, container images, installer media, live media, etc. If so, it would seem that with CentOS 8 I would still use rpm-ostree --treecompose to create a new ostree commit and then use that commit to drive installs via kickstart on metal and image and media creation via coreos-assembler. Is that about right?

Thanks for the shepherding and guidance for my admittedly small use case. For what it's worth, using an ostree based install on metal makes it far easier to roll forward because it's so easy to roll back. All the hassle of dependency resolution of downgrading to a previous package or release is simply gone and you're assured of getting the same install every time.

jlebon commented 4 years ago

> From a quick glance through the coreos-assembler project it looks like rpm-ostree --treecompose is still used to generate the ostree commits and then coreos-assembler takes the commits as input and outputs various VM images, container images, installer media, live media, etc.

Yup, exactly!

> If so, it would seem that with CentOS 8 I would still use rpm-ostree --treecompose to create a new ostree commit and then use that commit to drive installs via kickstart on metal and image and media creation via coreos-assembler. Is that about right?

The Fedora/RHEL CoreOS model is to do away with kickstarts in favour of Ignition. Think kickstart-like functionality but on first boot of the machine. These links might be helpful (we've actually just revamped the Fedora CoreOS documentation!):

I'm not sure what the status of Ignition in CentOS 8 is. If it's not already there, it should make its way eventually. (Though higher-level, AFAIK no one has really tried a CentOS-based version of RHEL CoreOS yet, so expect some turbulence ;) )

cgwalters commented 4 years ago

> Thanks for the shepherding and guidance for my admittedly small use case.

To be clear, OSTree has been intentionally orthogonal to containerization since the start, and we definitely want to support use cases like yours.

Bigger picture though, OSTree-style systems are obviously strongly complementary to containerization, and it's where a lot of the industry is going.