openshift / okd-machine-os

OKD machine OS build scripts and manifests
Apache License 2.0

Tracker to stop rebuilding FCOS #210

Closed by jlebon 2 years ago

jlebon commented 3 years ago

The README includes details on the diff between FCOS and OKD. Let's figure out how to address each of them without having to rebuild FCOS.

jlebon commented 3 years ago

manifest.yaml is a copy of FCOS manifest with the following changes:

  • tweaked version (special OKD version is set to designate the difference between OKD image and FCOS) and custom ostree ref

We naturally won't need this anymore once we stop rebuilding FCOS.

  • On top of FCOS base configuration additional OKD packages are installed:
    • openshift-hyperkube - kubelet

OK yup, that's the big one. I guess for that we'd need something like https://github.com/openshift/os/issues/498.

  • crio, cri-tools - container runtime

https://github.com/coreos/fedora-coreos-tracker/issues/767

  • NetworkManager-ovs for OpenshiftOVN
  • open-vm-tools, qemu-guest-agent - cloud tools for vSphere/oVirt

These are Fedora packages, right? Should be able to package layer.

  • openshift-clients - RPM with oc binary

Same as openshift-hyperkube.

  • glusterfs, glusterfs-fuse - required to pass glusterfs tests

These are also Fedora packages, right?

  • packages is updated to avoid including zincati (OKD uses CVO/MCO for updates)

We can mask the Zincati service.
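A minimal sketch of what masking could look like, assuming it is delivered through an Ignition config (spec 3.x supports a `mask` field on `systemd.units` entries). The helper name `zincati_mask_config` and the spec version `3.2.0` are illustrative assumptions, not something okd-machine-os ships.

```python
import json


def zincati_mask_config(spec_version: str = "3.2.0") -> dict:
    """Build a minimal Ignition config that masks zincati.service.

    Hypothetical helper: the function name and the default spec version
    are illustrative assumptions, not part of okd-machine-os.
    """
    return {
        "ignition": {"version": spec_version},
        "systemd": {
            "units": [
                # A masked unit cannot be started, which leaves
                # updates to CVO/MCO instead of Zincati.
                {"name": "zincati.service", "mask": True},
            ]
        },
    }


if __name__ == "__main__":
    # Print the config as JSON, the format Ignition consumes.
    print(json.dumps(zincati_mask_config(), indent=2))
```

This keeps Zincati in the image (so the base stays identical to FCOS) and only disables it at provisioning time.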

  • Available repos are disabled in postprocess section to make sure updates are reproducible

If we start package layering, then we need the repos enabled. But I think we could have a new mode in rpm-ostree where we only allow some package layering. Alternatively... we ship all the Fedora packages we need in a separate container? (Something like OS extensions, but it'd need to be more closely integrated.)

OKD machine-os inherits image.yaml to produce ostree commit and manifest-lock.* files to ensure base packages are as close to FCOS as possible.

Overlayed configuration is used in overlay.d, symlinking FCOS settings. The repo also has OKD-specific 99okd overlay, which does the following:

  • dhclient.conf in order to prevent br-ex interface from getting a wrong MAC
  • sshd_config.d dropin to allow ssh-rsa keys to be compatible with OCP.
  • localtime symlinked to UTC (required for fluentd).
  • gcp-hostname service which uses Afterburn to set GCP hostname.

I think these could all be part of the Ignition config, no?
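As a rough sketch of that idea, two of the 99okd items could be expressed as Ignition `storage.files` and `storage.links` entries. Everything specific here is an assumption for illustration: the helper names, the drop-in file name `10-okd-ssh-rsa.conf`, the `PubkeyAcceptedKeyTypes +ssh-rsa` directive, and the spec version `3.2.0`. The dhclient.conf and gcp-hostname contents are not shown in this thread, so they are left out.

```python
import json
from urllib.parse import quote


def data_url(text: str) -> str:
    """Encode text as an RFC 2397 data URL, as Ignition file contents expect."""
    return "data:," + quote(text)


def okd_overlay_config() -> dict:
    """Sketch of an Ignition config covering two 99okd overlay items.

    Hypothetical: file name, sshd directive, and spec version are
    assumptions, not copied from okd-machine-os.
    """
    sshd_dropin = "PubkeyAcceptedKeyTypes +ssh-rsa\n"
    return {
        "ignition": {"version": "3.2.0"},
        "storage": {
            "files": [
                {
                    # sshd_config.d drop-in allowing ssh-rsa keys
                    "path": "/etc/ssh/sshd_config.d/10-okd-ssh-rsa.conf",
                    "mode": 0o644,  # serialized as decimal 420 in JSON
                    "contents": {"source": data_url(sshd_dropin)},
                }
            ],
            "links": [
                {
                    # localtime symlinked to UTC (needed by fluentd)
                    "path": "/etc/localtime",
                    "target": "../usr/share/zoneinfo/UTC",
                    "hard": False,
                    "overwrite": True,
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(okd_overlay_config(), indent=2))
```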

So yeah, it seems like the big ones are openshift-hyperkube and openshift-clients. If we can resolve those by somehow shoving them in the release payload, do we agree the rest should be solvable?

jlebon commented 3 years ago

As a concrete example, if OKD doesn't rebuild FCOS, then it becomes much easier to add Prow testing of OKD to https://github.com/coreos/fedora-coreos-config, which I think would be of great benefit to OKD (and also ties into https://github.com/coreos/fedora-coreos-tracker/issues/767; see https://github.com/coreos/fedora-coreos-tracker/issues/767#issuecomment-934448876).

vrutkovs commented 3 years ago

These are Fedora packages, right? Should be able to package layer.

iiuc that can be done only after https://hackmd.io/@darkmuggle/rJw8Cyh9_ is accepted as an enhancement, right?

gcp-hostname service which uses Afterburn to set GCP hostname.

I think these could all be part of the Ignition config, no?

That'd require us to fork installer again and/or carry more patches for MCO

jlebon commented 3 years ago

These are Fedora packages, right? Should be able to package layer.

iiuc that can be done only after hackmd.io/@darkmuggle/rJw8Cyh9_ is accepted as an enhancement, right?

Would be nice to not block on that if possible. Are all those packages needed before pivot?

gcp-hostname service which uses Afterburn to set GCP hostname.

I think these could all be part of the Ignition config, no?

That'd require us to fork installer again and/or carry more patches for MCO

Ahh right OK, that rings a bell now (sorry, I don't have a good handle on all the backstory here). Hmm, if it can't live in the installer/MCO, it seems like those projects should at least define a way for OKD to extend them at build time for what it needs.

vrutkovs commented 3 years ago

Would be nice to not block on that if possible.

Sure, but I'm not aware of any other way to create an additional layer on top of FCOS really. Previously we used rpm-ostree dev-overlay but it never actually installed RPMs properly.

Are all those packages needed before pivot?

We pivot from a genuine FCOS image by extracting MCO from machine-config-image. All of these packages end up in the final OS and are required to start the node. However, it would be useful to build an OKD OS LiveCD, so that we could make use of Assisted Installer and SNO Bootstrap-in-place (both of these work with the RHCOS LiveCD).

it seems like those projects should at least define a way for OKD to extend them at build time for what it needs

We have a defined way to extend the installer, but the idea is to keep changes as minimal as possible, so that we don't wait ages for an lgtm and avoid filling up openshift-installer with FCOS-specific logic. There is no particular plan for MCO: there are bits which make it work with both RHCOS and FCOS, but some PRs are pending review.

travier commented 3 years ago

We can also add NetworkManager to this list (https://github.com/openshift/okd-machine-os/blob/master/NM-1.32-copr.repo) until this is resolved with the F35 rebase.

openshift-bot commented 2 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

travier commented 2 years ago

/remove-lifecycle stale
/lifecycle frozen

jlebon commented 2 years ago

https://github.com/coreos/enhancements/blob/main/os/coreos-layering.md is highly relevant here.