okd-project / okd

The self-managing, auto-upgrading, Kubernetes distribution for everyone
https://okd.io
Apache License 2.0

Unable to use CoreOS layering, new node doesn't start: cannot open root device #1908

Open jacksgt opened 3 months ago

jacksgt commented 3 months ago

Hello,

I'm trying to use CoreOS Layering with FCOS on an OKD 4.14 cluster as described here: https://docs.okd.io/4.14/post_installation_configuration/coreos-layering.html

I built a container image with the following Dockerfile/Containerfile:

# Original machine OS image for OKD 4.14
FROM quay.io/openshift/okd-content@sha256:0351023b7ac334780e4e9a07d8f732d9a6b7004807e6acf4467520b5e651aa57

RUN rpm-ostree cliwrap install-to-root / && \
    rpm-ostree override replace https://kojipkgs.fedoraproject.org/packages/kernel/6.4.15/200.fc38/x86_64/kernel-{,core-,modules-,modules-core-,modules-extra-}6.4.15-200.fc38.x86_64.rpm && \
    rpm-ostree cleanup -m && \
    ostree container commit

Build log (successful):

Status: Downloaded newer image for quay.io/openshift/okd-content@sha256:0351023b7ac334780e4e9a07d8f732d9a6b7004807e6acf4467520b5e651aa57
 ---> ba4053510ccd
Step 2/2 : RUN rpm-ostree cliwrap install-to-root / &&     rpm-ostree override replace https://kojipkgs.fedoraproject.org/packages/kernel/6.4.15/200.fc38/x86_64/kernel-{,core-,modules-,modules-core-,modules-extra-}6.4.15-200.fc38.x86_64.rpm &&     rpm-ostree cleanup -m &&     ostree container commit
 ---> Running in f499428e3fa2
Successfully enabled cliwrap for /
Downloading https://kojipkgs.fedoraproject.org/packages/kernel/6.4.15/200.fc38/x86_64/kernel-6.4.15-200.fc38.x86_64.rpm...done
Downloading https://kojipkgs.fedoraproject.org/packages/kernel/6.4.15/200.fc38/x86_64/kernel-core-6.4.15-200.fc38.x86_64.rpm...done
Downloading https://kojipkgs.fedoraproject.org/packages/kernel/6.4.15/200.fc38/x86_64/kernel-modules-6.4.15-200.fc38.x86_64.rpm...done
Downloading https://kojipkgs.fedoraproject.org/packages/kernel/6.4.15/200.fc38/x86_64/kernel-modules-core-6.4.15-200.fc38.x86_64.rpm...done
Downloading https://kojipkgs.fedoraproject.org/packages/kernel/6.4.15/200.fc38/x86_64/kernel-modules-extra-6.4.15-200.fc38.x86_64.rpm...done
Enabled rpm-md repositories: cri-o_1.27 cri-tools updates fedora-cisco-openh264 updates-modular fedora-modular fedora updates-archive
Updating metadata for 'cri-o_1.27'...done
Updating metadata for 'cri-tools'...done
Updating metadata for 'updates'...done
Updating metadata for 'fedora-cisco-openh264'...done
Updating metadata for 'updates-modular'...done
Updating metadata for 'fedora-modular'...done
Updating metadata for 'fedora'...done
Updating metadata for 'updates-archive'...done
Importing rpm-md...done
rpm-md repo 'cri-o_1.27'; generated: 2023-05-05T17:45:54Z solvables: 8
rpm-md repo 'cri-tools'; generated: 2023-09-05T20:58:11Z solvables: 27
rpm-md repo 'updates'; generated: 2024-03-22T01:37:05Z solvables: 32457
rpm-md repo 'fedora-cisco-openh264'; generated: 2023-12-12T17:23:34Z solvables: 4
rpm-md repo 'updates-modular'; generated: 2024-03-07T01:02:11Z solvables: 1087
rpm-md repo 'fedora-modular'; generated: 2023-04-13T20:30:47Z solvables: 1082
rpm-md repo 'fedora'; generated: 2023-04-13T20:37:10Z solvables: 69222
rpm-md repo 'updates-archive'; generated: 2024-03-15T09:56:44Z solvables: 69656
Resolving dependencies...done
Installing 5 packages:
  kernel-6.4.15-200.fc38.x86_64 (@commandline)
  kernel-core-6.4.15-200.fc38.x86_64 (@commandline)
  kernel-modules-6.4.15-200.fc38.x86_64 (@commandline)
  kernel-modules-core-6.4.15-200.fc38.x86_64 (@commandline)
  kernel-modules-extra-6.4.15-200.fc38.x86_64 (@commandline)
Downgrading: kernel-modules-core;6.4.15-200.fc38;x86_64;local
Downgrading: kernel-core;6.4.15-200.fc38;x86_64;local
Downgrading: kernel-modules;6.4.15-200.fc38;x86_64;local
Installing: kernel-modules-extra-6.4.15-200.fc38.x86_64 (local)
Downgrading: kernel;6.4.15-200.fc38;x86_64;local
Cleanup: kernel;6.5.5-200.fc38;x86_64;installed
Cleanup: kernel-modules;6.5.5-200.fc38;x86_64;installed
Cleanup: kernel-modules-core;6.5.5-200.fc38;x86_64;installed
Cleanup: kernel-core;6.5.5-200.fc38;x86_64;installed
Removing intermediate container f499428e3fa2
 ---> d7d498db3749
Successfully built d7d498db3749

Afterwards, I'm applying the custom image with a simple MachineConfig to the nodes:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-custom-machine-os
spec:
  osImageURL: registry.example.com/my-custom-image:kernel-6.4.15

The MachineConfig gets picked up correctly and merged into rendered-worker-xxxxx.

However, when I create a new node, it fails to boot and ends up with the following kernel panic:

[    0.994945] /dev/root: Can't open blockdev
[    0.995705] VFS: Cannot open root device "UUID=826a82d1-487b-43bd-a26e-3c2c2264189c" or unknown-block(0,0): error -6
[    0.997464] Please append a correct "root=" boot option; here are the available partitions:
[    0.998937] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

This is 100% reproducible.

Does anyone know what I'm missing or what could cause this?

Here's the full log of the node (including the initial boot, which works fine): standard-tmlb7.log

OKD version: 4.14.0-0.okd-2024-01-26-175629

K1kc4 commented 3 months ago

You need to generate an initramfs.img for the new kernel version.

Before 4.15:

/usr/libexec/rpm-ostree/wrapped/dracut --no-hostonly --kver ${KERNEL_VERSION}-${KERNEL_BUILD_VERSION}.x86_64 --reproducible -v --add ostree -f "/usr/lib/modules/${KERNEL_VERSION}-${KERNEL_BUILD_VERSION}.x86_64/initramfs.img"

4.15 and later:

/usr/bin/dracut --no-hostonly --kver ${KERNEL_VERSION}-${KERNEL_BUILD_VERSION}.x86_64 --reproducible -v --add ostree -f "/usr/lib/modules/${KERNEL_VERSION}-${KERNEL_BUILD_VERSION}.x86_64/initramfs.img"
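
For anyone reading along: the two variables in the commands above are not defined anywhere in this thread. A minimal sketch of how they might be set for the kernel RPMs used in this issue follows. The values are assumptions read off the RPM file names (6.4.15 and 200.fc38), and KVER and INITRAMFS are hypothetical helper names used only for illustration.

```shell
# Assumed values, taken from the RPM file names in this issue
# (kernel-6.4.15-200.fc38.x86_64.rpm). Adjust for other kernels.
KERNEL_VERSION="6.4.15"
KERNEL_BUILD_VERSION="200.fc38"   # release plus dist tag, not just "200"

# Hypothetical helper variables: the string passed to --kver and the
# output path passed to -f in the dracut commands above.
KVER="${KERNEL_VERSION}-${KERNEL_BUILD_VERSION}.x86_64"
INITRAMFS="/usr/lib/modules/${KVER}/initramfs.img"

echo "$KVER"
echo "$INITRAMFS"
```

The directory name under /usr/lib/modules has to match the kernel release string exactly, so the dist tag (fc38 here) is part of it.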

jacksgt commented 3 months ago

Hi,

thanks for your suggestion. I modified the Containerfile as follows:

FROM quay.io/openshift/okd-content@sha256:0351023b7ac334780e4e9a07d8f732d9a6b7004807e6acf4467520b5e651aa57

RUN set -x && rpm-ostree cliwrap install-to-root / && \
    rpm-ostree override replace https://kojipkgs.fedoraproject.org/packages/kernel/6.4.15/200.fc38/x86_64/kernel-{,core-,modules-,modules-core-}6.4.15-200.fc38.x86_64.rpm && \
    /usr/libexec/rpm-ostree/wrapped/dracut --no-hostonly --kver 6.4.15-200.x86_64 --reproducible -v --add ostree -f "/usr/lib/modules/6.4.15-200.x86_64/initramfs.img" && \
    rpm-ostree cleanup -m && \
    ostree container commit

Unfortunately, the build ends up in an (infinite) loop when invoking dracut:

Step 2/2 : RUN set -x && rpm-ostree cliwrap install-to-root / &&     rpm-ostree override replace https://kojipkgs.fedoraproject.org/packages/kernel/6.4.15/200.fc38/x86_64/kernel-{,core-,modules-,modules-core-}6.4.15-200.fc38.x86_64.rpm &&     /usr/libexec/rpm-ostree/wrapped/dracut --no-hostonly --kver 6.4.15-200.x86_64 --reproducible -v --add ostree -f "/usr/lib/modules/6.4.15-200.x86_64/initramfs.img" &&     rpm-ostree cleanup -m &&     ostree container commit
 ---> Running in 622b6f27c395
+ rpm-ostree cliwrap install-to-root /
Successfully enabled cliwrap for /
+ rpm-ostree override replace https://kojipkgs.fedoraproject.org/packages/kernel/6.4.15/200.fc38/x86_64/kernel-6.4.15-200.fc38.x86_64.rpm https://kojipkgs.fedoraproject.org/packages/kernel/6.4.15/200.fc38/x86_64/kernel-core-6.4.15-200.fc38.x86_64.rpm https://kojipkgs.fedoraproject.org/packages/kernel/6.4.15/200.fc38/x86_64/kernel-modules-6.4.15-200.fc38.x86_64.rpm https://kojipkgs.fedoraproject.org/packages/kernel/6.4.15/200.fc38/x86_64/kernel-modules-core-6.4.15-200.fc38.x86_64.rpm
Downloading https://kojipkgs.fedoraproject.org/packages/kernel/6.4.15/200.fc38/x86_64/kernel-6.4.15-200.fc38.x86_64.rpm...done
Downloading https://kojipkgs.fedoraproject.org/packages/kernel/6.4.15/200.fc38/x86_64/kernel-core-6.4.15-200.fc38.x86_64.rpm...done
Downloading https://kojipkgs.fedoraproject.org/packages/kernel/6.4.15/200.fc38/x86_64/kernel-modules-6.4.15-200.fc38.x86_64.rpm...done
Downloading https://kojipkgs.fedoraproject.org/packages/kernel/6.4.15/200.fc38/x86_64/kernel-modules-core-6.4.15-200.fc38.x86_64.rpm...done
Enabled rpm-md repositories: cri-o_1.27 cri-tools updates fedora-cisco-openh264 updates-modular fedora-modular fedora updates-archive
Updating metadata for 'cri-o_1.27'...done
Updating metadata for 'cri-tools'...done
Updating metadata for 'updates'...done
Updating metadata for 'fedora-cisco-openh264'...done
Updating metadata for 'updates-modular'...done
Updating metadata for 'fedora-modular'...done
Updating metadata for 'fedora'...done
Updating metadata for 'updates-archive'...done
Importing rpm-md...done
rpm-md repo 'cri-o_1.27'; generated: 2023-05-05T17:45:54Z solvables: 8
rpm-md repo 'cri-tools'; generated: 2023-09-05T20:58:11Z solvables: 27
rpm-md repo 'updates'; generated: 2024-03-26T02:37:40Z solvables: 32560
rpm-md repo 'fedora-cisco-openh264'; generated: 2023-12-12T17:23:34Z solvables: 4
rpm-md repo 'updates-modular'; generated: 2024-03-07T01:02:11Z solvables: 1087
rpm-md repo 'fedora-modular'; generated: 2023-04-13T20:30:47Z solvables: 1082
rpm-md repo 'fedora'; generated: 2023-04-13T20:37:10Z solvables: 69222
rpm-md repo 'updates-archive'; generated: 2024-03-15T09:56:44Z solvables: 69656
Resolving dependencies...done
Installing 4 packages:
  kernel-6.4.15-200.fc38.x86_64 (@commandline)
  kernel-core-6.4.15-200.fc38.x86_64 (@commandline)
  kernel-modules-6.4.15-200.fc38.x86_64 (@commandline)
  kernel-modules-core-6.4.15-200.fc38.x86_64 (@commandline)
Downgrading: kernel-modules-core;6.4.15-200.fc38;x86_64;local
Downgrading: kernel-core;6.4.15-200.fc38;x86_64;local
Downgrading: kernel-modules;6.4.15-200.fc38;x86_64;local
Downgrading: kernel;6.4.15-200.fc38;x86_64;local
Cleanup: kernel;6.5.5-200.fc38;x86_64;installed
Cleanup: kernel-modules;6.5.5-200.fc38;x86_64;installed
Cleanup: kernel-modules-core;6.5.5-200.fc38;x86_64;installed
Cleanup: kernel-core;6.5.5-200.fc38;x86_64;installed
+ /usr/libexec/rpm-ostree/wrapped/dracut --no-hostonly --kver 6.4.15-200.x86_64 --reproducible -v --add ostree -f /usr/lib/modules/6.4.15-200.x86_64/initramfs.img
This system is rpm-ostree based; initramfs handling is
integrated with the underlying ostree transaction mechanism.
Use `rpm-ostree initramfs` to control client-side initramfs generation.
rpm-ostree: Note: This system is image (rpm-ostree) based.
rpm-ostree: Dropping privileges as `dracut` was executed with not "known safe" arguments.
rpm-ostree: You may invoke the real `dracut` binary in `/usr/libexec/rpm-ostree/wrapped/dracut`.
rpm-ostree: Continuing execution in 5 seconds.
This system is rpm-ostree based; initramfs handling is
integrated with the underlying ostree transaction mechanism.
Use `rpm-ostree initramfs` to control client-side initramfs generation.
rpm-ostree: Note: This system is image (rpm-ostree) based.
rpm-ostree: Wrapped binary "dracut" was executed with not "known safe" arguments.
rpm-ostree: You may invoke the real `dracut` binary in `/usr/libexec/rpm-ostree/wrapped/dracut`.
rpm-ostree: Continuing execution in 5 seconds.
This system is rpm-ostree based; initramfs handling is
integrated with the underlying ostree transaction mechanism.
Use `rpm-ostree initramfs` to control client-side initramfs generation.
rpm-ostree: Note: This system is image (rpm-ostree) based.
rpm-ostree: Wrapped binary "dracut" was executed with not "known safe" arguments.
rpm-ostree: You may invoke the real `dracut` binary in `/usr/libexec/rpm-ostree/wrapped/dracut`.
rpm-ostree: Continuing execution in 5 seconds.
...

What am I missing?

K1kc4 commented 3 months ago

Hello,

It's because of cliwrap. Try just the override plus dracut, without enabling cliwrap.

jacksgt commented 3 months ago

I can confirm that the following Dockerfile builds successfully and the nodes boot correctly with the new kernel:

FROM quay.io/openshift/okd-content@sha256:0351023b7ac334780e4e9a07d8f732d9a6b7004807e6acf4467520b5e651aa57

RUN set -x && \
    rpm-ostree override replace https://kojipkgs.fedoraproject.org/packages/kernel/6.4.15/200.fc38/x86_64/kernel-{,core-,modules-,modules-core-}6.4.15-200.fc38.x86_64.rpm && \
    /usr/libexec/rpm-ostree/wrapped/dracut --no-hostonly --kver 6.4.15-200.fc38.x86_64 --reproducible -v --add ostree -f "/usr/lib/modules/6.4.15-200.fc38.x86_64/initramfs.img" && \
    rpm-ostree cleanup -m && \
    ostree container commit
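
To catch a missing initramfs before a node panics at boot, a check along the following lines could be run against the image contents. This is only a sketch: check_initramfs is a hypothetical helper (not part of rpm-ostree or OKD), and it assumes the initramfs lives next to the kernel modules at /usr/lib/modules/<kver>/initramfs.img, which is the path the dracut command above writes to.

```shell
# Hypothetical helper: report every kernel directory under
# ROOT/usr/lib/modules that has no initramfs.img next to it.
# ROOT defaults to the empty string, i.e. the running root filesystem.
# Returns non-zero if at least one initramfs.img is missing.
check_initramfs() {
    root="${1:-}"
    missing=0
    for dir in "$root"/usr/lib/modules/*/; do
        # Skip the unexpanded glob when no kernel directory exists.
        [ -d "$dir" ] || continue
        kver=$(basename "$dir")
        if [ -f "$dir/initramfs.img" ]; then
            echo "ok: $kver"
        else
            echo "missing initramfs.img: $kver"
            missing=1
        fi
    done
    return "$missing"
}
```

Called without arguments inside the image (for example as a final build step), it inspects /usr/lib/modules directly, and a non-zero exit status would fail the build instead of the node.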

Now the question is: why is this necessary in the first place? I thought the point of cliwrap was to intercept these kernel scripts appropriately?

(2) Enables cliwrap. This is currently required to intercept some command invocations made from kernel scripts.

https://docs.okd.io/4.14/post_installation_configuration/coreos-layering.html#coreos-layering-configuring_coreos-layering