rancher / k3os

Purpose-built OS for Kubernetes, fully managed by Kubernetes.
https://k3os.io
Apache License 2.0

grub-reboot not functioning as intended #804

Open bitmage opened 2 years ago

bitmage commented 2 years ago

Version:

NAME="k3OS"
VERSION="k3OS v0.11.1"
ID=k3os
ID_LIKE=alpine
PRETTY_NAME="k3OS v0.11.1"
VERSION_ID="v0.11.1"
HOME_URL="https://k3os.io/"
SUPPORT_URL="https://k3os.io/"
BUG_REPORT_URL="https://github.com/rancher/k3os/issues"
ISO_URL="https://github.com/rancher/k3os/releases/download/v0.11.1/k3os-amd64.iso"

Kernel:

5.4.0-48-generic #52 SMP Sat Sep 26 08:27:15 UTC 2020

Machine:

x86_64
Dell Optiplex 9010 SFF 3.40GHz i5-3570

Intention

My intention is to remotely upgrade the operating system on a cluster of 12 machines. If grub-reboot works properly, I can tell each machine over SSH to reboot once into a system upgrade, as described in the k3OS kernel command line documentation. Once I'm satisfied with the behavior, I can automate it with a shell script.

Defect Summary

In my experience so far, grub-reboot sets next_entry (visible in the output of grub-editenv list), but the machine does not honor this setting on the next boot.

Steps

I did the following:

  1. Add a new menu option to /boot/grub/grub.cfg (see Additional Information below for full text).
  2. Modify the first line of /boot/grub/grub.cfg to read set default=saved.
  3. grub-set-default 0
  4. grub-reboot 4
  5. reboot
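Steps 3 to 5 could be wrapped in a small script along these lines. This is only a sketch: the `run` and `boot_once` helper names and the `DRY_RUN` variable are made up for illustration, and by default it prints the commands instead of executing them.

```shell
#!/bin/sh
# Print each command in dry-run mode (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Boot the given zero-indexed menu entry once, keeping entry 0 as the default.
boot_once() {
  entry="$1"
  run grub-set-default 0      # permanent default (saved_entry)
  run grub-reboot "$entry"    # one-time choice (next_entry)
  run grub-editenv list       # show what was written to the environment block
  run reboot
}
```

Calling `boot_once 4` prints the four commands; `DRY_RUN=0 boot_once 4` would actually run them.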

Observation

The machine reboots and reaches the grub menu, where the first option "0" is still selected; it boots that option after a few seconds.

Expected

It should boot the fifth option, "4" (zero-indexed).

Additional Information

grub-editenv list prints out:

saved_entry=0
next_entry=4

So I know the preference is being saved, but it's not being respected on reboot. As far as I know, the set default=saved on line 1 of /boot/grub/grub.cfg should instruct grub to read this value from the environment. I have also added GRUB_DEFAULT=saved to /etc/default/grub, but that does not affect the boot behavior either.
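One thing that may be worth checking: as far as I understand it, configurations generated by grub-mkconfig call `load_env` before they use `saved_entry` or `next_entry`, and a hand-written grub.cfg may not. Whether that is what's missing on k3OS is an assumption, not a confirmed diagnosis. A minimal sketch of such a check, with `cfg_loads_env` as a hypothetical helper name:

```shell
#!/bin/sh
# Report whether a grub.cfg contains a load_env command on its own line.
# Returns 0 if found, 1 otherwise.
cfg_loads_env() {
  grep -Eq '^[[:space:]]*load_env([[:space:]]|$)' "$1"
}
```

For example, `cfg_loads_env /boot/grub/grub.cfg && echo "load_env present" || echo "load_env missing"`.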

After the reboot, I also notice that next_entry is unchanged, so grub does not appear to be reading or clearing it at all.
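To compare the value before and after a reboot without reading the whole listing, something like the following could be used. It is a sketch that assumes the `key=value` output format shown above; `grubenv_get` is a hypothetical helper name.

```shell
#!/bin/sh
# Read `grub-editenv list` style output (key=value lines) on stdin
# and print the value of the requested key, if present.
grubenv_get() {
  sed -n "s/^$1=//p"
}
```

Usage would be `grub-editenv list | grubenv_get next_entry`; if grub had consumed the one-time entry, I would expect this to print nothing after the reboot.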

There are different versions of grub and different ways to configure it. The articles I've read on this subject have not shown me how the k3OS configuration differs, or what would need to be adapted to get the desired outcome. I'd appreciate any insights or adjustments I could try.

bitmage commented 2 years ago

Additional Configuration

I don't think these are relevant for troubleshooting the grub-reboot issue, but I'll post them here for anyone who wants to follow along or replicate the process.

grub.cfg modifications

These lines are appended to the grub.cfg with the intention of providing a remote upgrade option. The kernel parameters are set as directed in the k3OS Installation Instructions.

menuentry "k3OS Upgrade" {
  search.fs_label K3OS_STATE root
  set sqfile=/k3os/system/kernel/previous/kernel.squashfs
  loopback loop0 /$sqfile
  set root=($root)
  linux (loop0)/vmlinuz k3os.mode=install k3os.install.silent=true k3os.install.device=/dev/sda k3os.install.config_url=http://192.168.1.2:1313/agent.yaml k3os.install.iso_url=http://192.168.1.2:1313/iso/k3os-amd64-v0.22.2-k3s2r0.iso
  initrd /k3os/system/kernel/previous/initrd
}

In addition the first line of the file is set to:

set default=saved

This should instruct grub to read from the parameters described in the issue above.

agent.yaml

This file is served from my laptop using node-static and the command static . -p 1313 -a 192.168.1.2. It's ephemeral - only served during the installation.

ssh_authorized_keys:
- ssh-rsa [redacted]
- bitmage@[redacted]

# change this manually with each install
hostname: laba2

k3os:
  dns_nameservers:
  - 192.168.1.1
  - 8.8.8.8
  - 1.1.1.1
  ntp_servers:
  - 0.us.pool.ntp.org
  - 1.us.pool.ntp.org
  password: [redacted]
  server_url: https://192.168.1.10:6443
  token: [redacted]
  k3s_args:
    - agent

The install ISO listed in the command line parameters is being served by the same node-static instance.
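Since the config and the ISO come from the same server, the kernel arguments could be generated from a single base URL so the two locations stay consistent. A rough sketch, where `k3os_install_args` is a hypothetical helper and the argument names are the ones used in the menu entry above:

```shell
#!/bin/sh
# Build the k3os.install.* kernel arguments from one base URL so the
# config and ISO locations cannot drift apart.
k3os_install_args() {
  base="$1"    # e.g. http://192.168.1.2:1313
  device="$2"  # e.g. /dev/sda
  iso="$3"     # ISO path relative to the base URL
  printf '%s ' \
    "k3os.mode=install" \
    "k3os.install.silent=true" \
    "k3os.install.device=$device" \
    "k3os.install.config_url=$base/agent.yaml"
  printf '%s\n' "k3os.install.iso_url=$base/$iso"
}
```

For example, `k3os_install_args http://192.168.1.2:1313 /dev/sda iso/k3os-amd64-v0.22.2-k3s2r0.iso` prints the argument string used on the linux line of the menu entry.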

If everything were working properly, this should allow remote flashing of the operating system, followed by a reboot into the default grub menu entry "0", which would boot the new OS normally.