rancher / k3os

Purpose-built OS for Kubernetes, fully managed by Kubernetes.
https://k3os.io
Apache License 2.0

Is PXE boot supported in K3OS? #56

Open deminngi opened 5 years ago

deminngi commented 5 years ago

I'd like to Wake-on-LAN and PXE boot K3OS using U-Boot. Is there a way to do this?

bjwschaap commented 5 years ago

We'd like to PXE boot K3OS without rootfs on NFS as well. Right now it seems the scripting is hardcoded to find a block device with label 'K3OS', and tries to mount it as /.base. And I haven't been able to find any way to tell the kernel/init to use the ISO file over http(s).

MindTooth commented 5 years ago

I have to ask; have you been able to boot it in its vanilla state?

Been tinkering with the idea of submitting to netboot.xyz, but I never got past the root= error. netboot.xyz uses iPXE.

bjwschaap commented 5 years ago

@MindTooth nope, unfortunately not. I even tried the blunt method of 'live' booting the ISO using memdisk. But still... it fails on https://github.com/rancher/k3os/blob/59cb901396ab9a410304572c4edf426e112a4439/overlay/libexec/k3os/live#L15 which sucks because it can only handle disks with label 'K3OS'. That label of course isn't there when booting over PXE... We decided that k3s/k3os is still way too experimental for us, and switched to linuxkit + k8s (https://github.com/linuxkit/kubernetes).

MindTooth commented 4 years ago

Fixed with 3.0?

hbokh commented 4 years ago

Still testing, but I managed to PXE boot k3OS v0.7.1 just fine. Fetched the initrd & kernel into the folder <tftproot>/k3os/v0.7.1/amd64/:

wget https://github.com/rancher/k3os/releases/download/v0.7.1/k3os-initrd-amd64
wget https://github.com/rancher/k3os/releases/download/v0.7.1/k3os-vmlinuz-amd64

Added this in pxelinux.cfg/default:

LABEL k3OS
  KERNEL k3os/v0.7.1/amd64/k3os-vmlinuz-amd64
  APPEND initrd=k3os/v0.7.1/amd64/k3os-initrd-amd64 k3os.mode=install k3os.install.iso_url=http://192.168.10.5/k3os/k3os-amd64.iso k3os.install.config_url=http://192.168.10.5/k3os-cloud-config k3os.install.silent=true k3os.install.device=/dev/sda

The ISO-image k3os-amd64.iso and cloud-config are also on that same boot server.
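The version and boot server address appear several times in that entry, so it can help to generate it. A minimal sketch, assuming the same directory layout and URLs as above; render_pxe_entry is a hypothetical helper name, not part of k3OS or pxelinux:

```shell
#!/bin/sh
# Hypothetical helper: print a pxelinux entry for a k3OS install boot.
# Arguments: k3OS version, boot server address, install target device.
render_pxe_entry() {
    version=$1
    server=$2
    device=$3
    base="k3os/${version}/amd64"
    cat <<EOF
LABEL k3OS
  KERNEL ${base}/k3os-vmlinuz-amd64
  APPEND initrd=${base}/k3os-initrd-amd64 k3os.mode=install k3os.install.iso_url=http://${server}/k3os/k3os-amd64.iso k3os.install.config_url=http://${server}/k3os-cloud-config k3os.install.silent=true k3os.install.device=${device}
EOF
}

# Reproduces the entry shown above
render_pxe_entry v0.7.1 192.168.10.5 /dev/sda
```

The output can be appended to pxelinux.cfg/default; only the three arguments change between versions or hosts.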

bhale commented 4 years ago

@hbokh Would you mind making a pull request to add some docs for how you set up the PXE server and anything pertinent you had to do on the client specific to k3os?

hbokh commented 4 years ago

Setting up a PXE bootserver is in a different league that does not belong here IMHO. Besides, my setup is kind of different, spread over multiple devices. There are good articles to be found if you google a bit. This might get you started to get an idea: https://github.com/paulmaunders/TFTP-PXE-Boot-Server

metahertz commented 4 years ago

From what I can see in the code, going back to v0.3.0, the init process to look for k3os.install.iso_url and download the ISO if booted from kernel+initrd (ie netboot) has never been implemented.

The only places k3os.install.iso_url are referred to are in the docs, and in install.sh, which will honor the URL for the install process.

From the docs:

cmdline: k3os.install.iso_url
Default: (none)
Example: https://github.com/rancher/k3os/../k3os-amd64.iso
Description: ISO to download and install from if booting from kernel/vmlinuz and not ISO.

Also, the behavior of booting via iPXE with kernel+initrd has changed across versions. v0.9.0 and v0.8.0 drop to a bash shell with /usr/libexec/k3os/boot: line 55 command k3os not found after failing to mount /.base, as mentioned by @bjwschaap above. v0.7.1 and below boot past the error, but stay in an initramfs state where the ISO/squashfs is never downloaded and mounted. It "looks" like a successful "full k3os" boot, and you get a k3os-XYZ login prompt, but none of the things you'd expect are there. kubectl, for example, is symlinked at /usr/sbin/kubectl > /usr/bin/k3s, which is itself symlinked to a non-existent mount from our missing root, /k3os/system/k3s/current/k3s (or at least it was in the version I was netbooted into at the time, < v0.7.1).

This may explain why @hbokh said 0.7.1 netboots fine, but I have a strong suspicion based on the above that it would never have been usable.
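One quick way to tell this half-booted state from a real boot is to check whether the symlink chain actually resolves. A minimal sketch, assuming the paths described above; report_link is a hypothetical helper, not something shipped with k3OS:

```shell
#!/bin/sh
# Hypothetical helper: report whether a path resolves after following symlinks.
# Returns 0 if the final target exists, 1 if the link dangles or the path is absent.
report_link() {
    path=$1
    if [ -e "$path" ]; then
        echo "ok: $path"
    elif [ -L "$path" ]; then
        # -L is true for the link itself even when its target is missing
        echo "dangling: $path -> $(readlink "$path")"
        return 1
    else
        echo "absent: $path"
        return 1
    fi
}

# Paths taken from the description above; "|| true" keeps the checks independent
report_link /usr/sbin/kubectl || true
report_link /usr/bin/k3s || true
```

On a netbooted node in the state described, the expectation would be a "dangling" line rather than "ok".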

What you do get, however, is install.sh, so arguably the older versions gave you "enough K3OS" via netboot to run install.sh, which does honor the remote ISO location. This didn't work as normal for me, though: I could run install.sh manually, but could not get my cloud-init config to make a silent install happen.

I focused on 0.9.0, as > 0.8.0 are clearly now dependent on more of the "real root" binaries from earlier in the boot anyway.

I've created a small patch and tested in my environment which allows me to netboot to a successful silent install of v0.9.0.


If anyone else wants to test, the artefacts are here: https://github.com/metahertz/k3os/releases/tag/v0.9.0-issue56

Will raise a PR in order to get thoughts on the "proper" way of fixing this, as I'm concerned the k3os.install.iso_url flag is likely used outside of its original description by a lot of people who may not want this download behavior.

leigh-j commented 4 years ago

managed to get an ipxe boot working, this is the config used https://github.com/leigh-j/k3os-ipxe/blob/master/boot.ipxe

vanakema commented 4 years ago

Has anyone successfully gotten a PXE or iPXE boot as a live-server or live-client?

If I had to guess, the issue has something to do with the above proposed methods not providing the rootfs image available here.

However, in my searching (and lack of deep knowledge about the boot process of linux, and especially k3os), I've been unable to figure out how you're supposed to provide that rootfs file (and what exactly that file is). I'm especially lost since the k3os has that interesting/weird partitioning of configuration data and boot data.

fbettag commented 3 years ago

after playing around with k3os in general for a weekend to get acquainted with kubernetes, and having PXE'ified lots of xen setups over the past 10 years, i think there is only one key component missing: being able to give k3os.config as kernel param. That way you could just boot into pxe with k3os.mode=live as described in @leigh-j's gist (that's what i just did) and start pods found with whatever is specified in the config to mount to /persistent (or wherever).

vanakema commented 3 years ago

@fbettag Yeah that’s the issue I ran into too, not being able to boot into live mode via PXE. I’d love to be able to do that where I can just turn any NUC on at home and have it automatically join the cluster, without needing to install first. I can use my NAS for anything with persistence

fbettag commented 3 years ago

> @fbettag Yeah that’s the issue I ran into too, not being able to boot into live mode via PXE. I’d love to be able to do that where I can just turn any NUC on at home and have it automatically join the cluster, without needing to install first. I can use my NAS for anything with persistence

Yes it boots to mode=live via pxe, just a way to specify the config in ipxe is missing

vanakema commented 3 years ago

@fbettag

> Yes it boots to mode=live via pxe, just a way to specify the config in ipxe is missing

When I used settings similar to the iPXE install, but for live, yes it “boots” technically, but no k3s commands are available essentially besides install. Only reason install works is because it pulls the iso from http and the install script itself is available from initramfs I believe

@metahertz explained the symptoms as

> still in an initramfs state where the ISO/Squashfs is never download and mounted, yet it "looks" like a "full k3os" successful boot, you get a k3os-XYZ login prompt, but none of the things you'd expect, such as kubectl

which is exactly the issue I was having.

Normally live boot walks you through choosing live boot options (e.g. Master or Worker mode). Have you gotten that to show up? Have you tried running any kubectl commands?

I’m no Linux wiz, but I’m pretty sure they’re all in the rootfs file which none of the currently proposed pxe configs explain how to use.

Have you figured out that part? I’d love to hear how you did it!

FruityWelsh commented 2 years ago

Got a start of a success on this. This is more a record of what I did than a real script, but it does work as a script

#!/bin/sh

set -o errexit
set -o nounset
#set -o xtrace

K3OS_VERSION="v0.11.0"
CUSTOM_CONFIG="$1"
NEW_INITRD_IMG="$2"
BUILD_DIR="/tmp/k3os-build"
RELEASE_URL="https://github.com/rancher/k3os/releases/download/${K3OS_VERSION}"

# Make the output path absolute, since the script changes directory before writing it
case "${NEW_INITRD_IMG}" in
        /*) ;;
        *) NEW_INITRD_IMG="${PWD}/${NEW_INITRD_IMG}" ;;
esac

cleanup() {
        # The ISO may not be mounted if an earlier step failed
        sudo umount "${BUILD_DIR}/mounted-iso" 2>/dev/null || true
        rm -rf "${BUILD_DIR}"
}
trap cleanup EXIT

mkdir "${BUILD_DIR}"
cp "${CUSTOM_CONFIG}" "${BUILD_DIR}/config.yaml"
wget -O "${BUILD_DIR}/k3os-vmlinuz-amd64" "${RELEASE_URL}/k3os-vmlinuz-amd64"
wget -O "${BUILD_DIR}/k3os-amd64.iso" "${RELEASE_URL}/k3os-amd64.iso"

# Loop-mount the ISO so the k3s files can be copied out of it
mkdir "${BUILD_DIR}/mounted-iso"
sudo mount -o loop "${BUILD_DIR}/k3os-amd64.iso" "${BUILD_DIR}/mounted-iso"

# Unpack the stock initrd, then add k3s and the custom config to it
mkdir "${BUILD_DIR}/imagefile"
cd "${BUILD_DIR}/imagefile"
wget -qO- "${RELEASE_URL}/k3os-initrd-amd64" | gzip -cd | cpio -imd --quiet
cp -r "${BUILD_DIR}/mounted-iso/k3os/system/k3s" "${BUILD_DIR}/imagefile/k3os/system/k3s"
## FIXME adding the kernel dir causes pxe to fail to load the initrd at all
#cp -r "${BUILD_DIR}/mounted-iso/k3os/system/kernel" "${BUILD_DIR}/imagefile/k3os/system/kernel"
cp "${BUILD_DIR}/config.yaml" "${BUILD_DIR}/imagefile/k3os/system/config.yaml"

# Repack everything as a gzip-compressed newc cpio archive
find . | cpio -H newc -o | gzip -9 -n > "${NEW_INITRD_IMG}"

CUSTOM_CONFIG example (could not include custom hostname because of duplicate hostname issue):

ssh_authorized_keys:
  - ssh-rsa <SSH-KEYS>
k3os:
  server_url: https://<SERVER_IP>:6443
  ntp_servers:
    - 0.us.pool.ntp.org
    - 1.us.pool.ntp.org
  password: <RANCHER_USER_PASSWORD>
  token: <SERVER_TOKEN>
  k3s_args:
    - "agent"

pxelinux.cfg/default entry:

label k3os
        MENU LABEL k3os v0.11.0 amd-64
        KERNEL disks/k3os/v0.11.0/amd64/k3os-vmlinuz-amd64
        APPEND initrd=disks/k3os/v0.11.0/amd64/k3os-initrd-amd64-custom load_ramdisk=1 k3os.mode=live

All this to say, it enrolls an agent (which will need to be deleted if rebooted, since the node is ephemeral), but the node will not enter Ready status because of a kubelet invalid capacity 0 on image filesystem error when trying to start the kubelet. Running sudo k3s crictl stats fails because /run/k3s/containerd/containerd.sock: connect: connection refused, and finally /var/log/k3s-service.log has a repeated error about /proc/sys/net/netfilter/nf_conntrack_max: no such file or directory. Adding the kernel dir fails to fix this.

Edit: (cleaned up script some)
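Before serving a repacked initrd, it may be worth confirming that the added files actually made it into the archive. A minimal sketch, assuming the listing comes from cpio -it on the image built above; require_paths is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read an archive listing (one path per line) on stdin and
# check that every path given as an argument is present, either as an entry
# itself or as the parent directory of one. Returns 1 if anything is missing.
require_paths() {
    # Normalise "./foo" to "foo" so both listing styles are accepted
    listing=$(sed 's|^\./||')
    missing=0
    for want in "$@"; do
        if ! printf '%s\n' "$listing" |
            awk -v p="$want" '$0 == p || index($0, p "/") == 1 { found = 1 } END { exit !found }'
        then
            echo "missing: $want" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage against the custom image from the script, with the file name taken from the pxelinux entry above:

gzip -cd k3os-initrd-amd64-custom | cpio -it --quiet | require_paths k3os/system/k3s k3os/system/config.yaml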