tinkerbell / hook

In-memory Operating System Installation Environment for Executing Tinkerbell Workflows
Apache License 2.0

multi-kernel, cross-compiling, bash based Hook & (default+foreign) kernels build (incl GHA matrix) #205

Closed rpardini closed 3 months ago

rpardini commented 5 months ago

multi-kernel, cross-compiling, bash based Hook & (default+external) kernels build (incl GHA matrix)

Original RFC below; please check commits for the (many) updates done after the first drop. Original RFC kept for reference, below the line.


RFC: multi-kernel, cross-compiling, bash based Hook & (default+external) kernels build (incl GHA matrix)

very early stage RFC

This is a rewrite of the build system. The produced default artifacts (aarch64/x86_64) should be equivalent, save for an updated 5.10.213+ kernel and arm64 fixes. It's missing, at least, documentation and linters, possibly more, that I removed and intend to rewrite. But since it's a large-ish change, I'd like to collect some feedback before continuing.

Main topics

Flavors (/kernels)

Hook's own kernels
| ID | Current version | Description |
|----|-----------------|-------------|
| hook-default-arm64 | 5.10.213 | Hook's own aarch64 kernel |
| hook-default-amd64 | 5.10.213 | Hook's own x86_64 kernel |

Armbian kernels
| ID | Current version | Description |
|----|-----------------|-------------|
| armbian-bcm2711-current | 6.6.22 | bcm2711 (Broadcom) current, from RaspberryPi Foundation with many Armbian fixes for CNCF-landscape projects; for the RaspberryPi 3b+/4b/5 |
| armbian-meson64-edge | 6.7.10 | meson64 (Amlogic) edge Khadas VIM3/3L, Radxa Zero/2, LibreComputer Potatos, and many more |
| armbian-rockchip64-edge | 6.7.10 | rockchip64 (Rockchip) edge, for many rk356x/3399 SoCs. Not for rk3588! |
| armbian-uefi-arm64-edge | 6.8.1 | Armbian generic edge UEFI kernel |
| armbian-uefi-x86-edge | 6.8.1 | Armbian generic edge UEFI kernel |

Proof of working-ness?

In my fork:

image

Future possibilities:

TO-DO


Thanks for reading this far. I'm looking forward to your feedback!

chrisdoherty4 commented 5 months ago

Thanks @rpardini. Awesome OP. I've got a busy week so probably won't get to it until next weekend.

rpardini commented 5 months ago

> won't get to it until next weekend.

No problem, meanwhile I keep on working -- I just added a 2nd drop.

2nd drop of rpardini's take on multi-hook

rpardini commented 4 months ago

Pushed: forced initial no-offset-limit NTP sync via busybox (fixes RaspberryPi & others without an RTC), support (WiP) for rk3588 devices & a fix for DockerHub rate limits being hit pulling linuxkit/* pkgs during Hook Linuxkit build.

Also, an initial PR for the showcase chart, which demonstrates much of what is being done here, in the charts repo: https://github.com/tinkerbell/charts/pull/89

jacobweinstock commented 4 months ago

Hey @rpardini. Thanks for this. I'm playing around with it and the x86_64 build works great. I have some non-technical concerns, though. It's not clear how to cross-compile HookOS for aarch64. Also, there seem to be some environment variables involved in the different build.sh commands, but it isn't clear which env options are available or how to use them properly.

As this is a significant change from the status quo, and in order for it to land and be maintainable, we're going to need to understand it better. Can you provide docs for all functionality? Most users aren't going to need or want to rebuild kernels, so docs around all the options for building the final HookOS are the most important to me. Again, thanks for all this work! I think we really needed it and I want to see it land.

jacobweinstock commented 4 months ago

I think I found it. /build.sh linuxkit hook-default-arm64 works great! We'll need docs to explain this. Will also need docs around how to customize a kernel.

jacobweinstock commented 4 months ago

Also, I see in the RFC.md that you mention actuated. We have dedicated self-hosted GitHub runners for both x86 and aarch64. They can be referenced in GitHub Actions via runs-on: [self-hosted, Linux, ARM64] and runs-on: [self-hosted, Linux, X64]

jacobweinstock commented 4 months ago

FYI, just got Hook 5.15 and 6.6 kernels built and booted into them! I'm liking this!

rpardini commented 4 months ago

Super thanks for the review!

> Can you provide docs for all functionality?

> ... /build.sh linuxkit hook-default-arm64 ... We'll need docs to explain this. Will also need docs around how to customize a kernel.

Definitely! Docs are a big challenge. I hope to massage the RFC.md into README.md over time. I'm not too sure about the general CLI interface, though: it takes environment variables as well as a command and an optional flavor/kernel-id. I guess settling on some sane terminology is also needed.

To customize a kernel: bash build.sh config-kernel hook-default-arm64 -- it prepares a shell in a Docker context where you can run make menuconfig -- it outputs some instructions, which evidently must be improved. Do you think we should have a more direct shortcut to make menuconfig && make savedefconfig && cp defconfig ... so it can all be done in a single step, too?
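
The CLI shape described above (a command plus an optional flavor/kernel ID) can be sketched as a dry-run wrapper. This is only an illustration under stated assumptions: `hook_cmd` is a hypothetical helper that prints the command line instead of running a build, and it uses only the `linuxkit` and `config-kernel` commands and the kernel IDs that appear in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper (not part of build.sh): prints the invocation
# instead of executing it, so it is safe to run anywhere.
# Usage: hook_cmd <command> [kernel-id]
hook_cmd() {
  local command="${1:?command required}"
  local kernel_id="${2:-}"   # optional flavor/kernel ID
  if [ -n "$kernel_id" ]; then
    printf 'bash build.sh %s %s\n' "$command" "$kernel_id"
  else
    printf 'bash build.sh %s\n' "$command"
  fi
}

hook_cmd linuxkit hook-default-arm64       # build HookOS with the default arm64 kernel
hook_cmd config-kernel hook-default-arm64  # open the kernel configuration shell
```

Dropping the `printf` in favour of actually executing the command would turn this into a thin wrapper; the environment variables the real script accepts are not modelled here.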

> They can be referenced in GitHub Actions via runs-on: [self-hosted, Linux, ARM64] and runs-on: [self-hosted, Linux, X64]

Perfect. I'd like to keep the ability to cross-build on GH Hosted runners, for people without self-hosted runners.

What I propose is adding environment variables like RUNNER_ARM64="self-hosted, Linux, ARM64" and RUNNER_AMD64="self-hosted, Linux, X64" -- if set, the gha-matrix command will use them as values for a new runner: JSON field. If one or both are not set, the field defaults to ubuntu-latest. Finally, the field is used in the GHA job matrix as runs-on: ${{matrix.runner}}. Then we can have a conditional step on gh_org == tinkerbell to set the value; that way forks default to free runners. wdyt?
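
A minimal sketch of the fallback logic proposed above, not the actual gha-matrix implementation. The function names `labels_to_json` and `matrix_entry` and the `kernel`/`arch` JSON keys are assumptions made for illustration; only `RUNNER_ARM64`, `RUNNER_AMD64`, the `runner` field and the `ubuntu-latest` default come from the proposal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "a, b, c" into ["a","b","c"];
# an empty or missing argument falls back to ubuntu-latest.
labels_to_json() {
  local labels="${1:-ubuntu-latest}"
  local out="" label
  local IFS=','
  for label in $labels; do
    # Trim leading and trailing whitespace around each label.
    label="${label#"${label%%[![:space:]]*}"}"
    label="${label%"${label##*[![:space:]]}"}"
    out="${out:+$out,}\"$label\""
  done
  printf '[%s]\n' "$out"
}

# Hypothetical helper: emit one matrix entry, picking the runner variable by arch.
matrix_entry() {
  local kernel_id="$1" arch="$2" labels=""
  case "$arch" in
    arm64) labels="${RUNNER_ARM64:-}" ;;
    amd64) labels="${RUNNER_AMD64:-}" ;;
  esac
  printf '{"kernel":"%s","arch":"%s","runner":%s}\n' \
    "$kernel_id" "$arch" "$(labels_to_json "$labels")"
}

# Example: only the arm64 runner is configured, so amd64 uses the default.
RUNNER_ARM64="self-hosted, Linux, ARM64"
unset RUNNER_AMD64
matrix_entry hook-default-arm64 arm64
matrix_entry hook-default-amd64 amd64
```

Emitting the runner as a JSON array assumes the workflow parses the matrix with `fromJSON`, so that `runs-on: ${{matrix.runner}}` receives a list of labels; a fork that sets neither variable would get `["ubuntu-latest"]` for every entry.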

rpardini commented 4 months ago

Another large drop. I didn't squash this time, so I might have missed some sign-offs. I wrote some docs.

Regarding the GitHub Actions runners: see the same workflow running against an org with self-hosted arm64 runners, and against another fork with just plain GitHub-hosted amd64 runners. I ended up with finer-grained control than proposed above.

rpardini commented 4 months ago

> got Hook 5.15 and 6.6 kernels built and booted into them!

In the last drop I added hook-latest-lts-amd64 and hook-latest-lts-arm64, using 6.6; the defconfig was simply copied over from 5.10 and olddefconfig'ed. It works well in QEMU, but has some trouble on NVIDIA hardware (apparently related to nouveau / framebuffer). Could you push your 6.6 defconfigs if you have them?

jacobweinstock commented 4 months ago

Hey @rpardini, I believe I have found the issue with Docker not starting with Linuxkit v1.2.0. We need to enable cgroups v2 in the DinD container.

  - name: hook-docker
    image: "${HOOK_CONTAINER_DOCKER_IMAGE}"
    capabilities:
      - all
    net: host
    pid: host
    mounts:
      - type: cgroup2
        options: [ "rw", "nosuid", "noexec", "nodev", "relatime" ]
        destination: /sys/fs/cgroup
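
One way to check from inside the container whether a cgroup v2 hierarchy is mounted at that destination is to look for the `cgroup.controllers` file, which exists at the root of a cgroup v2 mount. The helper below is a hypothetical sketch for verification only and is not part of Hook; the `cgroup_mode` name and its output strings are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical check: report whether a directory looks like a cgroup v2 root.
# The root of a cgroup v2 hierarchy contains a cgroup.controllers file.
cgroup_mode() {
  local root="${1:-/sys/fs/cgroup}"
  if [ -f "$root/cgroup.controllers" ]; then
    echo "v2"
  else
    echo "not-v2"
  fi
}

# Inspect the default mount point used in the snippet above.
cgroup_mode /sys/fs/cgroup
```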

jacobweinstock commented 3 months ago

Hey @rpardini, if you want to fix the DCO check then I think we can merge this and I can PR follow-ups to get the other stuff. I've been using these changes for a few weeks and am quite pleased with it all.

rpardini commented 3 months ago

Hey @jacobweinstock -- I've fixed the DCOs, cleaned up a few commit messages, changed the default OCI coordinates & CI params to Tinkerbell org, and rebased onto main.

I've not touched the existing CI workflows, though; feel free to push to my branch if fixes are needed before merge.

Thanks so much for the reviews!