tinkerbell / hook

In-memory Operating System Installation Environment for Executing Tinkerbell Workflows
Apache License 2.0

rpardini's May'24 fix batch: slim for 2Gb RAM devices #225

Closed rpardini closed 3 months ago

rpardini commented 3 months ago
  • Recent userspace additions pushed the uncompressed initramfs size near or over the 900 MB mark for certain kernels.
  • The initramfs (gzipped cpio) is loaded by the bootloader, then unpacked by the kernel into a tmpfs-backed root filesystem.
  • tmpfs allows only 50% of physical RAM by default, and that default can't be changed easily.
  • Slim down both the userspace (by stripping binaries, removing some, etc.) and the Armbian kernels (by removing modules).
  • With those changes we're back below 900 MB uncompressed, and the default x86 hook tarball is down from 223 MB to 180 MB compressed.
  • Add a check for uncompressed cpio size at 900 MB; warn in GHA if it is ever hit again.
  • Also includes: fixes for ttyAML consoles, better logging, and some dev/debug options used for this batch.
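
To make the size check in the list above concrete, here is a minimal sketch in Python. It is not the check from this PR (the hook build scripts are not shown here); the function names and the reading of "900mb" as 900 MiB are assumptions for illustration. It stream-decompresses the gzipped cpio to count its unpacked bytes and prints a GitHub Actions `::warning` workflow command when the limit is reached.

```python
import gzip

# Assumption: the PR only says "900mb"; this sketch reads that as 900 MiB.
LIMIT_BYTES = 900 * 1024 * 1024


def uncompressed_size(path, chunk_size=1024 * 1024):
    """Stream-decompress a gzip file and return the total unpacked byte count."""
    total = 0
    with gzip.open(path, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def check_initramfs(path, limit=LIMIT_BYTES):
    """Return (size, ok); print a GitHub Actions warning when size >= limit."""
    size = uncompressed_size(path)
    ok = size < limit
    if not ok:
        # Workflow command picked up by the GitHub Actions runner from stdout.
        print(f"::warning title=initramfs too large::{path} unpacks to {size} bytes (limit {limit})")
    return size, ok
```

Streaming keeps memory use flat regardless of the image size, which matters when the image itself is close to a gigabyte.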

note: review is easier if done commit-by-commit; sent a large batch due to same-line changes across them


build: common: better logging & emit notice/warn/error also to GHA workflow commands

kernel: armbian: fix: use ORAS binary appropriate to the (host) arch; bump ORAS to 1.2.0-rc.1 (from beta.1)

build: introduce OUTPUT_TARBALL_FILELIST=yes to include LK's --format tar output and its filelist

kernel: armbian: ensure kernel.tar contains entry for the / (root) directory

kernel: armbian: don't flood output with tar's verbose option

kernel: armbian: remove some heavy kernel modules (so it fits in 2Gb RAM)

images: slim down golang binaries, by building without DWARF/debug symbols, stripping prebuilts, and removing unneeded bins

hook: add handling for ttyAML0/1 (used on Amlogic SoCs)

build: introduce check for initramfs size > 900Mb and warn/notice
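
As an illustration of the "emit notice/warn/error also to GHA workflow commands" commit above, the following Python sketch shows one way such a logger could look. The real implementation lives in the repository's build scripts and may differ; `log`, `format_workflow_command`, and `escape_data` are hypothetical names. The `::notice::`, `::warning::`, and `::error::` commands, the `%`/CR/LF escaping, and the `GITHUB_ACTIONS=true` environment variable are documented GitHub Actions behavior.

```python
import os
import sys

# Map local log levels to GitHub Actions workflow command names.
_COMMANDS = {"notice": "notice", "warn": "warning", "warning": "warning", "error": "error"}


def escape_data(message):
    """Escape characters that would break a workflow command's message part."""
    return message.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def format_workflow_command(level, message):
    """Build a '::level::message' line; raises KeyError for unknown levels."""
    return f"::{_COMMANDS[level]}::{escape_data(message)}"


def log(level, message, stream=sys.stderr, environ=os.environ):
    """Log to the given stream, and also emit a workflow command when on GHA."""
    print(f"[{level.upper()}] {message}", file=stream)
    if environ.get("GITHUB_ACTIONS") == "true":
        print(format_workflow_command(level, message))
```

Outside of GitHub Actions the workflow command is skipped, so local builds only see the plain log line.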

rpardini commented 3 months ago

Can't for the life of me figure out why some build checks are failing with "failed to write compressed diff: failed to create diff tar stream: context canceled" at the very end. After building the whole Dockerfile, it seems to fail while exporting the result to the normal Docker image store. Seems to be runner-specific? Other (very similar) kernels build fine.

jacobweinstock commented 3 months ago

Hey @rpardini, yeah, these are runner-specific. I think they're related to concurrency on a single runner. I'm investigating. For the moment, a re-run should resolve them. I re-ran the failed ones and all passed except Hook armbian-rk35xx-vendor, which I think might have an actual issue. I'd need your eyes on the output to confirm. Thanks for these updates!

rpardini commented 3 months ago

"except Hook armbian-rk35xx-vendor which i think might have an actual issue"

A bit of bad luck: since we're re-running failed jobs, we're subject to upstream changes in between reruns. In this case, Armbian released a new version, but our "Kernel" job had already run successfully, while the Hook that uses it didn't.

The same would happen to default/lts Hooks if kernel.org released a new point release at around that time.

I'll push again to force a full rebuild.

rpardini commented 3 months ago

"I'll push again to force a full rebuild."

Worked! Thanks @jacobweinstock

rpardini commented 3 months ago

Tested both arm64 and amd64 runs (qemu) with the uefi-arm64 and uefi-x86 Armbian kernels. I've had qemu set to 2Gb RAM from the beginning, but since those were huge kernels, they never really worked before. Now they do. This opens the road to adding, say, Ubuntu linux-generic kernels in the future.

I've tested the meson64 Hook successfully on a 2Gb Amlogic Meson GXM device (an S912 tvbox!). Amlogics use a ttyAML console instead of ttyAMA.
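
For readers unfamiliar with how serial console names reach userspace, here is a small Python sketch that extracts `console=` devices from a kernel command line. It is not how Hook handles ttyAML0/1 (that code is not shown in this thread); the function names and the sample command line are hypothetical.

```python
def console_devices(cmdline):
    """Return device names from every console= argument, in order of appearance."""
    devices = []
    for token in cmdline.split():
        if token.startswith("console="):
            value = token[len("console="):]
            # Drop options such as ",115200n8" that may follow the device name.
            device = value.split(",", 1)[0]
            if device:
                devices.append(device)
    return devices


def is_amlogic_serial(device):
    """True for Amlogic-style serial consoles such as ttyAML0 or ttyAML1."""
    return device.startswith("ttyAML")


if __name__ == "__main__":
    sample = "root=/dev/ram0 console=ttyAML0,115200n8 console=tty0"  # hypothetical
    for dev in console_devices(sample):
        print(dev, "amlogic" if is_amlogic_serial(dev) else "other")
```

On a running system the same function could be fed the contents of /proc/cmdline.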

When hitting the root tmpfs 50% limit (which is around 1Gb - ), the in-memory rootfs is left half-populated, and stuff that is added last to the cpio by LinuxKit (eg: /etc/os-release) is missing, which causes all kinds of very strange errors: getty tries to come up but fails, mounting overlays fails, cgroup stuff fails, etc.
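
The arithmetic behind that limit can be sketched as follows. This is a simplified model, assuming the default tmpfs cap is exactly half of the RAM the kernel can use; the function names are made up for illustration.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_tmpfs_limit(usable_ram_bytes):
    """Default tmpfs size cap, modeled as half of the usable RAM."""
    return usable_ram_bytes // 2


def fits_in_rootfs(unpacked_bytes, usable_ram_bytes):
    """True when the unpacked initramfs stays within the default tmpfs cap."""
    return unpacked_bytes <= default_tmpfs_limit(usable_ram_bytes)


if __name__ == "__main__":
    ram = 2 * GIB
    print(default_tmpfs_limit(ram) // MIB)   # 1024
    print(fits_in_rootfs(900 * MIB, ram))    # True
    print(fits_in_rootfs(1100 * MIB, ram))   # False
```

In practice the kernel sees somewhat less than the installed 2 GiB, since firmware and the kernel itself reserve memory, so the real cap sits a little under 1 GiB. That is why a threshold below 1 GiB, such as 900 MB, leaves some headroom.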

This led me down wild chases over permissions, xino attrs, obscure debug options, ...

The key to finding this was the very sneaky kernel message "initramfs: unpacking failed: write error". You'd think this would be a panic, but no, the boot continues as if nothing had happened.
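
Since the boot carries on after that message, it can help to look for it explicitly. Below is a minimal Python sketch that scans kernel log lines for it. The pattern is an assumption: it tolerates differences in capitalization and an optional colon after "initramfs", because the exact wording may vary between kernel versions. The function name is hypothetical.

```python
import re

# Matches e.g. "initramfs: unpacking failed: write error" and
# "Initramfs unpacking failed: write error", with or without a timestamp prefix.
_PATTERN = re.compile(r"initramfs:?\s+unpacking failed:\s*(?P<reason>.+)", re.IGNORECASE)


def find_unpack_failures(log_lines):
    """Return the failure reason from every matching kernel log line."""
    reasons = []
    for line in log_lines:
        match = _PATTERN.search(line)
        if match:
            reasons.append(match.group("reason").strip())
    return reasons
```

One way to use it is to pass in the lines of `dmesg` output captured from the booted Hook environment and fail a smoke test if the returned list is not empty.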

jacobweinstock commented 3 months ago

oh very interesting. nice find!

jacobweinstock commented 3 months ago

btw, the reduction in size is quite nice! looks like over 25% or so! great work!