kairos-io / kairos

:penguin: The immutable Linux meta-distribution for edge Kubernetes.
https://kairos.io
Apache License 2.0

baremetal machines with 4096B sector size disks no longer partition correctly #2964

Open ChrisPbb opened 2 days ago

ChrisPbb commented 2 days ago

Kairos version:

PRETTY_NAME="Ubuntu 22.04.5 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.5 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
KAIROS_ID_LIKE="kairos-core-ubuntu-22.04"
KAIROS_ARTIFACT="kairos-ubuntu-22.04-core-amd64-generic-v3.2.1"
KAIROS_FLAVOR="ubuntu"
KAIROS_RELEASE="v3.2.1"
KAIROS_REGISTRY_AND_ORG="quay.io/kairos"
KAIROS_ID="kairos"
KAIROS_NAME="kairos-core-ubuntu-22.04"
KAIROS_FLAVOR_RELEASE="22.04"
KAIROS_FAMILY="ubuntu"
KAIROS_GITHUB_REPO="kairos-io/kairos"
KAIROS_VERSION="v3.2.1"
KAIROS_IMAGE_LABEL="22.04-core-amd64-generic-v3.2.1"
KAIROS_TARGETARCH="amd64"
KAIROS_SOFTWARE_VERSION_PREFIX="k3s"
KAIROS_PRETTY_NAME="kairos-core-ubuntu-22.04 v3.2.1"
KAIROS_IMAGE_REPO="quay.io/kairos/ubuntu:22.04-core-amd64-generic-v3.2.1"
KAIROS_MODEL="generic"
KAIROS_BUG_REPORT_URL="https://github.com/kairos-io/kairos/issues"
KAIROS_HOME_URL="https://github.com/kairos-io/kairos"
KAIROS_VERSION_ID="v3.2.1"
KAIROS_VARIANT="core"

CPU architecture, OS, and Version:

Linux XXXXX 5.15.0-122-generic #132-Ubuntu SMP Thu Aug 29 13:45:52 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Describe the bug:

The install process creates partition tables as if the sector size were 512B, even when the disk's sector size is 4096B. This does not happen with v3.1.1.

To Reproduce:

Install Kairos on a disk device with a 4096B sector size. Any configuration (except install.no-format: true) produces the same result.

Expected behavior:

Partitions are created with a sector size matching the disk device's physical block size.

Logs:

Here's the actual error during install (OCR'd from a console screenshot):

2024-10-23T14:42:37Z INF Done executing stage 'kairos-install.pre'
2024-10-23T14:42:37Z INF Running stage: kairos-install.pre.after
2024-10-23T14:42:37Z INF Done executing stage 'kairos-install.pre.after'
2024-10-23T14:42:37Z INF Running stage: kairos-install.pre.before
2024-10-23T14:42:37Z INF Done executing stage 'kairos-install.pre.before'
2024-10-23T14:42:37Z INF Running stage: kairos-install.pre
2024-10-23T14:42:37Z INF Done executing stage 'kairos-install.pre'
2024-10-23T14:42:37Z INF Running stage: kairos-install.pre.after
2024-10-23T14:42:37Z INF Done executing stage 'kairos-install.pre.after'
2024-10-23T14:42:37Z INF Partitioning device...
2024-10-23T14:42:37Z INF Creating partition table for partition type gpt
2024-10-23T14:42:37Z INF Created partition table for partition type gpt

panic: interface conversion: part.Partition is *mbr.Partition, not *gpt.Partition
goroutine 1 [running]:
github.com/kairos-io/kairos-agent/v2/pkg/elemental.(*Elemental).PartitionAndFormatDevice(0xc000989ab8, {0x4cbd778, 0xc00025cb48})
/go/src/github.com/kairos-io/kairos-agent/pkg/elemental/elemental.go:113 +0x1094
github.com/kairos-io/kairos-agent/v2/pkg/action.InstallAction.Run({0xc00022c600, 0xc00025cb48})
/go/src/github.com/kairos-io/kairos-agent/pkg/action/install.go:172 +0x6fe
github.com/kairos-io/kairos-agent/v2/internal/agent.runInstall(0xc00022c600)
/go/src/github.com/kairos-io/kairos-agent/internal/agent/install.go:283 +0x212
github.com/kairos-io/kairos-agent/v2/internal/agent.RunInstall(0xc00022c600)
/go/src/github.com/kairos-io/kairos-agent/internal/agent/install.go:228 +0x231
github.com/kairos-io/kairos-agent/v2/internal/agent.InteractiveInstall(0x0, 0x0, {0x0, 0x0})
/go/src/github.com/kairos-io/kairos-agent/internal/agent/interactive_install.go:281 +0x1418
main.init.func18(0xc0002f5340)
/go/src/github.com/kairos-io/kairos-agent/main.go:495 +0x75
github.com/urfave/cli/v2.(*Command).Run(0x56976a0, 0xc0002f5340, {0xc0002eec90, 0x1, 0x1})
/go/pkg/mod/github.com/urfave/cli/v2@v2.27.4/command.go:276 +0x7e2
github.com/urfave/cli/v2.(*Command).Run(0xc0001df1e0, 0xc0002f5180, {0xc00003e040, 0x2, 0x2})
/go/pkg/mod/github.com/urfave/cli/v2@v2.27.4/command.go:269 +0xa65
github.com/urfave/cli/v2.(*App).RunContext(0xc0002e7400, {0x4cbd548, 0x56e8a40}, {0xc00003e040, 0x2, 0x2})
/go/pkg/mod/github.com/urfave/cli/v2@v2.27.4/app.go:333 +0x5a5
github.com/urfave/cli/v2.(*App).Run(...)
/go/pkg/mod/github.com/urfave/cli/v2@v2.27.4/app.go:307
main.main()
/go/src/github.com/kairos-io/kairos-agent/main.go:859

Here's evidence of the misaligned partitions (OCRd from console screenshot)

$ sudo fdisk -l /dev/sda
GPT PMBR size mismatch (3205234687 != 937525247) will be corrected by write.
Disk /dev/sda: 3.49 TiB, 3840103415808 bytes, 937525248 sectors
Disk model: RAID
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 16777216 bytes / 16777216 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device     Boot Start       End   Sectors  Size Id Type
/dev/sda1           1 937525247 937525247  3.5T ee GPT

Partition 1 does not start on physical sector boundary.
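
The first number in the PMBR mismatch warning is consistent with a protective MBR written for 512-byte sectors. As a rough check (a sketch, not kairos-agent or diskfs code; `pmbrSize` is a made-up helper, and it assumes the writer stored the sector count minus one and let it wrap at 32 bits rather than clamping it), dividing the disk size by 512 reproduces the value fdisk complains about, while dividing by 4096 gives the value fdisk expects:

```go
package main

import "fmt"

// pmbrSize returns the value a 32-bit protective-MBR size field would hold
// if the writer stored (sector count - 1) and let it wrap at 32 bits.
// Hypothetical helper for illustration only.
func pmbrSize(diskBytes, sectorSize uint64) uint32 {
	return uint32(diskBytes/sectorSize - 1)
}

func main() {
	const diskBytes = 3840103415808 // disk size from the fdisk output above

	fmt.Println(pmbrSize(diskBytes, 512))  // 3205234687, the value found on disk
	fmt.Println(pmbrSize(diskBytes, 4096)) // 937525247, the value fdisk expects
}
```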


Itxaka commented 1 day ago

good catch!

Itxaka commented 1 day ago

from diskfs

// when we use a disk image with a GPT, we cannot get the logical sector size from the disk via the kernel
//
//  so we use the default sector size of 512, per Rod Smith

Itxaka commented 1 day ago

ahhhh this is a RAID, interesting!

Last I knew, most (some?) NVMe drives present a 512-byte sector size compatibility layer to the OS, even if the underlying sector size is 4096 bytes.

I wonder why a RAID has that by default :D

Itxaka commented 1 day ago

patch sent and tested.

To test, run qemu with a custom nvme disk with 4096 sector size:

$ qemu-img create /tmp/disk.img +50G
$ qemu-system-x86_64 -cpu host -smp 4 -m 8192M -name test \
    -serial file:/tmp/serial.out \
    -device e1000,netdev=user.0 \
    -machine type=pc,accel=kvm \
    -netdev user,id=user.0,hostfwd=tcp::2222-:22 \
    -cdrom kairos-ubuntu-24.04-standard-amd64-generic-v3.2.1-23-g409dc0d.iso \
    -drive file=/tmp/disk.img,format=raw,if=none,id=NVME1 \
    -device nvme,drive=NVME1,serial=nvme-1,physical_block_size=4096,logical_block_size=4096

Then try with the patched version and check that it generates the proper sizes:

Disk /dev/nvme0n1: 50 GiB, 53687091200 bytes, 13107200 sectors
Disk model: QEMU NVMe Ctrl                          
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 2CEFF841-4C1D-5A14-8E46-ED3AEB621093

Device           Start      End  Sectors  Size Type
/dev/nvme0n1p1     256      511      256    1M BIOS boot
/dev/nvme0n1p2     512    16895    16384   64M Linux filesystem
/dev/nvme0n1p3   16896  1011711   994816  3.8G Linux filesystem
/dev/nvme0n1p4 1011712  2683135  1671424  6.4G Linux filesystem
/dev/nvme0n1p5 2683136 13106943 10423808 39.8G Linux filesystem

richardelling commented 1 day ago

> Last I knew, most (some?) NVMe drives present a 512-byte sector size compatibility layer to the OS, even if the underlying sector size is 4096 bytes.

Some NVMe drives are byte-addressable (sector size = 1). Ok technically, an NVMe drive can have a different sector size for each namespace. But in any case, assuming these things is bad and not sustainable over time. For example, SSD vendors are pushing for minimum block size to be 16KiB for their more dense products. So it is best to move away from counting sectors and towards either a reasonably appropriate minimum size or percentage. In this case, the RAID vendor is lying about the minimum I/O size and probably other things, too... quite common, unfortunately. Lots of mines in this minefield.
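
To illustrate the point about sizing in bytes rather than in sectors, here is a minimal sketch (not taken from kairos-agent; `sectorsFor` is a hypothetical helper) that keeps the partition size in bytes and only derives a sector count once the device's sector size is known:

```go
package main

import (
	"errors"
	"fmt"
)

// sectorsFor converts a size in bytes to a whole number of sectors,
// rounding up so the partition is never smaller than requested.
// Hypothetical helper for illustration only.
func sectorsFor(sizeBytes, sectorSize uint64) (uint64, error) {
	if sectorSize == 0 {
		return 0, errors.New("sector size must be non-zero")
	}
	return (sizeBytes + sectorSize - 1) / sectorSize, nil
}

func main() {
	const MiB = 1 << 20
	// The same 64 MiB request maps to a different sector count per device.
	for _, ss := range []uint64{512, 4096, 16384} {
		n, err := sectorsFor(64*MiB, ss)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("64 MiB at %d-byte sectors = %d sectors\n", ss, n)
	}
}
```

With 4096-byte sectors this yields 16384 sectors for 64 MiB, which matches the second partition in the patched fdisk output.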

As for this exact problem, elemental.PartitionAndFormatDevice shouldn't assume partition.Read only returns a gptTable, because it could return an mbrTable. You can probably work around it by putting a GPT label on the disk.
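
A minimal sketch of that idea, using Go's comma-ok type assertion so an unexpected table type becomes an error rather than a panic. The `Table`, `gptTable` and `mbrTable` types and the `asGPT` helper below are stand-ins invented for this example; they are not the real diskfs or kairos-agent types:

```go
package main

import (
	"errors"
	"fmt"
)

// Table, gptTable and mbrTable are stand-ins for the partition-table types
// a library such as diskfs returns; they are NOT the real diskfs types.
type Table interface{ Type() string }

type gptTable struct{}
type mbrTable struct{}

func (*gptTable) Type() string { return "gpt" }
func (*mbrTable) Type() string { return "mbr" }

// asGPT uses the comma-ok form of the type assertion, so a non-GPT table
// is reported as an error instead of triggering a runtime panic.
func asGPT(t Table) (*gptTable, error) {
	g, ok := t.(*gptTable)
	if !ok {
		if t == nil {
			return nil, errors.New("no partition table found")
		}
		return nil, fmt.Errorf("unsupported partition table %q: only GPT is supported", t.Type())
	}
	return g, nil
}

func main() {
	if _, err := asGPT(&mbrTable{}); err != nil {
		fmt.Println("install aborted:", err)
	}
}
```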

Itxaka commented 19 hours ago

> Last I knew, most (some?) NVMe drives present a 512-byte sector size compatibility layer to the OS, even if the underlying sector size is 4096 bytes.

> Some NVMe drives are byte-addressable (sector size = 1). Ok technically, an NVMe drive can have a different sector size for each namespace. But in any case, assuming these things is bad and not sustainable over time. For example, SSD vendors are pushing for minimum block size to be 16KiB for their more dense products. So it is best to move away from counting sectors and towards either a reasonably appropriate minimum size or percentage. In this case, the RAID vendor is lying about the minimum I/O size and probably other things, too... quite common, unfortunately. Lots of mines in this minefield.

Interesting!

> As for this exact problem, elemental.PartitionAndFormatDevice shouldn't assume partition.Read only returns a gptTable, because it could return an mbrTable. You can probably work around it by putting a GPT label on the disk.

mmmh, we only support GPT partition tables in Kairos. If the disk has an MBR, the install should fail with a clear error message saying that we don't support it.

I think the error that diskfs returned was mainly because we screwed with the sector size and it returned a wrong or nil object. We hardcoded the sector size, instead of opening the disk and letting it tell us the actual sector size. So there has to be a missing error check either on our side or on diskfs that makes the return value not raise an error.

In any case, the patch attached should alleviate this issue :D