Closed by rgl 8 months ago
There are no steps to reproduce.
I've added them now. Please let me know if more information is required.
Looking into your repo, it seems to be vanilla Talos + qemu-guest-agent extension.
I don't see any problem booting with:
sudo -E talosctl cluster create --provisioner=qemu --cidr=172.20.0.0/24 --memory 3072 --memory-workers 8192 --cpus 4 --cpus-workers 4 --controlplanes 1 --workers 0 --talos-version=v1.7 --disk-image-path=https://factory.talos.dev/image/ce4c980550dd2ab1b17bbf2b08801c7eb59418eafe8f279833297925d67c7515/v1.7.0-alpha.1/metal-amd64.raw.xz --skip-injecting-config
The schematic:
Your image schematic ID is: ce4c980550dd2ab1b17bbf2b08801c7eb59418eafe8f279833297925d67c7515
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/qemu-guest-agent
So whatever your problem is, it seems to be something other than Talos?
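For reference, a schematic like the one above can also be registered from the command line. This is a minimal sketch, assuming the Image Factory HTTP API accepts the schematic as a POST to /schematics and answers with a small JSON object that carries an id field. The file name schematic.yaml and the helper names write_schematic and extract_id are hypothetical.

```shell
#!/bin/sh
set -eu

# Write the schematic from this thread to the given file path.
write_schematic() {
  cat >"$1" <<'EOF'
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/qemu-guest-agent
EOF
}

# Pull the hex schematic ID out of a JSON reply read from stdin.
# Assumption: the reply looks like {"id":"<hex digest>"}.
extract_id() {
  sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([0-9a-f]*\)".*/\1/p'
}

# The network call only runs when the script is invoked with --post.
if [ "${1:-}" = "--post" ]; then
  write_schematic schematic.yaml
  curl -fsS -X POST --data-binary @schematic.yaml \
    https://factory.talos.dev/schematics | extract_id
fi
```

If the assumptions hold, the printed ID should match the schematic ID quoted above, since the ID is derived from the schematic contents.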
I'm not using the online image factory service. I'm using the ghcr.io/siderolabs/imager:v1.7.0-alpha.1 image locally.
Here's exactly how I'm creating the image locally and then starting it in qemu (with UEFI firmware):
cat >talos.yml <<'EOF'
arch: amd64
platform: nocloud
secureboot: false
version: v1.7.0-alpha.1
customization:
  extraKernelArgs:
    - net.ifnames=0
input:
  kernel:
    path: /usr/install/amd64/vmlinuz
  initramfs:
    path: /usr/install/amd64/initramfs.xz
  baseInstaller:
    imageRef: ghcr.io/siderolabs/installer:v1.7.0-alpha.1
  systemExtensions:
    - imageRef: ghcr.io/siderolabs/qemu-guest-agent:8.2.2
output:
  kind: image
  imageOptions:
    diskSize: 2147483648
    diskFormat: raw
  outFormat: raw
EOF
mkdir -p tmp/talos
docker run --rm -i \
  -v $PWD/tmp/talos:/secureboot:ro \
  -v $PWD/tmp/talos:/out \
  -v /dev:/dev \
  --privileged \
  ghcr.io/siderolabs/imager:v1.7.0-alpha.1 \
  - <talos.yml
qemu-img convert -O qcow2 tmp/talos/nocloud-amd64.raw tmp/talos/nocloud-amd64.qcow2
qemu-img info tmp/talos/nocloud-amd64.raw
qemu-img info tmp/talos/nocloud-amd64.qcow2
Here's the imager output:
assembling the finalized profile...
skipped pulling overlay (no overlay)
profile ready:
arch: amd64
platform: nocloud
secureboot: false
version: v1.7.0-alpha.1
customization:
  extraKernelArgs:
    - net.ifnames=0
input:
  kernel:
    path: /usr/install/amd64/vmlinuz
  initramfs:
    path: /usr/install/amd64/initramfs.xz
  baseInstaller:
    imageRef: ghcr.io/siderolabs/installer:v1.7.0-alpha.1
  systemExtensions:
    - imageRef: ghcr.io/siderolabs/qemu-guest-agent:8.2.2
output:
  kind: image
  imageOptions:
    diskSize: 2147483648
    diskFormat: raw
  outFormat: raw
rebuilding initramfs with system extensions...
copying /usr/install/amd64/initramfs.xz to /tmp/imager835395496/initramfs.xz
rebuilding initramfs with system extensions...
pulling ghcr.io/siderolabs/qemu-guest-agent:8.2.2...
rebuilding initramfs with system extensions...
discovered system extensions:
rebuilding initramfs with system extensions...
NAME VERSION AUTHOR
rebuilding initramfs with system extensions...
qemu-guest-agent 8.2.2 Markus Reiter
rebuilding initramfs with system extensions...
validating system extensions
rebuilding initramfs with system extensions...
compressing system extensions
rebuilding initramfs with system extensions...
creating system extensions initramfs archive and compressing it
initramfs ready
kernel command line: talos.platform=nocloud console=tty1 console=ttyS0 net.ifnames=0 net.ifnames=0 init_on_alloc=1 slab_nomerge pti=on consoleblank=0 nvme_core.io_timeout=4294967295 printk.devkmsg=on ima_template=ima-ng ima_appraise=fix ima_hash=sha512
creating disk image...
creating raw disk of size 2.1 GB
creating disk image...
attaching loopback device
creating disk image...
creating new partition table on /dev/loop24
creating disk image...
logical/physical block size: 512/512
creating disk image...
minimum/optimal I/O size: 512/512
creating disk image...
partitioning /dev/loop24 - EFI "105 MB"
creating disk image...
created /dev/loop24p1 (EFI) size 204800 blocks
creating disk image...
partitioning /dev/loop24 - BIOS "1.0 MB"
creating disk image...
created /dev/loop24p2 (BIOS) size 2048 blocks
creating disk image...
partitioning /dev/loop24 - BOOT "1.0 GB"
creating disk image...
created /dev/loop24p3 (BOOT) size 2048000 blocks
creating disk image...
partitioning /dev/loop24 - META "1.0 MB"
creating disk image...
created /dev/loop24p4 (META) size 2048 blocks
creating disk image...
partitioning /dev/loop24 - STATE "105 MB"
creating disk image...
created /dev/loop24p5 (STATE) size 204800 blocks
creating disk image...
partitioning /dev/loop24 - EPHEMERAL "0 B"
creating disk image...
created /dev/loop24p6 (EPHEMERAL) size 1728512 blocks
creating disk image...
formatting the partition "/dev/loop24p1" as "vfat" with label "EFI"
creating disk image...
zeroing out "/dev/loop24p2"
creating disk image...
formatting the partition "/dev/loop24p3" as "xfs" with label "BOOT"
creating disk image...
zeroing out "/dev/loop24p4"
creating disk image...
zeroing out "/dev/loop24p5"
creating disk image...
zeroing out "/dev/loop24p6"
creating disk image...
copying /usr/install/amd64/vmlinuz to /tmp/imager835395496/image/boot/A/vmlinuz
creating disk image...
copying /tmp/imager835395496/initramfs.xz to /tmp/imager835395496/image/boot/A/initramfs.xz
creating disk image...
writing /tmp/imager835395496/image/boot/grub/grub.cfg to disk
creating disk image...
executing: grub-install --boot-directory=/tmp/imager835395496/image/boot --efi-directory=/tmp/imager835395496/image/boot/EFI --removable --no-nvram --target=x86_64-efi /dev/loop24
creating disk image...
executing: grub-install --boot-directory=/tmp/imager835395496/image/boot --efi-directory=/tmp/imager835395496/image/boot/EFI --removable --no-nvram --target=i386-pc /dev/loop24
creating disk image...
detaching loopback device
disk image ready
output asset path: /out/nocloud-amd64.raw
Then I use it in qemu in UEFI mode:
cp /usr/share/OVMF/OVMF_CODE.fd ovmf-code.fd
cp /usr/share/OVMF/OVMF_VARS.fd ovmf-vars.fd
cp tmp/talos/nocloud-amd64.qcow2 talos.qcow2
qemu-system-x86_64 \
  -name talos \
  -machine q35,accel=kvm,smm=on \
  -cpu host \
  -m 4G \
  -smp cores=4 \
  -k pt \
  -global driver=cfi.pflash01,property=secure,value=on \
  -drive if=pflash,unit=0,file=ovmf-code.fd,format=raw,readonly=on \
  -drive if=pflash,unit=1,file=ovmf-vars.fd,format=raw \
  -drive if=none,file=talos.qcow2,format=qcow2,media=disk,discard=unmap,cache=unsafe,id=hd0 \
  -device virtio-scsi-pci,id=scsi0 \
  -device scsi-hd,drive=hd0
I have no problem booting when following your exact steps above.
Interesting. With VMs, this never happened to me before. So my host machine and/or OS must differ from yours in some way.
My host machine is Ubuntu 22.04.4 LTS with:
$ dpkg -l | grep -P 'qemu|ovmf'
ii ipxe-qemu 1.21.1+git-20220113.fbbdc3926-0ubuntu1 all PXE boot firmware - ROM images for qemu
ii ipxe-qemu-256k-compat-efi-roms 1.0.0+git-20150424.a25a16d-0ubuntu4 all PXE boot firmware - Compat EFI ROM images for qemu
ii libvirt-daemon-driver-qemu 8.0.0-1ubuntu7.8 amd64 Virtualization daemon QEMU connection driver
ii ovmf 2022.02-3ubuntu0.22.04.2 all UEFI firmware for 64-bit x86 virtual machines
ii qemu-block-extra 1:6.2+dfsg-2ubuntu6.17 amd64 extra block backend modules for qemu-system and qemu-utils
ii qemu-system-common 1:6.2+dfsg-2ubuntu6.17 amd64 QEMU full system emulation binaries (common files)
ii qemu-system-data 1:6.2+dfsg-2ubuntu6.17 all QEMU full system emulation (data files)
ii qemu-system-gui 1:6.2+dfsg-2ubuntu6.17 amd64 QEMU full system emulation binaries (user interface and audio support)
ii qemu-system-x86 1:6.2+dfsg-2ubuntu6.17 amd64 QEMU full system emulation binaries (x86)
ii qemu-utils 1:6.2+dfsg-2ubuntu6.17 amd64 QEMU utilities
FWIW, this is also happening on a Proxmox host:
Here is my side, but it doesn't seem like a Talos issue to me so far (?):
ii ipxe-qemu 1.21.1+git-20220113.fbbdc3926-0ubuntu1 all PXE boot firmware - ROM images for qemu
ii ipxe-qemu-256k-compat-efi-roms 1.0.0+git-20150424.a25a16d-0ubuntu4 all PXE boot firmware - Compat EFI ROM images for qemu
ii libvirt-daemon-driver-qemu 9.6.0-1ubuntu1 amd64 Virtualization daemon QEMU connection driver
ii ovmf 2023.05-2ubuntu0.1 all UEFI firmware for 64-bit x86 virtual machines
ii qemu-block-extra 1:8.0.4+dfsg-1ubuntu3.23.10.3 amd64 extra block backend modules for qemu-system and qemu-utils
ii qemu-efi-aarch64 2023.05-2ubuntu0.1 all UEFI firmware for 64-bit ARM virtual machines
ii qemu-efi-arm 2023.05-2ubuntu0.1 all UEFI firmware for 32-bit ARM virtual machines
rc qemu-kvm 1:5.0-5ubuntu9.7 amd64 QEMU Full virtualization on x86 hardware
ii qemu-system-arm 1:8.0.4+dfsg-1ubuntu3.23.10.3 amd64 QEMU full system emulation binaries (arm)
ii qemu-system-common 1:8.0.4+dfsg-1ubuntu3.23.10.3 amd64 QEMU full system emulation binaries (common files)
ii qemu-system-data 1:8.0.4+dfsg-1ubuntu3.23.10.3 all QEMU full system emulation (data files)
ii qemu-system-gui 1:8.0.4+dfsg-1ubuntu3.23.10.3 amd64 QEMU full system emulation binaries (user interface and audio support)
ii qemu-system-x86 1:8.0.4+dfsg-1ubuntu3.23.10.3 amd64 QEMU full system emulation binaries (x86)
ii qemu-user-static 1:8.0.4+dfsg-1ubuntu3.23.10.3 amd64 QEMU user mode emulation binaries (static version)
ii qemu-utils 1:8.0.4+dfsg-1ubuntu3.23.10.3 amd64 QEMU utilities
In its current form, due to the referenced GRUB update in Talos 1.7, it works neither on an Ubuntu 22.04 host nor on a Proxmox 8.1 host.
GRUB 2.12 has been released; we'll update. In the meantime, you can use Image Factory, which will have the older GRUB.
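The suggested workaround could be sketched roughly as follows. The URL layout is copied from the metal-amd64.raw.xz link earlier in this thread, and the schematic ID is the one quoted there. That a nocloud-amd64.raw.xz asset is served under the same layout is an assumption, and the function names are hypothetical. Note that the quoted schematic only adds qemu-guest-agent; it does not carry the net.ifnames=0 kernel argument from the local imager profile.

```shell
#!/bin/sh
set -eu

# Values taken from earlier in this thread.
SCHEMATIC_ID=ce4c980550dd2ab1b17bbf2b08801c7eb59418eafe8f279833297925d67c7515
TALOS_VERSION=v1.7.0-alpha.1

# Build an Image Factory download URL from schematic ID, version and platform.
# Assumption: every platform follows the <platform>-amd64.raw.xz naming.
factory_image_url() {
  printf 'https://factory.talos.dev/image/%s/%s/%s-amd64.raw.xz\n' "$1" "$2" "$3"
}

# Download, decompress and convert the image to qcow2 for qemu.
fetch_image() {
  url=$(factory_image_url "$SCHEMATIC_ID" "$TALOS_VERSION" nocloud)
  curl -fL -o nocloud-amd64.raw.xz "$url"
  xz -d nocloud-amd64.raw.xz
  qemu-img convert -O qcow2 nocloud-amd64.raw talos.qcow2
}

# The download only runs when the script is invoked with --fetch.
if [ "${1:-}" = "--fetch" ]; then
  fetch_image
fi
```

The resulting talos.qcow2 can then be passed to the same qemu-system-x86_64 command line used for the locally built image.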
We're updating GRUB for the next 1.7 release https://github.com/siderolabs/pkgs/pull/919
Thanks for reporting!
Bug Report
Description
NB: This is the same problem as https://github.com/siderolabs/talos/issues/8023, which happened in Talos 1.6.
While trying to upgrade to Talos 1.7.0-alpha.1 in a qemu/kvm VM at https://github.com/rgl/terraform-libvirt-talos/tree/upgrade-to-talos-1.7, Talos cannot boot in qemu/kvm due to:
Digging a little at the GRUB prompt with
ls (hd0,gpt3)/grub/x86_64-efi/
, it seems there's a bug somewhere when trying to find files, which fails with the error:
error: invalid XFS directory entry
It seems this kind of bug is being reported elsewhere. I found these, which might help understand/fix the problem:
Reproduce
./do init
to create the local image at tmp/talos/talos-1.7.0-alpha.1.qcow2
and metadata in tmp/talos/talos-1.7.0-alpha.1.yml.
./do plan-apply
to launch everything (you need to follow the README.md for more details).
Environment