siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

Install Failure: device or resource busy #9701

Closed (askedrelic closed this issue 1 week ago)

askedrelic commented 1 week ago

Bug Report

Installing Talos 1.8.2 from maintenance mode fails with the message "Error: failed to create partitions: failed to write GPT: failed to add partition 1: device or resource busy". The failed install leaves the disk without a bootable partition.

talosctl -e XXX -n XXX apply-config -i --file controlplane.yml

Description

I'm trying to install Talos via the API on a bare-metal machine in a two-step process. In the first step, I install a fresh Ubuntu 22.04 instance, build the metal-amd64.iso Talos installer and write it over the existing Ubuntu disk, then reboot into Talos maintenance mode. In the second step, I run apply-config to install Talos with my config. This is related to https://github.com/siderolabs/talos/issues/9647: the bootstrap process is similar, but this is a different, local machine that I have physical access to (a Dell PowerEdge R740).

I believe this is related to Talos 1.8: I've tested the same process and config with 1.7 and it installs successfully. I'm going to keep the machine running 1.7.7 for now, in case you need more info.

Logs

Disk view from maintenance mode

talosctl -n 192.168.2.44 disks -i
DEV          MODEL            SERIAL   TYPE      UUID   WWID                                   MODALIAS      NAME   SIZE     BUS_PATH                                                          SUBSYSTEM          READ_ONLY   SYSTEM_DISK
/dev/loop0   -                -        UNKNOWN   -      -                                      -             -      139 kB   /virtual                                                          /sys/class/block   *
/dev/loop1   -                -        UNKNOWN   -      -                                      -             -      4.1 kB   /virtual                                                          /sys/class/block   *
/dev/loop2   -                -        UNKNOWN   -      -                                      -             -      6.1 MB   /virtual                                                          /sys/class/block   *
/dev/loop3   -                -        UNKNOWN   -      -                                      -             -      179 MB   /virtual                                                          /sys/class/block   *
/dev/loop4   -                -        UNKNOWN   -      -                                      -             -      75 MB    /virtual                                                          /sys/class/block   *
/dev/sda     PERC H730P Adp   -        HDD       -      naa.61866da0806a24002df3524d1acd06d2   scsi:t-0x00   -      2.0 TB   /pci0000:17/0000:17:00.0/0000:18:00.0/host0/target0:2:0/0:2:0:0   /sys/class/block

talosctl -n 192.168.2.44 get blockdevices -i
NODE   NAMESPACE   TYPE          ID      VERSION   TYPE        PARTITIONNAME        GENERATION
       runtime     BlockDevice   loop0   2         disk                             2
       runtime     BlockDevice   loop1   2         disk                             2
       runtime     BlockDevice   loop2   2         disk                             2
       runtime     BlockDevice   loop3   2         disk                             2
       runtime     BlockDevice   loop4   2         disk                             2
       runtime     BlockDevice   loop5   2         disk                             2
       runtime     BlockDevice   loop6   2         disk                             2
       runtime     BlockDevice   loop7   2         disk                             2
       runtime     BlockDevice   sda     1         disk                             1
       runtime     BlockDevice   sda1    1         partition   Gap0                 1
       runtime     BlockDevice   sda2    1         partition   EFI boot partition   1
       runtime     BlockDevice   sda3    1         partition   HFSPLUS              1
       runtime     BlockDevice   sda4    1         partition   Gap1                 1

failed install section

Nov 11 16:12:48   {"address":"unix:///run/containerd/s/534063ea0841798cfadf1a07f7aa7339117054f5dfc102bad3d3722e61875a5f","msg":"connecting to shim upgrade","namespace":"system","protocol":"ttrpc","talos-level":"info","talos-service":"containerd","talos-time":"2024-11-11T21:12:48.715631627Z","version":3}
Nov 11 16:12:48   {"clock":530500073,"facility":"user","msg":"2024/11/11 21:12:48 running Talos installer v1.8.2\n","priority":"warning","seq":1892,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.378392564Z"}
Nov 11 16:12:48   {"msg":"2024/11/11 21:12:48.830973 machined/authz/authorizer authorized ([os:admin os:operator os:reader] includes [os:admin])","talos-level":"info","talos-service":"machined","talos-time":"2024-11-11T21:12:48.831012558Z"}
Nov 11 16:12:48   {"msg":"2024/11/11 21:12:48.830973 machined/authz/authorizer authorized ([os:admin os:operator os:reader] includes [os:admin])","talos-level":"info","talos-service":"machined","talos-time":"2024-11-11T21:12:48.831075116Z"}
Nov 11 16:12:48   {"msg":"2024/11/11 21:12:48.831026 machined OK [/machine.MachineService/Version] 72.719µs unary Success (:authority=localhost;content-type=application/grpc;grpc-accept-encoding=gzip;runtime=Talos;talos-role=os:admin;user-agent=grpc-go/1.66.3)","talos-level":"info","talos-service":"machined","talos-time":"2024-11-11T21:12:48.831112874Z"}
Nov 11 16:12:48   {"msg":"2024/11/11 21:12:48.831026 machined OK [/machine.MachineService/Version] 72.719µs unary Success (:authority=localhost;content-type=application/grpc;grpc-accept-encoding=gzip;runtime=Talos;talos-role=os:admin;user-agent=grpc-go/1.66.3)","talos-level":"info","talos-service":"machined","talos-time":"2024-11-11T21:12:48.831200699Z"}
Nov 11 16:12:48   {"clock":530516292,"facility":"user","msg":"2024/11/11 21:12:48 created EFI (C12A7328-F81F-11D2-BA4B-00A0C93EC93B) size 104857600 bytes\n","priority":"warning","seq":1893,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.394611564Z"}
Nov 11 16:12:48   {"action":"remove","component":"controller-runtime","controller":"block.DevicesController","id":"sda1","msg":"2024-11-11T21:12:48.838Z \u001b[35mDEBUG\u001b[0m processing event","path":"/sys/devices/pci0000:17/0000:17:00.0/0000:18:00.0/host0/target0:2:0/0:2:0:0/block/sda/sda1","talos-level":"info","talos-service":"controller-runtime","talos-time":"2024-11-11T21:12:48.839030285Z"}
Nov 11 16:12:48   {"component":"controller-runtime","controller":"block.DevicesController","id":"sda","msg":"2024-11-11T21:12:48.839Z \u001b[35mDEBUG\u001b[0m bumping generation for device, inotify update","talos-level":"info","talos-service":"controller-runtime","talos-time":"2024-11-11T21:12:48.839275762Z"}
Nov 11 16:12:48   {"clock":530516312,"facility":"user","msg":"2024/11/11 21:12:48 created BIOS (21686148-6449-6E6F-744E-656564454649) size 1048576 bytes\n","priority":"warning","seq":1894,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.394631564Z"}
Nov 11 16:12:48   {"clock":530516326,"facility":"user","msg":"2024/11/11 21:12:48 created BOOT (0FC63DAF-8483-4772-8E79-3D69D8477DE4) size 1048576000 bytes\n","priority":"warning","seq":1895,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.394645564Z"}
Nov 11 16:12:48   {"clock":530516339,"facility":"user","msg":"2024/11/11 21:12:48 created META (0FC63DAF-8483-4772-8E79-3D69D8477DE4) size 1048576 bytes\n","priority":"warning","seq":1896,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.394658564Z"}
Nov 11 16:12:48   {"clock":530518161,"facility":"user","msg":"Error: failed to create partitions: failed to write GPT: failed to add partition 1: device or resource busy\n","priority":"warning","seq":1897,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.396480564Z"}
Nov 11 16:12:48   {"clock":530519305,"facility":"user","msg":"Usage:\n","priority":"warning","seq":1898,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.397624564Z"}
Nov 11 16:12:48   {"clock":530519314,"facility":"user","msg":"  installer install [flags]\n","priority":"warning","seq":1899,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.397633564Z"}
Nov 11 16:12:48   {"clock":530519320,"facility":"user","msg":"\n","priority":"warning","seq":1900,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.397639564Z"}
Nov 11 16:12:48   {"clock":530519325,"facility":"user","msg":"Flags:\n","priority":"warning","seq":1901,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.397644564Z"}
Nov 11 16:12:48   {"clock":530519331,"facility":"user","msg":"  -h, --help   help for install\n","priority":"warning","seq":1902,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.397650564Z"}
Nov 11 16:12:48   {"clock":530519337,"facility":"user","msg":"\n","priority":"warning","seq":1903,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.397656564Z"}
Nov 11 16:12:48   {"clock":530519342,"facility":"user","msg":"Global Flags:\n","priority":"warning","seq":1904,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.397661564Z"}
Nov 11 16:12:48   {"clock":530519349,"facility":"user","msg":"      --arch string                    The target architecture (default \"amd64\")\n","priority":"warning","seq":1905,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.397668564Z"}
Nov 11 16:12:48   {"clock":530519355,"facility":"user","msg":"      --board string                   Deprecated: no op (default \"none\")\n","priority":"warning","seq":1906,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.397674564Z"}
Nov 11 16:12:48   {"clock":530519361,"facility":"user","msg":"      --bootloader                     Deprecated: no op (default true)\n","priority":"warning","seq":1907,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.397680564Z"}
Nov 11 16:12:48   {"clock":530519366,"facility":"user","msg":"      --config string                  The value of talos.config\n","priority":"warning","seq":1908,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.397685564Z"}
Nov 11 16:12:48   {"clock":530519372,"facility":"user","msg":"      --disk string                    The path to the disk to install to\n","priority":"warning","seq":1909,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.397691564Z"}
Nov 11 16:12:48   {"clock":530519378,"facility":"user","msg":"      --extra-kernel-arg stringArray   Extra argument to pass to the kernel\n","priority":"warning","seq":1910,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.397697564Z"}
Nov 11 16:12:48   {"clock":530519385,"facility":"user","msg":"      --force                          Indicates that the install should forcefully format the partition\n","priority":"warning","seq":1911,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.397704564Z"}
Nov 11 16:12:48   {"clock":530519391,"facility":"user","msg":"      --meta metaValueSlice            A key/value pair for META (default [])\n","priority":"warning","seq":1912,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.397710564Z"}
Nov 11 16:12:48   {"clock":530519396,"facility":"user","msg":"      --platform string                The value of talos.platform\n","priority":"warning","seq":1913,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.397715564Z"}
Nov 11 16:12:48   {"clock":530519401,"facility":"user","msg":"      --upgrade                        Indicates that the install is being performed by an upgrade\n","priority":"warning","seq":1914,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.397720564Z"}
Nov 11 16:12:48   {"action":"remove","component":"controller-runtime","controller":"block.DevicesController","id":"sda2","msg":"2024-11-11T21:12:48.846Z \u001b[35mDEBUG\u001b[0m processing event","path":"/sys/devices/pci0000:17/0000:17:00.0/0000:18:00.0/host0/target0:2:0/0:2:0:0/block/sda/sda2","talos-level":"info","talos-service":"controller-runtime","talos-time":"2024-11-11T21:12:48.846469726Z"}
Nov 11 16:12:48   {"clock":530519408,"facility":"user","msg":"      --zero                           Indicates that the install should write zeros to the disk before installing\n","priority":"warning","seq":1915,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.397727564Z"}
Nov 11 16:12:48   {"clock":530519413,"facility":"user","msg":"\n","priority":"warning","seq":1916,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.397732564Z"}
Nov 11 16:12:48   {"clock":530519418,"facility":"user","msg":"failed to create partitions: failed to write GPT: failed to add partition 1: device or resource busy\n","priority":"warning","seq":1917,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.397737564Z"}
Nov 11 16:12:48   {"action":"remove","component":"controller-runtime","controller":"block.DevicesController","id":"sda3","msg":"2024-11-11T21:12:48.846Z \u001b[35mDEBUG\u001b[0m processing event","path":"/sys/devices/pci0000:17/0000:17:00.0/0000:18:00.0/host0/target0:2:0/0:2:0:0/block/sda/sda3","talos-level":"info","talos-service":"controller-runtime","talos-time":"2024-11-11T21:12:48.846636868Z"}
Nov 11 16:12:48   {"action":"remove","component":"controller-runtime","controller":"block.DevicesController","id":"sda4","msg":"2024-11-11T21:12:48.846Z \u001b[35mDEBUG\u001b[0m processing event","path":"/sys/devices/pci0000:17/0000:17:00.0/0000:18:00.0/host0/target0:2:0/0:2:0:0/block/sda/sda4","talos-level":"info","talos-service":"controller-runtime","talos-time":"2024-11-11T21:12:48.846865747Z"}
Nov 11 16:12:48   {"action":"change","component":"controller-runtime","controller":"block.DevicesController","id":"sda","msg":"2024-11-11T21:12:48.847Z \u001b[35mDEBUG\u001b[0m processing event","path":"/sys/devices/pci0000:17/0000:17:00.0/0000:18:00.0/host0/target0:2:0/0:2:0:0/block/sda","talos-level":"info","talos-service":"controller-runtime","talos-time":"2024-11-11T21:12:48.847364506Z"}
Nov 11 16:12:48   {"clock":530526182,"facility":"kern","msg":" sda: sda1 sda2 sda3 sda4\n","priority":"info","seq":1918,"talos-level":"info","talos-time":"2024-11-11T21:12:47.404501564Z"}
Nov 11 16:12:48   {"action":"add","component":"controller-runtime","controller":"block.DevicesController","id":"sda1","msg":"2024-11-11T21:12:48.847Z \u001b[35mDEBUG\u001b[0m processing event","path":"/sys/devices/pci0000:17/0000:17:00.0/0000:18:00.0/host0/target0:2:0/0:2:0:0/block/sda/sda1","talos-level":"info","talos-service":"controller-runtime","talos-time":"2024-11-11T21:12:48.847633731Z"}
Nov 11 16:12:48   {"action":"add","component":"controller-runtime","controller":"block.DevicesController","id":"sda2","msg":"2024-11-11T21:12:48.847Z \u001b[35mDEBUG\u001b[0m processing event","path":"/sys/devices/pci0000:17/0000:17:00.0/0000:18:00.0/host0/target0:2:0/0:2:0:0/block/sda/sda2","talos-level":"info","talos-service":"controller-runtime","talos-time":"2024-11-11T21:12:48.847823435Z"}
Nov 11 16:12:48   {"action":"add","component":"controller-runtime","controller":"block.DevicesController","id":"sda3","msg":"2024-11-11T21:12:48.847Z \u001b[35mDEBUG\u001b[0m processing event","path":"/sys/devices/pci0000:17/0000:17:00.0/0000:18:00.0/host0/target0:2:0/0:2:0:0/block/sda/sda3","talos-level":"info","talos-service":"controller-runtime","talos-time":"2024-11-11T21:12:48.847969972Z"}
Nov 11 16:12:48   {"action":"add","component":"controller-runtime","controller":"block.DevicesController","id":"sda4","msg":"2024-11-11T21:12:48.848Z \u001b[35mDEBUG\u001b[0m processing event","path":"/sys/devices/pci0000:17/0000:17:00.0/0000:18:00.0/host0/target0:2:0/0:2:0:0/block/sda/sda4","talos-level":"info","talos-service":"controller-runtime","talos-time":"2024-11-11T21:12:48.848292272Z"}
Nov 11 16:12:48   {"component":"controller-runtime","controller":"block.DiscoveryController","devices":["sda"],"msg":"2024-11-11T21:12:48.866Z \u001b[35mDEBUG\u001b[0m rescanning devices","talos-level":"info","talos-service":"controller-runtime","talos-time":"2024-11-11T21:12:48.867076992Z"}
Nov 11 16:12:48   {"component":"controller-runtime","controller":"block.DiscoveryController","device":"sda","msg":"2024-11-11T21:12:48.868Z \u001b[35mDEBUG\u001b[0m magic matched, but probe returned no result","name":"gpt","offset":105906176,"talos-level":"info","talos-service":"controller-runtime","talos-time":"2024-11-11T21:12:48.868462723Z"}
Nov 11 16:12:48   {"component":"controller-runtime","controller":"block.DiscoveryController","device":"sda","msg":"2024-11-11T21:12:48.868Z \u001b[35mDEBUG\u001b[0m magic matched, but probe returned no result","name":"zfs","offset":105906176,"talos-level":"info","talos-service":"controller-runtime","talos-time":"2024-11-11T21:12:48.86866795Z"}
Nov 11 16:12:48   {"component":"controller-runtime","controller":"block.DiscoveryController","device":"sda","msg":"2024-11-11T21:12:48.868Z \u001b[35mDEBUG\u001b[0m magic matched, but probe returned no result","name":"gpt","offset":106954752,"talos-level":"info","talos-service":"controller-runtime","talos-time":"2024-11-11T21:12:48.868964679Z"}
Nov 11 16:12:48   {"component":"controller-runtime","controller":"block.DiscoveryController","device":"sda","msg":"2024-11-11T21:12:48.871Z \u001b[35mDEBUG\u001b[0m magic matched, but probe returned no result","name":"zfs","offset":106954752,"talos-level":"info","talos-service":"controller-runtime","talos-time":"2024-11-11T21:12:48.871467215Z"}
Nov 11 16:12:48   {"component":"controller-runtime","controller":"block.DiscoveryController","device":"sda","msg":"2024-11-11T21:12:48.872Z \u001b[35mDEBUG\u001b[0m magic matched, but probe returned no result","name":"gpt","offset":1155530752,"talos-level":"info","talos-service":"controller-runtime","talos-time":"2024-11-11T21:12:48.872238654Z"}
Nov 11 16:12:48   {"component":"controller-runtime","controller":"block.DiscoveryController","device":"sda","msg":"2024-11-11T21:12:48.872Z \u001b[35mDEBUG\u001b[0m magic matched, but probe returned no result","name":"zfs","offset":1155530752,"talos-level":"info","talos-service":"controller-runtime","talos-time":"2024-11-11T21:12:48.872377294Z"}
Nov 11 16:12:48   {"component":"controller-runtime","controller":"block.DiscoveryController","id":"sda","info":{"BlockDevice":{},"BlockSize":512,"DevNo":2048,"FilesystemBlockSize":0,"IOSize":512,"Label":null,"Name":"gpt","Parts":[{"BlockSize":512,"FilesystemBlockSize":4096,"Label":null,"Name":"vfat","PartitionIndex":1,"PartitionLabel":"EFI","PartitionOffset":1048576,"PartitionSize":104857600,"PartitionType":"c12a7328-f81f-11d2-ba4b-00a0c93ec93b","PartitionUUID":"91fa9b95-e767-4c9f-bbf2-5f76a2b883be","Parts":null,"ProbedSize":536868864,"UUID":null},{"BlockSize":0,"FilesystemBlockSize":0,"Label":null,"Name":"","PartitionIndex":2,"PartitionLabel":"BIOS","PartitionOffset":105906176,"PartitionSize":1048576,"PartitionType":"21686148-6449-6e6f-744e-656564454649","PartitionUUID":"8351161d-d21b-4c48-ad83-e919143eab2e","Parts":null,"ProbedSize":0,"UUID":null},{"BlockSize":0,"FilesystemBlockSize":0,"Label":null,"Name":"","PartitionIndex":3,"PartitionLabel":"BOOT","PartitionOffset":106954752,"PartitionSize":1048576000,"PartitionType":"0fc63daf-8483-4772-8e79-3d69d8477de4","PartitionUUID":"c69bf77c-7168-421d-bcda-f4989cfd4a37","Parts":null,"ProbedSize":0,"UUID":null},{"BlockSize":0,"FilesystemBlockSize":0,"Label":null,"Name":"","PartitionIndex":4,"PartitionLabel":"META","PartitionOffset":1155530752,"PartitionSize":1048576,"PartitionType":"0fc63daf-8483-4772-8e79-3d69d8477de4","PartitionUUID":"3d3d7be7-463a-4d71-ade2-5899bedc0c5d","Parts":null,"ProbedSize":0,"UUID":null}],"ProbedSize":2047087721984,"SectorSize":512,"Size":2047088787456,"UUID":"0cb39a04-fefe-4540-aa5e-3956e2910efc","WholeDisk":true},"msg":"2024-11-11T21:12:48.872Z \u001b[35mDEBUG\u001b[0m probed device","talos-level":"info","talos-service":"controller-runtime","talos-time":"2024-11-11T21:12:48.872475533Z"}
Nov 11 16:12:48   {"component":"controller-runtime","controller":"block.VolumeManagerController","disks":["/dev/sda"],"msg":"2024-11-11T21:12:48.873Z \u001b[35mDEBUG\u001b[0m matched disks","talos-level":"info","talos-service":"controller-runtime","talos-time":"2024-11-11T21:12:48.874050324Z","volume":"STATE"}
Nov 11 16:12:48   {"available":2045931159552,"component":"controller-runtime","controller":"block.VolumeManagerController","disk":"/dev/sda","msg":"2024-11-11T21:12:48.875Z \u001b[35mDEBUG\u001b[0m checking disk for provisioning","talos-level":"info","talos-service":"controller-runtime","talos-time":"2024-11-11T21:12:48.875898596Z","volume":"STATE"}
Nov 11 16:12:48   {"component":"controller-runtime","controller":"block.VolumeManagerController","disk":"/dev/sda","msg":"2024-11-11T21:12:48.875Z \u001b[35mDEBUG\u001b[0m picked disk","talos-level":"info","talos-service":"controller-runtime","talos-time":"2024-11-11T21:12:48.876010189Z","volume":"STATE"}
Nov 11 16:12:48   {"id":"upgrade","msg":"shim disconnected","namespace":"system","talos-level":"info","talos-service":"containerd","talos-time":"2024-11-11T21:12:48.87717563Z"}
Nov 11 16:12:48   {"id":"upgrade","msg":"cleaning up after shim disconnected","namespace":"system","talos-level":"warn","talos-service":"containerd","talos-time":"2024-11-11T21:12:48.877265815Z"}
Nov 11 16:12:48   {"msg":"cleaning up dead shim","namespace":"system","talos-level":"info","talos-service":"containerd","talos-time":"2024-11-11T21:12:48.877328197Z"}
Nov 11 16:12:48   {"msg":"2024/11/11 21:12:48.900420 [talos] task install (1/1): failed: task \"upgrade\" failed: exit code 1","talos-level":"info","talos-service":"machined","talos-time":"2024-11-11T21:12:48.900440273Z"}
Nov 11 16:12:48   {"msg":"2024/11/11 21:12:48.900420 [talos] task install (1/1): failed: task \"upgrade\" failed: exit code 1","talos-level":"info","talos-service":"machined","talos-time":"2024-11-11T21:12:48.900504099Z"}
Nov 11 16:12:48   {"msg":"2024/11/11 21:12:48.900479 [talos] phase install (2/12): failed","talos-level":"info","talos-service":"machined","talos-time":"2024-11-11T21:12:48.900543141Z"}
Nov 11 16:12:48   {"msg":"2024/11/11 21:12:48.900479 [talos] phase install (2/12): failed","talos-level":"info","talos-service":"machined","talos-time":"2024-11-11T21:12:48.900566307Z"}
Nov 11 16:12:48   {"msg":"2024/11/11 21:12:48.900506 [talos] install sequence: failed","talos-level":"info","talos-service":"machined","talos-time":"2024-11-11T21:12:48.900600854Z"}
Nov 11 16:12:48   {"msg":"2024/11/11 21:12:48.900506 [talos] install sequence: failed","talos-level":"info","talos-service":"machined","talos-time":"2024-11-11T21:12:48.900645362Z"}
Nov 11 16:12:48   {"msg":"2024/11/11 21:12:48.900661 [talos] service[ext-nvidia-persistenced](Failed): Condition failed: context canceled","talos-level":"info","talos-service":"machined","talos-time":"2024-11-11T21:12:48.900701705Z"}
Nov 11 16:12:48   {"msg":"2024/11/11 21:12:48.900661 [talos] service[ext-nvidia-persistenced](Failed): Condition failed: context canceled","talos-level":"info","talos-service":"machined","talos-time":"2024-11-11T21:12:48.900758271Z"}
Nov 11 16:12:48   {"msg":"2024/11/11 21:12:48.900744 [talos] service[dashboard](Stopping): Sending SIGTERM to Process([\"/sbin/dashboard\"])","talos-level":"info","talos-service":"machined","talos-time":"2024-11-11T21:12:48.900808347Z"}
Nov 11 16:12:48   {"msg":"2024/11/11 21:12:48.900744 [talos] service[dashboard](Stopping): Sending SIGTERM to Process([\"/sbin/dashboard\"])","talos-level":"info","talos-service":"machined","talos-time":"2024-11-11T21:12:48.900836459Z"}
Nov 11 16:12:48   {"msg":"2024/11/11 21:12:48.900831 [talos] service[udevd](Stopping): Sending SIGTERM to Process([\"/sbin/udevd\" \"--resolve-names=never\"])","talos-level":"info","talos-service":"machined","talos-time":"2024-11-11T21:12:48.900866836Z"}
Nov 11 16:12:48   {"msg":"2024/11/11 21:12:48.900831 [talos] service[udevd](Stopping): Sending SIGTERM to Process([\"/sbin/udevd\" \"--resolve-names=never\"])","talos-level":"info","talos-service":"machined","talos-time":"2024-11-11T21:12:48.900885661Z"}
Nov 11 16:12:48   {"msg":"2024/11/11 21:12:48.900984 [talos] service[containerd](Stopping): Sending SIGTERM to Process([\"/bin/containerd\" \"--address\" \"/system/run/containerd/containerd.sock\" \"--state\" \"/system/run/containerd\" \"--root\" \"/system/var/lib/containerd\"])","talos-level":"info","talos-service":"machined","talos-time":"2024-11-11T21:12:48.901021155Z"}
Nov 11 16:12:48   {"msg":"2024/11/11 21:12:48.900984 [talos] service[containerd](Stopping): Sending SIGTERM to Process([\"/bin/containerd\" \"--address\" \"/system/run/containerd/containerd.sock\" \"--state\" \"/system/run/containerd\" \"--root\" \"/system/var/lib/containerd\"])","talos-level":"info","talos-service":"machined","talos-time":"2024-11-11T21:12:48.901083488Z"}
Nov 11 16:12:48   {"msg":"2024/11/11 21:12:48.901114 [talos] service[syslogd](Finished): Service finished successfully","talos-level":"info","talos-service":"machined","talos-time":"2024-11-11T21:12:48.901131601Z"}
Nov 11 16:12:48   {"msg":"2024/11/11 21:12:48.901114 [talos] service[syslogd](Finished): Service finished successfully","talos-level":"info","talos-service":"machined","talos-time":"2024-11-11T21:12:48.901144857Z"}
Nov 11 16:12:48   {"component":"controller-runtime","controller":"block.LVMActivationController","msg":"2024-11-11T21:12:48.901Z \u001b[35mDEBUG\u001b[0m udevd service not registered yet","talos-level":"info","talos-service":"controller-runtime","talos-time":"2024-11-11T21:12:48.901145739Z"}
Nov 11 16:12:48   {"clock":530579425,"facility":"user","msg":"[talos] task install (1/1): failed: task \"upgrade\" failed: exit code 1\n","priority":"warning","seq":1919,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.457744564Z"}
Nov 11 16:12:48   {"clock":530579491,"facility":"user","msg":"[talos] phase install (2/12): failed\n","priority":"warning","seq":1920,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.457810564Z"}
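In the excerpt above, the installer's own output is interleaved with controller-runtime debug events. A minimal Python sketch for pulling out just the relevant messages, assuming every line has the shape shown here (a syslog-style prefix followed by one JSON object) and assuming that the entries tagged "facility":"user" are the ones of interest. The function name and the sample list are illustrative; the sample lines are copied from the log above.

```python
import json


def installer_messages(lines):
    """Yield (seq, msg) for log entries whose facility is "user".

    Assumption: each line is "<timestamp prefix> {json object}".
    Lines without a JSON object, or with invalid JSON, are skipped.
    """
    for line in lines:
        start = line.find("{")
        if start == -1:
            continue
        try:
            entry = json.loads(line[start:])
        except json.JSONDecodeError:
            continue
        if entry.get("facility") == "user":
            yield entry.get("seq"), entry.get("msg", "").rstrip("\n")


# Three lines copied verbatim from the log above (raw strings keep the
# JSON escapes intact).
SAMPLE = [
    r'Nov 11 16:12:48   {"clock":530516292,"facility":"user","msg":"2024/11/11 21:12:48 created EFI (C12A7328-F81F-11D2-BA4B-00A0C93EC93B) size 104857600 bytes\n","priority":"warning","seq":1893,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.394611564Z"}',
    r'Nov 11 16:12:48   {"clock":530518161,"facility":"user","msg":"Error: failed to create partitions: failed to write GPT: failed to add partition 1: device or resource busy\n","priority":"warning","seq":1897,"talos-level":"warn","talos-time":"2024-11-11T21:12:47.396480564Z"}',
    r'Nov 11 16:12:48   {"clock":530526182,"facility":"kern","msg":" sda: sda1 sda2 sda3 sda4\n","priority":"info","seq":1918,"talos-level":"info","talos-time":"2024-11-11T21:12:47.404501564Z"}',
]

if __name__ == "__main__":
    for seq, msg in installer_messages(SAMPLE):
        print(seq, msg)
```

Sorting the result by seq should restore the order in which the messages were emitted, since the controller-runtime lines carry no seq field and arrive on a separate stream.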

full logs https://gist.github.com/askedrelic/63395ba87579127d5cd309825637d4bd

controlplane.yml https://gist.github.com/askedrelic/2f31b85d732c33e8bfe01f19cde6675e

Environment

smira commented 1 week ago

I can see the problem here. It's a bug, but the workaround is to wipe sda before the install process.

It looks like sda already contains a Talos installation, which leads to a race condition; that is a bug in its own right.

smira commented 1 week ago

The root cause is that the partition table on the disk has a layout that go-blockdevice/v2 doesn't currently handle correctly when updating: the first pre-existing partition is bigger than the new partition. A proper fix would be to use a different code path when completely wiping the partition table (which happens on install) as opposed to updating an existing partition table.
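To illustrate the layout problem described here, a minimal Python sketch of a byte-range overlap check between a new partition and the partitions still registered from the old table. This is not the go-blockdevice or kernel code; it assumes that the "busy" error arises when a new partition's range collides with a stale one. The new offsets and sizes are taken from the probe output in the logs, while old_first is a made-up example of a larger pre-existing first partition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    index: int
    offset: int  # start, in bytes
    size: int    # length, in bytes

    @property
    def end(self) -> int:
        """Exclusive end offset in bytes."""
        return self.offset + self.size


def overlaps(a: Partition, b: Partition) -> bool:
    """True if the half-open byte ranges of a and b intersect."""
    return a.offset < b.end and b.offset < a.end


def conflicting(new: Partition, existing: list) -> list:
    """Return the existing partitions that the new one would collide with."""
    return [old for old in existing if overlaps(new, old)]


# New layout, from the "probed device" log entry.
new_efi = Partition(index=1, offset=1048576, size=104857600)
new_bios = Partition(index=2, offset=105906176, size=1048576)

# Hypothetical stale first partition that is larger than the new EFI one.
old_first = Partition(index=1, offset=1048576, size=1048576000)

if __name__ == "__main__":
    print(conflicting(new_efi, [old_first]))  # collides with the stale entry
    print(conflicting(new_efi, []))           # nothing to collide with after a wipe
```

Under this assumption, an empty list of existing partitions (the state after wiping the table) can never produce a conflict, which is consistent with wiping being suggested as the workaround.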

askedrelic commented 1 week ago

Thanks for confirming. I suspected that installing Talos on top of a mounted disk could cause issues, but it is our preferred install method due to several constraints, and it would be great if it were better supported.

Is continuing to use wipe:true recommended? Would wipe:false help at all?

smira commented 1 week ago

It's a bug which is going to be fixed and backported.

At the moment, there's no workaround on the Talos side except for wiping the disk before installation.
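As a rough idea of what wiping the disk before installation can mean at the lowest level, here is a minimal Python sketch that zeroes the first and last MiB of a disk or image file, which is where the primary and backup GPT structures live. It assumes this is enough to make the old partition table unreadable, and that it is run from an environment where the target disk is not in use. The function names are illustrative and not part of Talos or talosctl.

```python
import os

MIB = 1024 * 1024


def _pwrite_all(fd, data, offset):
    """Write all of data at offset, retrying on short writes."""
    view = memoryview(data)
    while view:
        written = os.pwrite(fd, view, offset)
        view = view[written:]
        offset += written


def wipe_partition_table_areas(path, span=MIB):
    """Zero the first and last `span` bytes of a block device or image file.

    DESTRUCTIVE: the partition table on `path` is lost.
    Returns the size of the target in bytes.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # Seeking to the end reports the size for files and block devices.
        size = os.lseek(fd, 0, os.SEEK_END)
        span = min(span, size)
        zeros = bytes(span)
        _pwrite_all(fd, zeros, 0)            # protective MBR + primary GPT
        _pwrite_all(fd, zeros, size - span)  # backup GPT at the end
        os.fsync(fd)
    finally:
        os.close(fd)
    return size
```

It is safest to try this on a scratch image file first; pointing it at a real device such as /dev/sda requires root and destroys the existing layout.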