oxidecomputer / propolis

VMM userspace for illumos bhyve
Mozilla Public License 2.0

Ubuntu 22.04 guest: "segfault at 10 ip 00007f68a0fd5b41 sp 00007ffc956aa800 error 6 in libc.so.6" during first boot #427

Closed: askfongjojo closed this issue 1 year ago

askfongjojo commented 1 year ago

The issue is sporadic: it happened on only one of the 13 identical 4 vCPU / 16 GiB instances I created from the same jammy cloud image. Several other instances failed at different points of guest initialization; I will file separate tickets for each of those failure modes.
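For readers unfamiliar with the kernel's segfault message format quoted in the issue title, here is a minimal sketch of how its fields can be decoded. It assumes the usual x86 Linux layout (fault address, instruction pointer, stack pointer, then the page-fault error code, all in hex) and interprets only the low three error-code bits. The names `decode_segfault`, `SEGV_RE`, and `PF_BITS` are hypothetical helpers for illustration and are not part of propolis or any tool used in this report.

```python
import re

# Low three bits of the x86 page-fault error code printed after "error".
# Each entry maps a bit index to (meaning when 0, meaning when 1).
PF_BITS = {
    0: ("page not present", "protection violation"),
    1: ("read", "write"),
    2: ("kernel mode", "user mode"),
}

# Matches the body of the kernel message; all four numbers are hexadecimal.
SEGV_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line):
    """Parse a 'segfault at ...' line into addresses and decoded flags."""
    m = SEGV_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    err = int(m.group("err"), 16)
    return {
        "addr": int(m.group("addr"), 16),
        "ip": int(m.group("ip"), 16),
        "sp": int(m.group("sp"), 16),
        "flags": [PF_BITS[bit][(err >> bit) & 1] for bit in sorted(PF_BITS)],
    }


# The message from the issue title.
TITLE = "segfault at 10 ip 00007f68a0fd5b41 sp 00007ffc956aa800 error 6 in libc.so.6"
info = decode_segfault(TITLE)
print(hex(info["addr"]), info["flags"])
```

Under these assumptions, error 6 reads as a user-mode write to a page that is not present, and the fault address 0x10 is consistent with a store through a near-null pointer inside libc. This describes the symptom only and does not identify the root cause.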

Here is the console log as seen in the rack2 Console UI (https://recovery.sys.rack2.eng.oxide.computer/projects/try/instances/sysbench-mysql-3/serial-console):

BdsDxe: loading Boot0001 "UEFI " from PciRoot(0x0)/Pci(0x10,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)
BdsDxe: starting Boot0001 "UEFI " from PciRoot(0x0)/Pci(0x10,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)
[    0.000000] Linux version 5.15.0-71-generic (buildd@lcy02-amd64-044) (gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 (Ubuntu 5.15.0-71.78-generic 5.15.92)
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.15.0-71-generic root=LABEL=cloudimg-rootfs ro console=tty1 console=ttyS0
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Hygon HygonGenuine
[    0.000000]   Centaur CentaurHauls
[    0.000000]   zhaoxin   Shanghai  
[    0.000000] [Firmware Bug]: TSC doesn't count with P0 frequency!
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[    0.000000] signal: max sigframe size: 1776
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bea37fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bea38000-0x00000000bed37fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000bed38000-0x00000000bf8eefff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bf8ef000-0x00000000bfb6efff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000bfb6f000-0x00000000bfb7efff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000bfb7f000-0x00000000bfbfefff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bfbff000-0x00000000bffdffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000043fffffff] usable
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] efi: EFI v2.70 by EDK II
[    0.000000] efi: ACPI=0xbfb7e000 ACPI 2.0=0xbfb7e014 MEMATTR=0xbdf98518 MOKvar=0xbf9a8000 
[    0.000000] secureboot: Secure boot disabled
[    0.000000] DMI not present or invalid.
[    0.000000] tsc: Fast TSC calibration using PIT
[    0.000000] tsc: Detected 1996.164 MHz processor
[    0.000163] last_pfn = 0x440000 max_arch_pfn = 0x400000000
[    0.000274] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT  
[    0.000290] last_pfn = 0xbffe0 max_arch_pfn = 0x400000000
[    0.004493] Using GB pages for direct mapping
[    0.004691] secureboot: Secure boot disabled
[    0.004692] RAMDISK: [mem 0xba0ee000-0xbbfbdfff]
[    0.004695] ACPI: Early table checksum verification disabled
[    0.004698] ACPI: RSDP 0x00000000BFB7E014 000024 (v02 OVMF  )
[    0.004702] ACPI: XSDT 0x00000000BFB7D0E8 000044 (v01 OVMF   OVMFEDK2 20130221      01000013)
[    0.004707] ACPI: FACP 0x00000000BFB7C000 0000F4 (v03 OVMF   OVMFEDK2 20130221 OVMF 00000099)
[    0.004713] ACPI: DSDT 0x00000000BFB7A000 000CBD (v01 INTEL  OVMF     00000004 INTL 20180629)
[    0.004716] ACPI: FACS 0x00000000BFBFE000 000040
[    0.004719] ACPI: APIC 0x00000000BFB7B000 000090 (v01 OVMF   OVMFEDK2 20130221 OVMF 00000099)
[    0.004722] ACPI: SSDT 0x00000000BFB79000 000057 (v01 REDHAT OVMF     00000001 INTL 20180629)
[    0.004725] ACPI: BGRT 0x00000000BFB78000 000038 (v01 INTEL  EDK2     00000002      01000013)
[    0.004727] ACPI: Reserving FACP table memory at [mem 0xbfb7c000-0xbfb7c0f3]
[    0.004729] ACPI: Reserving DSDT table memory at [mem 0xbfb7a000-0xbfb7acbc]
[    0.004730] ACPI: Reserving FACS table memory at [mem 0xbfbfe000-0xbfbfe03f]
[    0.004731] ACPI: Reserving APIC table memory at [mem 0xbfb7b000-0xbfb7b08f]
[    0.004732] ACPI: Reserving SSDT table memory at [mem 0xbfb79000-0xbfb79056]
[    0.004733] ACPI: Reserving BGRT table memory at [mem 0xbfb78000-0xbfb78037]
[    0.005196] No NUMA configuration found
[    0.005198] Faking a node at [mem 0x0000000000000000-0x000000043fffffff]
[    0.005204] NODE_DATA(0) allocated [mem 0x43ffd4000-0x43fffdfff]
[    0.005666] Zone ranges:
[    0.005667]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
[    0.005669]   DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
[    0.005670]   Normal   [mem 0x0000000100000000-0x000000043fffffff]
[    0.005672]   Device   empty
[    0.005673] Movable zone start for each node
[    0.005675] Early memory node ranges
[    0.005676]   node   0: [mem 0x0000000000001000-0x000000000009ffff]
[    0.005677]   node   0: [mem 0x0000000000100000-0x00000000bea37fff]
[    0.005678]   node   0: [mem 0x00000000bed38000-0x00000000bf8eefff]
[    0.005679]   node   0: [mem 0x00000000bfbff000-0x00000000bffdffff]
[    0.005680]   node   0: [mem 0x0000000100000000-0x000000043fffffff]
[    0.005682] Initmem setup node 0 [mem 0x0000000000001000-0x000000043fffffff]
[    0.005698] On node 0, zone DMA: 1 pages in unavailable ranges
[    0.005911] On node 0, zone DMA: 96 pages in unavailable ranges
[    0.048059] On node 0, zone DMA32: 768 pages in unavailable ranges
[    0.048163] On node 0, zone DMA32: 784 pages in unavailable ranges
[    0.237006] On node 0, zone Normal: 32 pages in unavailable ranges
[    0.237551] ACPI: PM-Timer IO Port: 0xb008
[    0.237565] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[    0.237604] IOAPIC[0]: apic_id 4, version 17, address 0xfec00000, GSI 0-31
[    0.237607] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.237609] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[    0.237611] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.237612] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[    0.237614] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[    0.237617] ACPI: Using ACPI (MADT) for SMP configuration information
[    0.237652] smpboot: Allowing 4 CPUs, 0 hotplug CPUs
[    0.237665] PM: hibernation: Registered nosave memory: [mem 0x00000000-0x00000fff]
[    0.237668] PM: hibernation: Registered nosave memory: [mem 0x000a0000-0x000fffff]
[    0.237669] PM: hibernation: Registered nosave memory: [mem 0xbdfa2000-0xbdfaafff]
[    0.237671] PM: hibernation: Registered nosave memory: [mem 0xbea38000-0xbed37fff]
[    0.237672] PM: hibernation: Registered nosave memory: [mem 0xbf8ef000-0xbfb6efff]
[    0.237673] PM: hibernation: Registered nosave memory: [mem 0xbfb6f000-0xbfb7efff]
[    0.237674] PM: hibernation: Registered nosave memory: [mem 0xbfb7f000-0xbfbfefff]
[    0.237676] PM: hibernation: Registered nosave memory: [mem 0xbffe0000-0xbfffffff]
[    0.237676] PM: hibernation: Registered nosave memory: [mem 0xc0000000-0xffffffff]
[    0.237679] [mem 0xc0000000-0xffffffff] available for PCI devices
[    0.237680] Booting paravirtualized kernel on bare hardware
[    0.237684] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[    0.237693] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:4 nr_cpu_ids:4 nr_node_ids:1
[    0.239474] percpu: Embedded 60 pages/cpu s208896 r8192 d28672 u524288
[    0.239511] Built 1 zonelists, mobility grouping on.  Total pages: 4124935
[    0.239513] Policy zone: Normal
[    0.239514] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.15.0-71-generic root=LABEL=cloudimg-rootfs ro console=tty1 console=ttyS0
[    0.239574] Unknown kernel command line parameters "BOOT_IMAGE=/boot/vmlinuz-5.15.0-71-generic", will be passed to user space.
[    0.253473] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes, linear)
[    0.260354] Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
[    0.260389] mem auto-init: stack:off, heap alloc:on, heap free:off
[    0.346471] Memory: 16299740K/16770492K available (16393K kernel code, 4383K rwdata, 10840K rodata, 3244K init, 6548K bss, 470492K reserved, 0K cma-reserved)
[    0.346776] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.346806] ftrace: allocating 50600 entries in 198 pages
[    0.369026] ftrace: allocated 198 pages with 4 groups
[    0.369380] rcu: Hierarchical RCU implementation.
[    0.369382] rcu:     RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=4.
[    0.369384]  Rude variant of Tasks RCU enabled.
[    0.369384]  Tracing variant of Tasks RCU enabled.
[    0.369386] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.369387] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[    0.373064] NR_IRQS: 524544, nr_irqs: 592, preallocated irqs: 16
[    0.373292] random: crng init done
[    0.373318] Console: colour dummy device 80x25
[    0.373435] printk: console [tty1] enabled
[    0.465881] printk: console [ttyS0] enabled
[    0.466440] ACPI: Core revision 20210730
[    0.467058] APIC: Switch to symmetric I/O mode setup
[    0.469838] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.487027] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x398c0cf3403, max_idle_ns: 881590421240 ns
[    0.488279] Calibrating delay loop (skipped), value calculated using timer frequency.. 3992.32 BogoMIPS (lpj=7984656)
[    0.489526] pid_max: default: 32768 minimum: 301
[    0.492663] LSM: Security Framework initializing
[    0.493242] landlock: Up and running.
[    0.493685] Yama: becoming mindful.
[    0.494164] AppArmor: AppArmor initialized
[    0.494908] Mount-cache hash table entries: 32768 (order: 6, 262144 bytes, linear)
[    0.496018] Mountpoint-cache hash table entries: 32768 (order: 6, 262144 bytes, linear)
[    0.497001] Last level iTLB entries: 4KB 512, 2MB 512, 4MB 256
[    0.497865] Last level dTLB entries: 4KB 2048, 2MB 2048, 4MB 1024, 1GB 0
[    0.498855] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[    0.500279] Spectre V2 : Mitigation: Retpolines
[    0.500959] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[    0.502184] Spectre V2 : Spectre v2 / SpectreRSB : Filling RSB on VMEXIT
[    0.503179] Speculative Store Bypass: Vulnerable
[    0.530764] Freeing SMP alternatives memory: 44K
[    0.639634] smpboot: CPU0: AMD EPYC 7713P 64-Core Processor (family: 0x19, model: 0x1, stepping: 0x1)
[    0.640273] Performance Events: PMU not available due to virtualization, using software events only.
[    0.640273] rcu: Hierarchical SRCU implementation.
[    0.640273] NMI watchdog: Perf NMI watchdog permanently disabled
[    0.640445] smp: Bringing up secondary CPUs ...
[    0.641372] x86: Booting SMP configuration:
[    0.641889] .... node  #0, CPUs:      #1
[    0.094987] smpboot: CPU 1 Converting physical 0 to logical die 1
[    0.724591]  #2
[    0.094987] smpboot: CPU 2 Converting physical 0 to logical die 2
[    0.808273] TSC synchronization [CPU#0 -> CPU#2]:
[    0.808273] Measured 220 cycles TSC warp between CPUs, turning off TSC clock.
[    0.808273] tsc: Marking TSC unstable due to check_tsc_sync_source failed
[    0.808541]  #3
[    0.094987] smpboot: CPU 3 Converting physical 0 to logical die 3
[    0.892367] smp: Brought up 1 node, 4 CPUs
[    0.892869] smpboot: Max logical packages: 4
[    0.893412] smpboot: Total of 4 processors activated (15969.33 BogoMIPS)
[    0.894852] devtmpfs: initialized
[    0.894852] x86/mm: Memory block size: 128MB
[    0.897592] ACPI: PM: Registering ACPI NVS region [mem 0xbfb7f000-0xbfbfefff] (524288 bytes)
[    0.897592] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.900338] futex hash table entries: 1024 (order: 4, 65536 bytes, linear)
[    0.901240] pinctrl core: initialized pinctrl subsystem
[    0.902054] PM: RTC time: 22:17:07, date: 2023-06-03
[    0.902957] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    0.905623] DMA: preallocated 2048 KiB GFP_KERNEL pool for atomic allocations
[    0.908220] DMA: preallocated 2048 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[    0.908432] DMA: preallocated 2048 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[    0.909391] audit: initializing netlink subsys (disabled)
[    0.910091] audit: type=2000 audit(1685830626.444:1): state=initialized audit_enabled=0 res=1
[    0.910091] thermal_sys: Registered thermal governor 'fair_share'
[    0.912277] thermal_sys: Registered thermal governor 'bang_bang'
[    0.913009] thermal_sys: Registered thermal governor 'step_wise'
[    0.913720] thermal_sys: Registered thermal governor 'user_space'
[    0.914439] thermal_sys: Registered thermal governor 'power_allocator'
[    0.915165] EISA bus registered
[    0.916284] cpuidle: using governor ladder
[    0.916780] cpuidle: using governor menu
[    0.918039] ACPI: bus type PCI registered
[    0.918039] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    0.918039] PCI: Using configuration type 1 for base access
[    0.920291] PCI: Using configuration type 1 for extended access
[    0.922391] Kprobes globally optimized
[    0.922898] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
[    0.924291] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[    0.928412] fbcon: Taking over console
[    0.932308] ACPI: Added _OSI(Module Device)
[    0.932813] ACPI: Added _OSI(Processor Device)
[    0.933346] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.933907] ACPI: Added _OSI(Processor Aggregator Device)
[    0.934550] ACPI: Added _OSI(Linux-Dell-Video)
[    0.935083] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[    0.935713] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
[    0.936706] ACPI: 2 ACPI AML tables successfully acquired and loaded
[    0.937653] ACPI Error: Could not enable GlobalLock event (20210730/evxfevnt-182)
[    0.938542] ACPI Warning: Could not enable fixed event - GlobalLock (1) (20210730/evxface-618)
[    0.939560] ACPI Error: No response from Global Lock hardware, disabling lock (20210730/evglock-59)
[    0.940399] ACPI: Interpreter enabled
[    0.940862] ACPI: PM: (supports S0 S3 S4 S5)
[    0.941395] ACPI: Using IOAPIC for interrupt routing
[    0.942005] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.943083] PCI: Using E820 reservations for host bridge windows
[    0.945495] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.946245] acpi PNP0A03:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
[    0.947500] PCI host bridge to bus 0000:00
[    0.948283] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.948939] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
[    0.949742] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
[    0.950582] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[    0.951466] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xfeefffff window]
[    0.952535] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
[    0.954242] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
[    0.955743] * Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
[    0.955743] * this clock source is slow. Consider trying other clock sources
[    0.956277] pci 0000:00:01.3: [8086:7113] type 00 class 0x068000
[    0.957592] pci 0000:00:01.3: quirk: [io  0xb000-0xb03f] claimed by PIIX4 ACPI
[    0.963410] pci 0000:00:08.0: [1af4:1000] type 00 class 0x020000
[    0.964304] pci 0000:00:08.0: reg 0x10: [io  0xc200-0xc3ff]
[    0.965049] pci 0000:00:08.0: reg 0x14: [mem 0xc000a000-0xc000bfff]
[    0.970250] pci 0000:00:10.0: [01de:0000] type 00 class 0x010802
[    0.971153] pci 0000:00:10.0: reg 0x10: [mem 0x800000000-0x800003fff 64bit]
[    0.972233] pci 0000:00:10.0: reg 0x20: [mem 0xc0000000-0xc0007fff]
[    0.977136] pci 0000:00:18.0: [1af4:1001] type 00 class 0x010000
[    0.978025] pci 0000:00:18.0: reg 0x10: [io  0xc000-0xc1ff]
[    0.978779] pci 0000:00:18.0: reg 0x14: [mem 0xc0008000-0xc0009fff]
[    0.983864] ACPI: PCI: Interrupt link LNKS configured for IRQ 9
[    0.984361] ACPI: PCI: Interrupt link LNKA configured for IRQ 10
[    0.985211] ACPI: PCI: Interrupt link LNKB configured for IRQ 10
[    0.986039] ACPI: PCI: Interrupt link LNKC configured for IRQ 11
[    0.986856] ACPI: PCI: Interrupt link LNKD configured for IRQ 11
[    0.987828] iommu: Default domain type: Translated 
[    0.988278] iommu: DMA domain TLB invalidation policy: lazy mode 
[    0.989195] SCSI subsystem initialized
[    0.989713] vgaarb: loaded
[    0.989713] ACPI: bus type USB registered
[    0.989713] usbcore: registered new interface driver usbfs
[    0.989847] usbcore: registered new interface driver hub
[    0.992282] usbcore: registered new device driver usb
[    0.992905] pps_core: LinuxPPS API ver. 1 registered
[    0.993529] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    0.994628] PTP clock support registered
[    0.995172] EDAC MC: Ver: 3.0.0
[    0.996356] Registered efivars operations
[    0.996978] NetLabel: Initializing
[    0.996978] NetLabel:  domain hash size = 128
[    0.997491] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
[    0.998213] NetLabel:  unlabeled traffic allowed by default
[    0.998923] PCI: Using ACPI for IRQ routing
[    1.000343] pci 0000:00:10.0: can't claim BAR 0 [mem 0x800000000-0x800003fff 64bit]: no compatible bridge window
[    1.001713] clocksource: Switched to clocksource refined-jiffies
[    1.023118] VFS: Disk quotas dquot_6.6.0
[    1.023622] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    1.024462] AppArmor: AppArmor Filesystem Enabled
[    1.025049] pnp: PnP ACPI init
[    1.025623] system 00:01: [io  0x01e0-0x01ef] has been reserved
[    1.026335] system 00:01: [io  0x0160-0x016f] has been reserved
[    1.027039] system 00:01: [io  0x0370-0x0371] has been reserved
[    1.027757] system 00:01: [io  0x0402] has been reserved
[    1.028279] system 00:01: [io  0x0440-0x044f] has been reserved
[    1.029000] system 00:01: [io  0xafe0-0xafe3] has been reserved
[    1.029707] system 00:01: [io  0xb000-0xb03f] has been reserved
[    1.030421] system 00:01: [mem 0xfec00000-0xfec00fff] could not be reserved
[    1.031252] system 00:01: [mem 0xfee00000-0xfeefffff] has been reserved
[    1.032332] pnp: PnP ACPI: found 5 devices
[    1.042952] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    1.044108] clocksource: Switched to clocksource acpi_pm
[    1.044108] NET: Registered PF_INET protocol family
[    1.044108] IP idents hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[    1.046759] tcp_listen_portaddr_hash hash table entries: 8192 (order: 5, 131072 bytes, linear)
[    1.048010] Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    1.049892] TCP established hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[    1.051803] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes, linear)
[    1.052804] TCP: Hash tables configured (established 131072 bind 65536)
[    1.054177] MPTCP token hash table entries: 16384 (order: 6, 393216 bytes, linear)
[    1.055319] UDP hash table entries: 8192 (order: 6, 262144 bytes, linear)
[    1.056426] UDP-Lite hash table entries: 8192 (order: 6, 262144 bytes, linear)
[    1.057339] NET: Registered PF_UNIX/PF_LOCAL protocol family
[    1.058027] NET: Registered PF_XDP protocol family
[    1.058609] pci 0000:00:10.0: BAR 0: assigned [mem 0xc000c000-0xc000ffff 64bit]
[    1.059559] pci_bus 0000:00: resource 4 [io  0x0000-0x0cf7 window]
[    1.060351] pci_bus 0000:00: resource 5 [io  0x0d00-0xffff window]
[    1.061085] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff window]
[    1.061902] pci_bus 0000:00: resource 7 [mem 0xc0000000-0xfeefffff window]
[    1.062768] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    1.063509] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[    1.064369] PCI: CLS 0 bytes, default 64
[    1.064937] Trying to unpack rootfs image as initramfs...
[    1.080399] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    1.081199] software IO TLB: mapped [mem 0x00000000b60ee000-0x00000000ba0ee000] (64MB)
[    1.083176] Initialise system trusted keyrings
[    1.083763] Key type blacklist registered
[    1.084587] workingset: timestamp_bits=36 max_order=22 bucket_order=0
[    1.086719] zbud: loaded
[    1.087309] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    1.088590] fuse: init (API version 7.34)
[    1.089697] integrity: Platform Keyring initialized
[    1.094199] Key type asymmetric registered
[    1.094739] Asymmetric key parser 'x509' registered
[    1.095490] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 243)
[    1.096823] io scheduler mq-deadline registered
[    1.097988] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[    1.098976] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[    1.099972] ACPI: button: Power Button [PWRF]
[    1.101461] ACPI: \_SB_.PCI0.LPC_.LNKA: Enabled at IRQ 10
[    1.102226] virtio-pci 0000:00:08.0: virtio_pci: leaving for legacy driver
[    1.103528] virtio-pci 0000:00:18.0: can't derive routing for PCI INT B
[    1.104463] virtio-pci 0000:00:18.0: PCI INT B: no GSI - using ISA IRQ 10
[    1.105428] virtio-pci 0000:00:18.0: virtio_pci: leaving for legacy driver
[    1.106565] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
[    1.107559] 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16450
[    1.108897] 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16450
[    1.110164] serial8250: ttyS2 at I/O 0x3e8 (irq = 4, base_baud = 115200) is a 16450
[    1.115889] Linux agpgart interface v0.103
[    1.121024] loop: module loaded
[    1.121844] tun: Universal TUN/TAP device driver, 1.6
[    1.122606] PPP generic driver version 2.4.2
[    1.123389] VFIO - User Level meta-driver version: 0.3
[    1.124404] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    1.125249] ehci-pci: EHCI PCI platform driver
[    1.125834] ehci-platform: EHCI generic platform driver
[    1.126501] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[    1.127279] ohci-pci: OHCI PCI platform driver
[    1.127861] ohci-platform: OHCI generic platform driver
[    1.128617] uhci_hcd: USB Universal Host Controller Interface driver
[    1.129484] i8042: PNP: PS/2 Controller [PNP0303:PS2K] at 0x60,0x64 irq 1
[    1.130348] i8042: PNP: PS/2 appears to have AUX port disabled, if this is incorrect please boot with i8042.nopnp
[    1.131851] serio: i8042 KBD port at 0x60,0x64 irq 1
[    1.132910] mousedev: PS/2 mouse device common for all mice
[    1.133974] rtc_cmos 00:00: RTC can wake from S4
[    1.135109] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1
[    1.136366] rtc_cmos 00:00: registered as rtc0
[    1.137061] rtc_cmos 00:00: setting system clock to 2023-06-03T22:17:07 UTC (1685830627)
[    1.138125] ACPI Error: Could not enable RealTimeClock event (20210730/evxfevnt-182)
[    1.139097] ACPI Warning: Could not enable fixed event - RealTimeClock (4) (20210730/evxface-618)
[    1.140229] rtc_cmos 00:00: alarms up to one day, 114 bytes nvram
[    1.141095] i2c_dev: i2c /dev entries driver
[    1.141666] device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
[    1.143204] device-mapper: uevent: version 1.0.3
[    1.144055] device-mapper: ioctl: 4.45.0-ioctl (2021-03-22) initialised: dm-devel@redhat.com
[    1.145197] platform eisa.0: Probing EISA bus 0
[    1.145777] platform eisa.0: EISA: Cannot allocate resource for mainboard
[    1.146624] platform eisa.0: Cannot allocate resource for EISA slot 1
[    1.147438] platform eisa.0: Cannot allocate resource for EISA slot 2
[    1.148253] platform eisa.0: Cannot allocate resource for EISA slot 3
[    1.149145] platform eisa.0: Cannot allocate resource for EISA slot 4
[    1.149956] platform eisa.0: Cannot allocate resource for EISA slot 5
[    1.150784] platform eisa.0: Cannot allocate resource for EISA slot 6
[    1.151584] platform eisa.0: Cannot allocate resource for EISA slot 7
[    1.152462] platform eisa.0: Cannot allocate resource for EISA slot 8
[    1.153266] platform eisa.0: EISA: Detected 0 cards
[    1.154328] ledtrig-cpu: registered to indicate activity on CPUs
[    1.155114] efifb: probing for efifb
[    1.155600] efifb: framebuffer at 0xbea38000, using 1876k, total 1875k
[    1.156488] efifb: mode is 800x600x32, linelength=3200, pages=1
[    1.157238] efifb: scrolling: redraw
[    1.157703] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
[    1.158630] Console: switching to colour frame buffer device 100x37
[    1.160028] fb0: EFI VGA frame buffer device
[    1.160656] EFI Variables Facility v0.08 2004-May-17
[    1.165511] drop_monitor: Initializing network drop monitor service
[    1.166440] NET: Registered PF_INET6 protocol family
[    1.317227] Freeing initrd memory: 31552K
[    1.325326] Segment Routing with IPv6
[    1.325840] In-situ OAM (IOAM) with IPv6
[    1.326396] NET: Registered PF_PACKET protocol family
[    1.327172] Key type dns_resolver registered
[    1.328507] IPI shorthand broadcast: enabled
[    1.329291] registered taskstats version 1
[    1.330103] Loading compiled-in X.509 certificates
[    1.331793] Loaded X.509 cert 'Build time autogenerated kernel key: 2f86ddc308e15dc6b50c79b07e2324bbca0a5704'
[    1.334025] Loaded X.509 cert 'Canonical Ltd. Live Patch Signing: 14df34d1a87cf37625abec039ef2bf521249b969'
[    1.336154] Loaded X.509 cert 'Canonical Ltd. Kernel Module Signing: 88f752e560a1e0737e31163a466ad7b70a850c19'
[    1.337999] blacklist: Loading compiled-in revocation X.509 certificates
[    1.339120] Loaded X.509 cert 'Canonical Ltd. Secure Boot Signing: 61482aa2830d0ab2ad5af10b7250da9033ddcef0'
[    1.340977] Loaded X.509 cert 'Canonical Ltd. Secure Boot Signing (2017): 242ade75ac4a15e50d50c84b0d45ff3eae707a03'
[    1.342881] Loaded X.509 cert 'Canonical Ltd. Secure Boot Signing (ESM 2018): 365188c1d374d6b07c3c8f240f8ef722433d6a8b'
[    1.344929] Loaded X.509 cert 'Canonical Ltd. Secure Boot Signing (2019): c0746fd6c5da3ae827864651ad66ae47fe24b3e8'
[    1.346950] Loaded X.509 cert 'Canonical Ltd. Secure Boot Signing (2021 v1): a8d54bbb3825cfb94fa13c9f8a594a195c107b8d'
[    1.349108] Loaded X.509 cert 'Canonical Ltd. Secure Boot Signing (2021 v2): 4cf046892d6fd3c9a5b03f98d845f90851dc6a8c'
[    1.351235] Loaded X.509 cert 'Canonical Ltd. Secure Boot Signing (2021 v3): 100437bb6de6e469b581e61cd66bce3ef4ed53af'
[    1.353449] Loaded X.509 cert 'Canonical Ltd. Secure Boot Signing (Ubuntu Core 2019): c1d57b8f6b743f23ee41f4f7ee292f06eecadfb9'
[    1.356744] zswap: loaded using pool lzo/zbud
[    1.358856] Key type .fscrypt registered
[    1.359797] Key type fscrypt-provisioning registered
[    1.365374] Key type encrypted registered
[    1.366318] AppArmor: AppArmor sha1 policy hashing enabled
[    1.367700] integrity: Loading X.509 certificate: UEFI:MokListRT (MOKvar table)
[    1.369755] integrity: Loaded X.509 cert 'Canonical Ltd. Master Certificate Authority: ad91990bc22ab1f517048c23b6655a268e345a63'
[    1.372049] ima: No TPM chip found, activating TPM-bypass!
[    1.373256] Loading compiled-in module X.509 certificates
[    1.374781] Loaded X.509 cert 'Build time autogenerated kernel key: 2f86ddc308e15dc6b50c79b07e2324bbca0a5704'
[    1.376992] ima: Allocated hash algorithm: sha1
[    1.378028] ima: No architecture policies found
[    1.379058] evm: Initialising EVM extended attributes:
[    1.380145] evm: security.selinux
[    1.381079] evm: security.SMACK64
[    1.381927] evm: security.SMACK64EXEC
[    1.382802] evm: security.SMACK64TRANSMUTE
[    1.383716] evm: security.SMACK64MMAP
[    1.384646] evm: security.apparmor
[    1.385465] evm: security.ima
[    1.386220] evm: security.capability
[    1.387032] evm: HMAC attrs: 0x1
[    1.388393] PM:   Magic number: 11:279:299
[    1.389740] RAS: Correctable Errors collector initialized.
[    1.390764] Unstable clock detected, switching default tracing clock to "global"
[    1.390764] If you want to keep using the local clock, then add:
[    1.390764]   "trace_clock=local"
[    1.390764] on the kernel command line
[    1.396020] Freeing unused decrypted memory: 2036K
[    1.397512] Freeing unused kernel image (initmem) memory: 3244K
[    1.408531] Write protecting the kernel read-only data: 30720k
[    1.410160] Freeing unused kernel image (text/rodata gap) memory: 2036K
[    1.411571] Freeing unused kernel image (rodata/data gap) memory: 1448K
[    1.444510] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[    1.445567] Run /init as init process
Loading, please wait...
Starting version 249.11-0ubuntu3.9
[    1.538165] virtio_blk virtio1: [vda] 42 512-byte logical blocks (21.5 kB/21.0 KiB)
[    1.539865] cryptd: max_cpu_qlen set to 1000
[    1.540715]  vda:
[    1.546972] AVX2 version of gcm_enc/dec engaged.
[    1.551108] AES CTR mode by8 optimization enabled
[    1.556547] nvme nvme0: pci function 0000:00:10.0
[    1.575228] nvme nvme0: 4/0/0 default/read/poll queues
[    1.592577] virtio_net virtio0 enp0s8: renamed from eth0
[    1.601603] GPT:Primary header thinks Alt. header is not at the end of the disk.
[    1.602847] GPT:4612095 != 209715199
[    1.603548] GPT:Alternate GPT header not at the end of the disk.
[    1.604610] GPT:4612095 != 209715199
[    1.605321] GPT: Use GNU Parted to correct GPT errors.
[    1.606225]  nvme0n1: p1 p14 p15
Begin: Loading essential drivers ... [    3.116339] raid6: avx2x4   gen() 27204 MB/s
[    3.184340] raid6: avx2x4   xor()  4454 MB/s
[    3.252338] raid6: avx2x2   gen() 27177 MB/s
[    3.320341] raid6: avx2x2   xor() 27564 MB/s
[    3.388339] raid6: avx2x1   gen() 19519 MB/s
[    3.456340] raid6: avx2x1   xor() 22111 MB/s
[    3.524340] raid6: sse2x4   gen() 18447 MB/s
[    3.592342] raid6: sse2x4   xor()  1585 MB/s
[    3.660339] raid6: sse2x2   gen() 14912 MB/s
[    3.728338] raid6: sse2x2   xor() 14566 MB/s
[    3.796351] raid6: sse2x1   gen()   977 MB/s
[    3.864342] raid6: sse2x1   xor() 12322 MB/s
[    3.865115] raid6: using algorithm avx2x4 gen() 27204 MB/s
[    3.866014] raid6: .... xor() 4454 MB/s, rmw enabled
[    3.866844] raid6: using avx2x2 recovery algorithm
[    3.869241] xor: automatically using best checksumming function   avx       
[    3.871242] async_tx: api initialized (async)
done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... [    3.953886] Btrfs loaded, crc32c=crc32c-intel, zoned=yes, fsverity=yes
Scanning for Btrfs filesystems
done.
Warning: fsck not present, so skipping root file system
[    4.089066] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
done.
Begin: Running /scripts/local-bottom ... done.
Begin: Running /scripts/init-bottom ... done.
[    4.676820] systemd[1]: Inserted module 'autofs4'
[    4.746800] systemd[1]: systemd 249.11-0ubuntu3.9 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
[    4.751870] systemd[1]: Detected virtualization bhyve.
[    4.752933] systemd[1]: Detected architecture x86-64.

Welcome to Ubuntu 22.04.2 LTS!

[    4.757997] systemd[1]: Hostname set to <ubuntu>.
[    4.776135] systemd[1]: Initializing machine ID from random generator.
[    4.777405] systemd[1]: Installed transient /etc/machine-id file.
[    5.876814] systemd[1]: Queued start job for default target Graphical Interface.
[    5.882677] systemd[1]: Created slice Slice /system/modprobe.
[  OK  ] Created slice Slice /system/modprobe.
[    5.889825] systemd[1]: Created slice Slice /system/serial-getty.
[  OK  ] Created slice Slice /system/serial-getty.
[    5.895343] systemd[1]: Created slice Slice /system/systemd-fsck.
[  OK  ] Created slice Slice /system/systemd-fsck.
[    5.899869] systemd[1]: Created slice User and Session Slice.
[  OK  ] Created slice User and Session Slice.
[    5.904501] systemd[1]: Started Forward Password Requests to Wall Directory Watch.
[  OK  ] Started Forward Password R…uests to Wall Directory Watch.
[    5.910619] systemd[1]: Set up automount Arbitrary Executable File Formats File System Automount Point.
[  OK  ] Set up automount Arbitrary…s File System Automount Point.
[    5.918841] systemd[1]: Reached target Slice Units.
[  OK  ] Reached target Slice Units.
[    5.922895] systemd[1]: Reached target Mounting snaps.
[  OK  ] Reached target Mounting snaps.
[    5.927329] systemd[1]: Reached target Swaps.
[  OK  ] Reached target Swaps.
[    5.931101] systemd[1]: Reached target Local Verity Protected Volumes.
[  OK  ] Reached target Local Verity Protected Volumes.
[    5.936632] systemd[1]: Listening on Device-mapper event daemon FIFOs.
[  OK  ] Listening on Device-mapper event daemon FIFOs.
[    5.941911] systemd[1]: Listening on LVM2 poll daemon socket.
[  OK  ] Listening on LVM2 poll daemon socket.
[    5.946631] systemd[1]: Listening on multipathd control socket.
[  OK  ] Listening on multipathd control socket.
[    5.951472] systemd[1]: Listening on Syslog Socket.
[  OK  ] Listening on Syslog Socket.
[    5.956745] systemd[1]: Listening on fsck to fsckd communication Socket.
[  OK  ] Listening on fsck to fsckd communication Socket.
[    5.961466] systemd[1]: Listening on initctl Compatibility Named Pipe.
[  OK  ] Listening on initctl Compatibility Named Pipe.
[    5.966307] systemd[1]: Listening on Journal Audit Socket.
[  OK  ] Listening on Journal Audit Socket.
[    5.971593] systemd[1]: Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Journal Socket (/dev/log).
[    5.976228] systemd[1]: Listening on Journal Socket.
[  OK  ] Listening on Journal Socket.
[    5.980824] systemd[1]: Listening on Network Service Netlink Socket.
[  OK  ] Listening on Network Service Netlink Socket.
[    5.987826] systemd[1]: Listening on udev Control Socket.
[  OK  ] Listening on udev Control Socket.
[    5.992078] systemd[1]: Listening on udev Kernel Socket.
[  OK  ] Listening on udev Kernel Socket.
[    5.997310] systemd[1]: Mounting Huge Pages File System...
         Mounting Huge Pages File System...
[    6.002024] systemd[1]: Mounting POSIX Message Queue File System...
         Mounting POSIX Message Queue File System...
[    6.007129] systemd[1]: Mounting Kernel Debug File System...
         Mounting Kernel Debug File System...
[    6.011301] systemd[1]: Mounting Kernel Trace File System...
         Mounting Kernel Trace File System...
[    6.017369] systemd[1]: Starting Journal Service...
         Starting Journal Service...
[    6.021335] systemd[1]: Starting Set the console keyboard layout...
         Starting Set the console keyboard layout...
[    6.025666] systemd[1]: Starting Create List of Static Device Nodes...
         Starting Create List of Static Device Nodes...
[    6.029926] systemd[1]: Starting Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
         Starting Monitoring of LVM…meventd or progress polling...
[    6.035125] systemd[1]: Condition check resulted in LXD - agent being skipped.
[    6.037425] systemd[1]: Starting Load Kernel Module chromeos_pstore...
         Starting Load Kernel Module chromeos_pstore...
[    6.041733] systemd[1]: Starting Load Kernel Module configfs...
         Starting Load Kernel Module configfs...
[    6.045892] systemd[1]: Starting Load Kernel Module drm...
         Starting Load Kernel Module drm...
[    6.050328] systemd[1]: Starting Load Kernel Module efi_pstore...
         Starting Load Kernel Module efi_pstore...
[    6.055083] systemd[1]: Starting Load Kernel Module fuse...
         Starting Load Kernel Module fuse...
[    6.059024] systemd[1]: Starting Load Kernel Module pstore_blk...
         Starting Load Kernel Module pstore_blk...
[    6.063424] systemd[1]: Starting Load Kernel Module pstore_zone...
         Starting Load Kernel Module pstore_zone...
[    6.068270] systemd[1]: Starting Load Kernel Module ramoops...
         Starting Load Kernel Module ramoops...
[    6.071450] systemd[1]: Condition check resulted in OpenVSwitch configuration for cleanup being skipped.
[    6.074651] systemd[1]: Starting File System Check on Root Device...
         Starting File System Check on Root Device...
[    6.090823] pstore: Using crash dump compression: deflate
[    6.090974] systemd[1]: Starting Load Kernel Modules...
[    6.091969] pstore: Registered efi as persistent store backend
         Starting Load Kernel Modules...[    6.095595] systemd[1]: Starting Coldplug All udev Devices...

         Starting Coldplug All udev Devices...
[    6.098945] systemd[1]: Mounted Huge Pages File System.
[ [    6.100156] systemd[1]: Mounted POSIX Message Queue File System.
 OK  ] Mounted     6.101525] systemd[1]: Mounted Kernel Debug File System.
9mHuge Pages File Sys[    6.102745] systemd[1]: Mounted Kernel Trace File System.
tem.
[  OK  ] Mounted POSIX Message Queue File S[    6.105176] systemd[1]: Finished Create List of Static Device Nodes.
ystem.
[  OK  ] Mounted Kernel[    6.107243] systemd[1]: Finished Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
 Debug File System.
[  OK  [    6.109551] systemd[1]: modprobe@chromeos_pstore.service: Deactivated successfully.
] Mounted Kernel Trace Fil[    6.111311] systemd[1]: Finished Load Kernel Module chromeos_pstore.
e System.
[  OK  ] [    6.112909] systemd[1]: modprobe@configfs.service: Deactivated successfully.
Finished Create List of Static[    6.114677] systemd[1]: Finished Load Kernel Module configfs.
 Device Nodes.
[  O[    6.116158] systemd[1]: modprobe@efi_pstore.service: Deactivated successfully.
K  ] Finished Monitoring o[    6.117948] systemd[1]: Finished Load Kernel Module efi_pstore.
f LVM… dmeventd or progress po[    6.119396] systemd[1]: modprobe@fuse.service: Deactivated successfully.
lling.
[  OK  ] Finis[    6.121082] systemd[1]: Finished Load Kernel Module fuse.
hed Load Kernel Module c[    6.122546] systemd[1]: modprobe@pstore_blk.service: Deactivated successfully.
hromeos_pstore.
[ [    6.124259] systemd[1]: Finished Load Kernel Module pstore_blk.
 OK  ] Finished Load Kernel M[    6.126032] systemd[1]: modprobe@pstore_zone.service: Deactivated successfully.
odule configfs.
[  OK  ] F[    6.127923] systemd[1]: Finished Load Kernel Module pstore_zone.
inished Load Kernel Module [    6.129610] systemd[1]: modprobe@ramoops.service: Deactivated successfully.
efi_pstore.
[  OK  ][    6.131437] systemd[1]: Finished Load Kernel Module ramoops.
 Finished Load Kernel Mo[    6.133032] systemd[1]: Started Journal Service.
dule fuse.
[  OK  ] Finished Load Kernel Module pstore_blk.
[  OK  ] Finished Load Kernel Module pstore_zone.
[  OK  ] Finished Load Kernel Module ramoops.
[  OK  ] Started Journal Service.
[  OK  ] Finished Load Kernel Modules.
         Mounting FUSE Control File System...
         Mounting Kernel Configuration File System...
[  OK  ] Started File System Check Daemon to report status.
         Starting Apply Kernel Variables...
[  OK  ] Mounted FUSE Control File System.
[  OK  ] Finished Load Kernel Module drm.
[  OK  ] Mounted Kernel Configuration File System.
[  OK  ] Finished Set the console keyboard layout.
[  OK  ] Finished Coldplug All udev Devices.
[  OK  ] Finished File System Check on Root Device.
         Starting Remount Root and Kernel File Systems...
[  OK  ] Finished Apply Kernel Variables.
[  OK  ] Finished Remount Root and Kernel File Systems.
         Starting Device-Mapper Multipath Device Controller...
         Starting Flush Journal to Persistent Storage...
         Starting Load/Save Random Seed...
         Starting Create System Users...
[  OK  ] Finished Load/Save Random Seed.
[  OK  ] Finished Create System Users.
         Starting Create Static Device Nodes in /dev...
[  OK  ] Finished Flush Journal to Persistent Storage.
[  OK  ] Finished Create Static Device Nodes in /dev.
         Starting Rule-based Manage…for Device Events and Files...
[    6.695669] systemd[1]: segfault at 10 ip 00007f68a0fd5b41 sp 00007ffc956aa800 error 6 in libc.so.6[7f68a0f5a000+195000]
[    6.699615] Code: 00 48 39 ce 0f 84 07 06 00 00 4c 8b 61 18 4a 83 4c 29 08 01 4c 39 cb 74 05 48 83 49 08 04 4c 8b 34 24 4c 89 62 08 41 83 c2 01 <49> 89 74 24 10 4c 89 71 18 4c 8d 71 10 4c 89 74 24 08 49 c1 ee 0c
[  116.498129] systemd-journald[376]: Failed to send WATCHDOG=1 notification message: Connection refused
[  226.497975] systemd-journald[376]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected

The propolis zone is located in sled BRM42220014 (rack2, cubby 16). I've copied the propolis log file to: catacomb.eng.oxide.computer:/data/staff/dogfood/jun-03/system-illumos-propolis-server_vm-738d9902-01a8-4343-8593-c687a8782e69.log
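
As an aid for reading the `segfault at ... error N` lines in the console log above: on x86 Linux the `error` field is the page-fault error code, printed in hex. The sketch below decodes it. The function and table names are made up for illustration; only the bit meanings come from the x86 page-fault error code definition.

```python
from __future__ import annotations

# Bits of the x86 page-fault error code. Each entry is
# (mask, meaning when set, meaning when clear or None).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write", "read or fetch"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error(field: str) -> list[str]:
    """Decode the hex 'error' field of a 'segfault at' kernel message."""
    code = int(field, 16)  # the kernel prints this value in hex
    meanings = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            meanings.append(if_set)
        elif if_clear is not None:
            meanings.append(if_clear)
    return meanings


if __name__ == "__main__":
    # Values that appear in the logs in this issue.
    for field in ("6", "4", "15"):
        print(f"error {field}: {', '.join(decode_pf_error(field))}")
```

Read this way, `error 6` at address 0x10 is a user-mode write to an unmapped page close to NULL, and `error 4` is the corresponding read. `error 15` (0x15) is a user-mode instruction fetch that hit a protection violation on a present page, which would fit a fault where `ip` equals the faulting address.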

askfongjojo commented 1 year ago

The guest boot issue is not limited to first boot (though it seems to happen more often there; anecdotally, 6-8% of the time). The guest also segfaulted when put in a reboot loop (and came up just fine on a subsequent reboot), e.g.

...
Jun 29 01:12:49 sysbench-mysql7 multipathd[419]: --------start up--------
Jun 29 01:12:49 sysbench-mysql7 multipathd[419]: read /etc/multipath.conf
Jun 29 01:12:49 sysbench-mysql7 multipathd[419]: path checkers start up
Jun 29 01:12:49 sysbench-mysql7 systemd[1]: Starting Rule-based Manager for Device Events and Files...
Jun 29 01:12:49 sysbench-mysql7 systemd-udevd[430]: systemd: malloc.c:4302: _int_malloc: Assertion `(unsigned long) (size) >= (unsigned long) (nb)' failed. 
Jun 29 01:12:49 sysbench-mysql7 systemd[1]: Started .
Jun 29 01:12:49 sysbench-mysql7 kernel: systemd[1]: segfault at b8 ip 000055f7af1e26b4 sp 00007fffe05712a0 error 4 in systemd[55f7af105000+e0000]
Jun 29 01:12:49 sysbench-mysql7 kernel: Code: 00 00 00 00 00 90 48 8b 8c 24 a0 00 00 00 48 8b 7c 24 70 31 d2 48 89 de e8 89 74 f2 ff 84 c0 0f 84 1f fd ff ff 48 8b 44 24 08 <48> 8b b8 b8 00 00 00 48 85 ff 74 d0 e8 5b 0d f6 ff 48 8b 44 24 08
Jun 29 01:12:49 sysbench-mysql7 systemd[1]: Caught <SEGV>, dumped core as pid 432.
Jun 29 01:12:49 sysbench-mysql7 systemd[1]: Freezing execution.
-- Boot bec512c090df44b586508f845e8fbe18 -- 
Jun 29 01:15:06 sysbench-mysql7 kernel: Linux version 5.15.0-71-generic (buildd@lcy02-amd64-044) (gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 (Ubuntu 5.15.0-71.78-generic 5.15.92)
Jun 29 01:15:06 sysbench-mysql7 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.15.0-71-generic root=LABEL=cloudimg-rootfs ro console=tty1 console=ttyS0
...
...
Jun 29 01:42:12 sysbench-mysql7 systemd[1]: Starting Set console font and keymap...
Jun 29 01:42:12 sysbench-mysql7 systemd[1]: Starting Create final runtime dir for shutdown pivot root...
Jun 29 01:42:12 sysbench-mysql7 systemd[1]: Starting Tell Plymouth To Write Out Runtime Data...
Jun 29 01:42:12 sysbench-mysql7 systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
Jun 29 01:42:12 sysbench-mysql7 systemd[1]: Condition check resulted in Store a System Token in an EFI Variable being skipped.
Jun 29 01:42:12 sysbench-mysql7 systemd[1]: Condition check resulted in Commit a transient machine-id on disk being skipped.
Jun 29 01:42:12 sysbench-mysql7 systemd[1]: Starting Create Volatile Files and Directories...
Jun 29 01:42:12 sysbench-mysql7 systemd[1]: Starting Uncomplicated firewall...
Jun 29 01:42:12 sysbench-mysql7 kernel: systemd-tmpfile[532]: segfault at 55c17df37000 ip 000055c17df37000 sp 00007ffca8325a58 error 15 in systemd-tmpfiles[55c17df37000+4000]
Jun 29 01:42:12 sysbench-mysql7 kernel: Code: Unable to access opcode bytes at RIP 0x55c17df36fd6.
Jun 29 01:42:12 sysbench-mysql7 systemd[1]: Finished Set console font and keymap.
...
...
Jun 29 01:53:58 sysbench-mysql7 systemd[1]: Starting Set console font and keymap...
Jun 29 01:53:58 sysbench-mysql7 systemd[1]: Starting Create final runtime dir for shutdown pivot root...
Jun 29 01:53:58 sysbench-mysql7 systemd[1]: Starting Tell Plymouth To Write Out Runtime Data...
Jun 29 01:53:58 sysbench-mysql7 systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
Jun 29 01:53:58 sysbench-mysql7 systemd[1]: Condition check resulted in Store a System Token in an EFI Variable being skipped.
Jun 29 01:53:58 sysbench-mysql7 systemd[1]: Condition check resulted in Commit a transient machine-id on disk being skipped.
Jun 29 01:53:58 sysbench-mysql7 systemd[1]: Starting Create Volatile Files and Directories...
Jun 29 01:53:58 sysbench-mysql7 apparmor.systemd[527]: Restarting AppArmor 
Jun 29 01:53:58 sysbench-mysql7 apparmor.systemd[527]: Reloading AppArmor profiles
Jun 29 01:53:58 sysbench-mysql7 systemd[1]: Starting Uncomplicated firewall...
Jun 29 01:53:58 sysbench-mysql7 kernel: systemd-tmpfile[536]: segfault at 5624c56a8f83 ip 00005624840f7231 sp 00007ffe64e18420 error 4 in systemd-tmpfiles[5624840f0000+e000]
Jun 29 01:53:58 sysbench-mysql7 kernel: Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0f <84> 80 03 00 00 41 0f b6 44 24 01 45 31 c9 31 ff 45 31 c0 84 c0 0f
Jun 29 01:53:58 sysbench-mysql7 systemd[1]: Finished Set console font and keymap.
Jun 29 01:53:58 sysbench-mysql7 systemd[1]: Finished Create final runtime dir for shutdown pivot root.
Jun 29 01:53:58 sysbench-mysql7 systemd[1]: Finished Tell Plymouth To Write Out Runtime Data.
Jun 29 01:53:58 sysbench-mysql7 systemd[1]: systemd-tmpfiles-setup.service: Main process exited, code=killed, status=11/SEGV
Jun 29 01:53:58 sysbench-mysql7 systemd[1]: systemd-tmpfiles-setup.service: Failed with result 'signal'.
Jun 29 01:53:58 sysbench-mysql7 systemd[1]: Failed to start Create Volatile Files and Directories.
Jun 29 01:53:58 sysbench-mysql7 systemd[1]: Finished Uncomplicated firewall.
Jun 29 01:53:58 sysbench-mysql7 systemd[1]: Starting Network Time Synchronization...
...
pfmooney commented 1 year ago

Using propolis-standalone on commodity hardware, I've reproduced something similar to this a few times:

[    4.591020] systemd[1]: segfault at 10 ip 00007fc6465e0b41 sp 00007ffccd3249f0 error 6 in libc.so.6[7fc646565000+195000]
[    4.592577] Code: 00 48 39 ce 0f 84 07 06 00 00 4c 8b 61 18 4a 83 4c 29 08 01 4c 39 cb 74 05 48 83 49 08 04 4c 8b 34 24 4c 89 62 08 41 83 c2 01 <49> 89 74 24 10 4c 89 71 18 4c 8d 71 10 4c 89 74 24 08 49 c1 ee 0c
[    4.935831] systemd[1]: segfault at 2b ip 00007fb3c0868040 sp 00007ffe02870c38 error 6 in libsystemd-shared-249.so[7fb3c0766000+1a9000]
[    4.939153] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    5.177656] systemd[1]: segfault at 0 ip 00007ff3ffa83089 sp 00007fff14f61820 error 4 in libsystemd-shared-249.so[7ff3ff9a8000+1a9000]
[    5.181300] Code: 48 89 44 24 38 31 c0 f6 47 2f 04 48 8d 05 6f a1 18 00 48 89 e5 48 0f 44 f0 48 89 ef e8 50 99 01 00 48 8b 03 48 89 ee 4c 89 e7 <ff> 10 48 89 ef e8 4d 9c 01 00 0f b6 53 2f f6 c2 04 74 24 8b 4b 24

For two of them (so far), I also followed up to check that the %rip of the fault corresponded with seemingly sensible program text:

[    6.695669] systemd[1]: segfault at 10 ip 00007f68a0fd5b41 sp 00007ffc956aa800 error 6 in libc.so.6[7f68a0f5a000+195000]
[    6.699615] Code: 00 48 39 ce 0f 84 07 06 00 00 4c 8b 61 18 4a 83 4c 29 08 01 4c 39 cb 74 05 48 83 49 08 04 4c 8b 34 24 4c 89 62 08 41 83 c2 01 <49> 89 74 24 10 4c 89 71 18 4c 8d 71 10 4c 89 74 24 08 49 c1 ee 0c
[  116.498129] systemd-journald[376]: Failed to send WATCHDOG=1 notification message: Connection refused

From objdump:
   a3b0f:   4d 39 dc                cmp    %r11,%r12
   a3b12:   0f 83 10 06 00 00       jae    a4128 <__default_morecore@GLIBC_2.2.5+0x1af8>
   a3b18:   48 39 ce                cmp    %rcx,%rsi
   a3b1b:   0f 84 07 06 00 00       je     a4128 <__default_morecore@GLIBC_2.2.5+0x1af8>
   a3b21:   4c 8b 61 18             mov    0x18(%rcx),%r12
   a3b25:   4a 83 4c 29 08 01       orq    $0x1,0x8(%rcx,%r13,1)
   a3b2b:   4c 39 cb                cmp    %r9,%rbx
   a3b2e:   74 05                   je     a3b35 <__default_morecore@GLIBC_2.2.5+0x1505>
   a3b30:   48 83 49 08 04          orq    $0x4,0x8(%rcx)
   a3b35:   4c 8b 34 24             mov    (%rsp),%r14
   a3b39:   4c 89 62 08             mov    %r12,0x8(%rdx)
   a3b3d:   41 83 c2 01             add    $0x1,%r10d
   a3b41:   49 89 74 24 10          mov    %rsi,0x10(%r12) <<<<<<<< instruction in question
   a3b46:   4c 89 71 18             mov    %r14,0x18(%rcx)
   a3b4a:   4c 8d 71 10             lea    0x10(%rcx),%r14
   a3b4e:   4c 89 74 24 08          mov    %r14,0x8(%rsp)
   a3b53:   49 c1 ee 0c             shr    $0xc,%r14
   a3b57:   4f 33 34 f8             xor    (%r8,%r15,8),%r14
   a3b5b:   4c 89 71 10             mov    %r14,0x10(%rcx)
   a3b5f:   4c 8b 74 24 08          mov    0x8(%rsp),%r14
   a3b64:   4c 89 e1                mov    %r12,%rcx
   a3b67:   4f 89 34 f8             mov    %r14,(%r8,%r15,8)
   a3b6b:   66 45 89 14 78          mov    %r10w,(%r8,%rdi,2)
   a3b70:   eb 99                   jmp    a3b0b <__default_morecore@GLIBC_2.2.5+0x14db>
   a3b72:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
[  166.630969] systemd[1]: segfault at 580 ip 000055e0c1ed7577 sp 00007fff1b1631a0 error 4 in systemd[55e0c1e4d000+e0000]
[  166.632507] Code: 01 00 00 48 63 de 83 fb 08 0f 8f e4 01 00 00 83 fb 05 0f 8f 43 02 00 00 49 89 d7 48 85 d2 0f 84 5f 02 00 00 48 8b 02 49 89 fd <44> 8b 98 80 05 00 00 45 85 db 0f 8f c9 00 00 00 41 8b 47 0c 8d 50

From objdump:
   c054a:   0f 84 c8 01 00 00       je     c0718 <log_oom_internal@plt+0x81548>
   c0550:   48 63 de                movslq %esi,%rbx
   c0553:   83 fb 08                cmp    $0x8,%ebx
   c0556:   0f 8f e4 01 00 00       jg     c0740 <log_oom_internal@plt+0x81570>
   c055c:   83 fb 05                cmp    $0x5,%ebx
   c055f:   0f 8f 43 02 00 00       jg     c07a8 <log_oom_internal@plt+0x815d8>
   c0565:   49 89 d7                mov    %rdx,%r15
   c0568:   48 85 d2                test   %rdx,%rdx
   c056b:   0f 84 5f 02 00 00       je     c07d0 <log_oom_internal@plt+0x81600>
   c0571:   48 8b 02                mov    (%rdx),%rax
   c0574:   49 89 fd                mov    %rdi,%r13
   c0577:   44 8b 98 80 05 00 00    mov    0x580(%rax),%r11d    <<<<<<<<<<<<< instr in question
   c057e:   45 85 db                test   %r11d,%r11d
   c0581:   0f 8f c9 00 00 00       jg     c0650 <log_oom_internal@plt+0x81480>
   c0587:   41 8b 47 0c             mov    0xc(%r15),%eax
   c058b:   8d 50 ff                lea    -0x1(%rax),%edx
   c058e:   83 fa 05                cmp    $0x5,%edx
   c0591:   41 0f 96 c6             setbe  %r14b
   c0595:   83 f8 05                cmp    $0x5,%eax
   c0598:   0f 95 c0                setne  %al
   c059b:   41 20 c6                and    %al,%r14b
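
The arithmetic behind matching the faulting `ip` to the objdump listings above can be sketched as follows. This is only an illustration: the helper names are hypothetical and all numbers are copied from the segfault lines and listings in this comment. The gap between "offset within the mapping" and the objdump address is computed here, not read from the ELF headers, so treat it as an inference.

```python
PAGE_SIZE = 0x1000


def vma_offset(ip: int, vma_start: int) -> int:
    """Offset of the faulting instruction inside the mapping named in the
    'in <object>[<start>+<size>]' part of the segfault line."""
    return ip - vma_start


def implied_base(fault_addr: int, displacement: int) -> int:
    """Value the base register must have held for 'disp(%reg)' to touch
    fault_addr."""
    return fault_addr - displacement


# name: (ip, mapping start, objdump address, fault address, displacement)
CASES = {
    "libc.so.6": (0x7F68A0FD5B41, 0x7F68A0F5A000, 0xA3B41, 0x10, 0x10),
    "systemd": (0x55E0C1ED7577, 0x55E0C1E4D000, 0xC0577, 0x580, 0x580),
}

if __name__ == "__main__":
    for name, (ip, start, objdump_addr, fault, disp) in CASES.items():
        off = vma_offset(ip, start)
        delta = objdump_addr - off
        print(
            f"{name}: offset in mapping {off:#x}, objdump address "
            f"{objdump_addr:#x}, delta {delta:#x} "
            f"(page aligned: {delta % PAGE_SIZE == 0}), "
            f"implied base register {implied_base(fault, disp):#x}"
        )
```

In both cases the delta is a whole number of pages, which is what one would expect if the mapping in the segfault line is the executable segment rather than the start of the file. The implied base register is 0 in both cases, so each fault looks like a load or store through a NULL pointer at a small member offset.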
pfmooney commented 1 year ago

With some rudimentary automation built around repeatedly booting that image under propolis, I've observed systemd-involved segfaults now on both Intel (Ivy Bridge) and AMD (Rome) hardware. As of the time of this writing, the metrics from the tests are as follows:

AMD: 446 tests - 4 segfaults - 24 boot timeouts

uart-1688683832.log:[    4.589254] systemd[1]: segfault at 8 ip 000056360a6802e1 sp 00007ffefe259d50 error 4 in systemd[56360a5f4000+e0000]
uart-1688683904.log:[   10.626955] systemd[1]: segfault at 580 ip 000056481592d577 sp 00007ffcaeeee9c0 error 4 in systemd[5648158a3000+e0000]
uart-1688685336.log:[    7.870240] systemd[1]: segfault at b8 ip 000055d397d33fa7 sp 00007ffd1268fb40 error 4 in systemd[55d397ce8000+e0000]
uart-1688690483.log:[    4.849956] systemd[1]: segfault at 0 ip 00007fe1f4d26089 sp 00007ffd32098e20 error 4 in libsystemd-shared-249.so[7fe1f4c4b000+1a9000]

Intel: 212 tests - 4 segfaults - 52 boot timeouts

uart-1688689213.log:[   25.843751] systemd[1]: segfault at 240 ip 000055c2efdd16b1 sp 00007fffa2b14500 error 4 in systemd[55c2efd86000+e0000]
uart-1688689706.log:[   15.595984] systemd[1]: segfault at 0 ip 00007fc21ec1e089 sp 00007fff79488df0 error 4 in libsystemd-shared-249.so[7fc21eb43000+1a9000]
uart-1688700517.log:[   16.210273] systemd[1]: segfault at 10 ip 00007fe1b6165a9e sp 00007fff58ca01a0 error 4 in libc.so.6[7fe1b60ea000+195000]
uart-1688701468.log:[   12.719193] systemd[1]: segfault at 0 ip 00007fb33f6ac089 sp 00007ffd9b427b80 error 4 in libsystemd-shared-249.so[7fb33f5d1000+1a9000]

The check for boot timeouts is extremely crude: Guests which do not appear to reach the login prompt (in a way the harness recognizes) within 150 seconds of being started are considered "timed out" and are killed.
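
For anyone wanting to tally captured UART logs in a similar way, a rough sketch of a classifier follows. It is not the harness used for the numbers above: the regular expressions, the `login:` marker, and the rule that a segfault takes priority over a missing login prompt are all assumptions made for illustration.

```python
import re

# Matches the kernel's "segfault at" message in raw console or journal form.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[\]]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+) "
    r"in (?P<object>[^\s\[\]]+)\[(?P<vma_start>[0-9a-f]+)\+(?P<vma_size>[0-9a-f]+)\]"
)

# Assumption: a getty prompt containing "login:" marks a successful boot.
LOGIN_RE = re.compile(r"login:")


def classify_boot(log_text: str) -> str:
    """Return 'segfault', 'ok', or 'timeout' for one captured boot log."""
    if SEGFAULT_RE.search(log_text):
        return "segfault"
    if LOGIN_RE.search(log_text):
        return "ok"
    return "timeout"


def rate(count: int, total: int) -> float:
    """Percentage of boots that hit a given outcome."""
    return 100.0 * count / total


if __name__ == "__main__":
    sample = (
        "[   25.843751] systemd[1]: segfault at 240 ip 000055c2efdd16b1 "
        "sp 00007fffa2b14500 error 4 in systemd[55c2efd86000+e0000]"
    )
    match = SEGFAULT_RE.search(sample)
    print(classify_boot(sample), match.group("comm"), match.group("object"))

    # Counts reported above: (tests, segfaults, boot timeouts).
    for host, (tests, segv, timeouts) in {
        "AMD": (446, 4, 24),
        "Intel": (212, 4, 52),
    }.items():
        print(
            f"{host}: {rate(segv, tests):.1f}% segfaults, "
            f"{rate(timeouts, tests):.1f}% boot timeouts"
        )
```

Because a boot that segfaults may also fail to reach the login prompt, a single-label classifier like this one undercounts timeouts relative to counting the two outcomes independently.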

pfmooney commented 1 year ago

Updates from this weekend: I updated my testing infrastructure to replicate the same conditions with C-bhyve. Running there, after hundreds of boot attempts, no hangs or systemd segfaults were observed.

Further testing was done with propolis-standalone, swapping in virtio-block for the boot disk instead of nvme. (The cloudinit disk has always been virtio-block in my test setup.) After hundreds of boots, no hangs or systemd segfaults occurred. I switched back to nvme for the boot disk and after fewer than 100 boots, there were multiple recorded cases of systemd segfaults and hangs (where the system would fail to reach the login prompt within 150 seconds).

Following this, I added some additional probes to the nvme logic in hopes of finding something which "looked off". Comparing traces from successful boots to those where segfaults occurred has not revealed any smoking guns as of yet.

pfmooney commented 1 year ago

At Robert's suggestion, I ran my repro setup with the NVMe emulation modified to report a limit of 1 IO queue (really a single SQ/CQ pair). Doing so brings it closer in line with how virtio-block behaves, since that has only a single virtqueue for requests. Even with the arbitrary limit in place for nvme (confirmed via dtrace, where IO requests solely bore a qid of 1), the segfaults and hangs were still present.

gjcolombo commented 1 year ago

I ran a 50-reboot loop of an Ubuntu 22.04 image in PHD that hit several segfaults (including one systemd pid-1 segfault) and boot timeouts. Some of the other victims were multipathd and fsck; at least the former failure is correlated with boot timing out. Fortunately, I was able to extract the disk from this VM and so get access to all the juicy core dumps it saved.

The systemd version in my guest is

ubuntu@ubuntu:/$ systemd --version
systemd 249 (249.11-0ubuntu3.9)
+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified

The systemd core dump shows a failure at the following stack:

(gdb) bt
#0  0x00007f14b33e075b in kill () at ../sysdeps/unix/syscall-template.S:120
#1  0x000055f404852b1c in crash (sig=11) at ../src/core/main.c:257
#2  <signal handler called>
#3  transaction_add_job_and_dependencies (tr=tr@entry=0x55f4064590b0, type=type@entry=JOB_START, unit=0x55f4063c7170, by=by@entry=0x55f4064ddc10, matters=matters@entry=true, conflicts=conflicts@entry=false, ignore_requirements=false, ignore_order=false, e=0x7ffd3ba36420) at ../src/core/transaction.c:924
#4  0x000055f4048d1478 in transaction_add_job_and_dependencies (tr=tr@entry=0x55f4064590b0, type=type@entry=JOB_START, unit=<optimized out>, by=by@entry=0x55f4063f2170, matters=matters@entry=true, conflicts=conflicts@entry=false, ignore_requirements=false, ignore_order=false, e=0x7ffd3ba36420)
    at ../src/core/transaction.c:1005
#5  0x000055f4048d1478 in transaction_add_job_and_dependencies (tr=tr@entry=0x55f4064590b0, type=type@entry=JOB_START, unit=<optimized out>, by=by@entry=0x55f4063e7c70, matters=matters@entry=false, conflicts=conflicts@entry=false, ignore_requirements=false, ignore_order=false, e=0x7ffd3ba36420)
    at ../src/core/transaction.c:1005
#6  0x000055f4048d15a4 in transaction_add_job_and_dependencies (tr=tr@entry=0x55f4064590b0, type=type@entry=JOB_START, unit=<optimized out>, by=by@entry=0x55f4064da340, matters=matters@entry=true, conflicts=conflicts@entry=false, ignore_requirements=false, ignore_order=false, e=0x7ffd3ba36420)
    at ../src/core/transaction.c:1015
#7  0x000055f4048d1478 in transaction_add_job_and_dependencies (tr=0x55f4064590b0, type=<optimized out>, unit=<optimized out>, by=<optimized out>, matters=<optimized out>, conflicts=<optimized out>, ignore_requirements=false, ignore_order=false, e=0x7ffd3ba36420) at ../src/core/transaction.c:1005
#8  0x000055f404896120 in manager_add_job (m=0x55f406341100, type=JOB_START, unit=0x55f40651ee40, mode=JOB_REPLACE, affected_jobs=0x0, error=0x7ffd3ba36420, ret=0x0) at ../src/core/manager.c:1858
#9  0x000055f404867cd2 in signal_activation_request (message=0x55f40653f040, userdata=0x55f406341100, ret_error=<optimized out>) at ../src/core/dbus.c:180
#10 0x00007f14b3846dd4 in bus_match_run (bus=0x55f4064c1a50, node=0x55f4065125b0, m=0x55f40653f040) at ../src/libsystemd/sd-bus/bus-match.c:318
#11 0x00007f14b384698f in bus_match_run (bus=0x55f4064c1a50, node=0x55f406512570, m=0x55f40653f040) at ../src/libsystemd/sd-bus/bus-match.c:429
#12 0x00007f14b384696a in bus_match_run (bus=0x55f4064c1a50, node=0x55f406512530, m=0x55f40653f040) at ../src/libsystemd/sd-bus/bus-match.c:407
#13 0x00007f14b384698f in bus_match_run (bus=0x55f4064c1a50, node=0x55f4065124f0, m=0x55f40653f040) at ../src/libsystemd/sd-bus/bus-match.c:429
#14 0x00007f14b384696a in bus_match_run (bus=0x55f4064c1a50, node=0x55f4065124b0, m=0x55f40653f040) at ../src/libsystemd/sd-bus/bus-match.c:407
#15 0x00007f14b384698f in bus_match_run (bus=0x55f4064c1a50, node=0x55f406512470, m=0x55f40653f040) at ../src/libsystemd/sd-bus/bus-match.c:429
#16 0x00007f14b384696a in bus_match_run (bus=0x55f4064c1a50, node=0x55f40650f4d0, m=0x55f40653f040) at ../src/libsystemd/sd-bus/bus-match.c:407
#17 0x00007f14b384698f in bus_match_run (bus=0x55f4064c1a50, node=0x55f40650f490, m=0x55f40653f040) at ../src/libsystemd/sd-bus/bus-match.c:429
#18 0x00007f14b384696a in bus_match_run (bus=0x55f4064c1a50, node=0x55f40650f450, m=0x55f40653f040) at ../src/libsystemd/sd-bus/bus-match.c:407
#19 0x00007f14b384698f in bus_match_run (bus=0x55f4064c1a50, node=0x55f40650f410, m=0x55f40653f040) at ../src/libsystemd/sd-bus/bus-match.c:429
#20 0x00007f14b3846903 in bus_match_run (bus=0x55f4064c1a50, node=0x55f4064c7210, m=0x55f40653f040) at ../src/libsystemd/sd-bus/bus-match.c:417
#21 0x00007f14b3846e3b in bus_match_run (bus=0x55f4064c1a50, node=0x55f4064c1ac8, m=0x55f40653f040) at ../src/libsystemd/sd-bus/bus-match.c:274
#22 0x00007f14b38660c8 in process_match (m=<optimized out>, bus=<optimized out>) at ../src/libsystemd/sd-bus/sd-bus.c:2840
#23 process_match (bus=bus@entry=0x55f4064c1a50, m=m@entry=0x55f40653f040) at ../src/libsystemd/sd-bus/sd-bus.c:2831
#24 0x00007f14b3869643 in process_message (m=0x55f40653f040, bus=0x55f4064c1a50) at ../src/libsystemd/sd-bus/sd-bus.c:2955
#25 process_running (ret=0x0, bus=0x55f4064c1a50) at ../src/libsystemd/sd-bus/sd-bus.c:3005
#26 bus_process_internal (bus=bus@entry=0x55f4064c1a50, ret=ret@entry=0x0) at ../src/libsystemd/sd-bus/sd-bus.c:3225
#27 0x00007f14b3869b09 in sd_bus_process (bus=bus@entry=0x55f4064c1a50, ret=ret@entry=0x0) at ../src/libsystemd/sd-bus/sd-bus.c:3252
#28 0x00007f14b386a0f1 in io_callback (s=<optimized out>, fd=<optimized out>, revents=<optimized out>, userdata=0x55f4064c1a50) at ../src/libsystemd/sd-bus/sd-bus.c:3603
#29 0x00007f14b38bba10 in source_dispatch (s=s@entry=0x55f40650c110) at ../src/libsystemd/sd-event/sd-event.c:3526
#30 0x00007f14b38bbcfd in dispatch_exit (e=<optimized out>) at ../src/libsystemd/sd-event/sd-event.c:3686
#31 sd_event_dispatch (e=e@entry=0x55f406349f00) at ../src/libsystemd/sd-event/sd-event.c:4103
#32 0x00007f14b38bd2b8 in sd_event_run (e=0x55f406349f00, timeout=18446744073709551615) at ../src/libsystemd/sd-event/sd-event.c:4171
#33 0x000055f404897a9b in manager_loop (m=0x55f406341100) at ../src/core/manager.c:3016
#34 0x000055f404850332 in invoke_main_loop (ret_error_message=0x7ffd3ba36ee0, ret_switch_root_init=<synthetic pointer>, ret_switch_root_dir=<synthetic pointer>, ret_fds=0x7ffd3ba36ed0, ret_shutdown_verb=<synthetic pointer>, ret_retval=<synthetic pointer>, ret_reexecute=<synthetic pointer>, 
    saved_rlimit_memlock=0x7ffd3ba36f10, saved_rlimit_nofile=0x7ffd3ba36f20, m=0x55f406341100) at ../src/core/main.c:1897
#35 main (argc=1, argv=0x7ffd3ba371a8) at ../src/core/main.c:2906

Here is the source file from what I think is the correct version of systemd for the image above: https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/tree/src/core/transaction.c?id=a7ad4a9fc708500c61e3b8127f112d8c90049b2c

The failing line in frame 3 is an attempt to dereference unit->manager, and sure enough we have

(gdb) frame 3
#3  transaction_add_job_and_dependencies (tr=tr@entry=0x55f4064590b0, type=type@entry=JOB_START, unit=0x55f4063c7170, by=by@entry=0x55f4064ddc10, matters=matters@entry=true, conflicts=conflicts@entry=false, ignore_requirements=false, ignore_order=false, e=0x7ffd3ba36420) at ../src/core/transaction.c:924
924     in ../src/core/transaction.c
(gdb) print unit->manager
$2 = (Manager *) 0x0
(gdb) 

In fact the entirety of unit seems to be zeroed:

(gdb) print *unit
$3 = {manager = 0x0, type = UNIT_SERVICE, load_state = UNIT_STUB, merged_into = 0x0, id = 0x0, instance = 0x0, aliases = 0x0, dependencies = 0x0, requires_mounts_for = 0x0, description = 0x0, documentation = 0x0, fragment_path = 0x0, source_path = 0x0, dropin_paths = 0x0, fragment_not_found_timestamp_hash = 0, 
  fragment_mtime = 0, source_mtime = 0, dropin_mtime = 0, transient_file = 0x0, pending_freezer_message = 0x0, freezer_state = FREEZER_RUNNING, job_timeout_action = EMERGENCY_ACTION_NONE, job_timeout = 0, job_running_timeout = 0, job_timeout_reboot_arg = 0x0, job = 0x0, nop_job = 0x0, match_bus_slot = 0x0, 
  get_name_owner_slot = 0x0, bus_track = 0x0, deserialized_refs = 0x0, refs_by_target = 0x0, conditions = 0x0, asserts = 0x0, condition_timestamp = {realtime = 0, monotonic = 0}, assert_timestamp = {realtime = 0, monotonic = 0}, state_change_timestamp = {realtime = 0, monotonic = 0}, inactive_exit_timestamp = {
    realtime = 0, monotonic = 0}, active_enter_timestamp = {realtime = 0, monotonic = 0}, active_exit_timestamp = {realtime = 0, monotonic = 0}, inactive_enter_timestamp = {realtime = 0, monotonic = 0}, units_by_type_next = 0x0, units_by_type_prev = 0x0, load_queue_next = 0x0, load_queue_prev = 0x0, 
  dbus_queue_next = 0x0, dbus_queue_prev = 0x0, cleanup_queue_next = 0x0, cleanup_queue_prev = 0x0, gc_queue_next = 0x0, gc_queue_prev = 0x0, cgroup_realize_queue_next = 0x0, cgroup_realize_queue_prev = 0x0, cgroup_empty_queue_next = 0x0, cgroup_empty_queue_prev = 0x0, cgroup_oom_queue_next = 0x0, 
  cgroup_oom_queue_prev = 0x0, target_deps_queue_next = 0x0, target_deps_queue_prev = 0x0, stop_when_unneeded_queue_next = 0x0, stop_when_unneeded_queue_prev = 0x0, start_when_upheld_queue_next = 0x0, start_when_upheld_queue_prev = 0x0, stop_when_bound_queue_next = 0x0, stop_when_bound_queue_prev = 0x0, 
  pids = 0x0, sigchldgen = 0, notifygen = 0, gc_marker = 0, load_error = 0, start_ratelimit = {interval = 0, burst = 0, num = 0, begin = 0}, start_limit_action = EMERGENCY_ACTION_NONE, markers = 0, success_action = EMERGENCY_ACTION_NONE, failure_action = EMERGENCY_ACTION_NONE, success_action_exit_status = 0, 
  failure_action_exit_status = 0, reboot_arg = 0x0, auto_start_stop_ratelimit = {interval = 0, burst = 0, num = 0, begin = 0}, ref_uid = 0, ref_gid = 0, unit_file_state = UNIT_FILE_ENABLED, unit_file_preset = 0, cpu_usage_base = 0, cpu_usage_last = 0, managed_oom_kill_last = 0, oom_kill_last = 0, 
  io_accounting_base = {0, 0, 0, 0}, io_accounting_last = {0, 0, 0, 0}, cgroup_path = 0x0, cgroup_realized_mask = 0, cgroup_enabled_mask = 0, cgroup_invalidated_mask = 0, cgroup_members_mask = 0, cgroup_control_inotify_wd = 0, cgroup_memory_inotify_wd = 0, bpf_device_control_installed = 0x0, 
  ip_accounting_ingress_map_fd = 0, ip_accounting_egress_map_fd = 0, ip_accounting_extra = {0, 0, 0, 0}, ipv4_allow_map_fd = 0, ipv6_allow_map_fd = 0, ipv4_deny_map_fd = 0, ipv6_deny_map_fd = 0, ip_bpf_ingress = 0x0, ip_bpf_ingress_installed = 0x0, ip_bpf_egress = 0x0, ip_bpf_egress_installed = 0x0, 
  ip_bpf_custom_ingress = 0x0, ip_bpf_custom_ingress_installed = 0x0, ip_bpf_custom_egress = 0x0, ip_bpf_custom_egress_installed = 0x0, bpf_foreign_by_key = 0x0, initial_socket_bind_link_fds = 0x0, rewatch_pids_event_source = 0x0, on_success_job_mode = JOB_FAIL, on_failure_job_mode = JOB_FAIL, 
  collect_mode = COLLECT_INACTIVE, invocation_id = {bytes = '\000' <repeats 15 times>, qwords = {0, 0}}, invocation_id_string = '\000' <repeats 32 times>, stop_when_unneeded = false, default_dependencies = false, refuse_manual_start = false, refuse_manual_stop = false, allow_isolate = false, 
  ignore_on_isolate = false, condition_result = false, assert_result = false, transient = false, perpetual = false, in_load_queue = false, in_dbus_queue = false, in_cleanup_queue = false, in_gc_queue = false, in_cgroup_realize_queue = false, in_cgroup_empty_queue = false, in_cgroup_oom_queue = false, 
  in_target_deps_queue = false, in_stop_when_unneeded_queue = false, in_start_when_upheld_queue = false, in_stop_when_bound_queue = false, sent_dbus_new_signal = false, job_running_timeout_set = false, in_audit = false, on_console = false, cgroup_realized = false, cgroup_members_mask_valid = true, 
  reset_accounting = false, start_limit_hit = false, coldplugged = false, bus_track_add = false, exported_invocation_id = false, exported_log_level_max = false, exported_log_extra_fields = false, exported_log_ratelimit_interval = false, exported_log_ratelimit_burst = false, 
  warned_clamping_cpu_quota_period = false, last_section_private = 0}

I'm disinclined to suspect that this zeroing came from the storage stack, though, for two reasons. One is that there are nonzero bytes on this page, suggesting that it wasn't paged out and then read back as a page of zeroes. The other is that every source I know to check (/etc/fstab, /proc/swaps, and /proc/meminfo) indicates that this guest has no swap device set up. I think this means there's no way that the process's data could have been paged out and then misread from the NVMe device.


I can confirm that the problem doesn't repro with a virtio disk, so I think continuing to focus on what's different with NVMe makes sense, but I'm not totally convinced this is the storage device causing data corruption vs. something about our environment that happens to trigger a systemd bug only when an NVMe device is present. (Running with a virtual NVMe device, for example, might change the amount of time it takes for certain filesystem startup daemons to run, which might trigger a race in systemd.) I'll think about what I might be able to do to provoke these kinds of races if they exist.

Along those lines, I did a test run with all four of the stock Propolis NVMe USDT probes (read/write enqueue/completion) enabled with a chill action that paused for up to 100us. This definitely slowed down the test, but I don't think it meaningfully changed the test failure rate.

gjcolombo commented 1 year ago

Unrelated to the Ubuntu cores specifically: the other day I also ran an overnight fio test on a virtual NVMe device with verification options enabled. This didn't produce any verification errors.

gjcolombo commented 1 year ago

Another tiny observation: changing between a virtio device and an NVMe device changes some of the device paths that will be referred to in multipathd, which is also a frequent segfaulter (2 of the 3 cores in my last test run).

If I just run sudo multipath repeatedly from an ssh session I see it explode in many strange and wonderful ways ("Inconsistency detected by ld.so: dl-close.c: 184: _dl_close_worker: Assertion `idx == nloaded' failed!", "malloc(): unsorted double linked list corrupted").

gjcolombo commented 1 year ago

I can repro the boot timeouts after changing my test VM configuration to have a virtio boot disk and a blank NVMe disk (i.e. a setup where we're not reading any of the OS data off the NVMe drive, but the NVMe driver still has to be loaded and any services that care about attached disks have to reckon with the fact that it's there).

pfmooney commented 1 year ago

After several reports of multipathd being on the scene when instances failed their boot, often due to timeouts, I decided to take a closer look. I reset the underlying disk image to ensure that it had not suffered any persistent corruption from prior testing.

With the boot-loop reproducer running, I observed a time-out case where it was the multipath service which failed and prevented forward progress. I booted the instance up manually after that, and went to inspect multipathd itself. Upon running it, I was greeted with a surprise: the linker was unable to load /lib/libmultipath.so.0. Inspection with hexdump showed that the first 0x200 bytes were zero. Using the debugfs tool, I queried the extents/blocks which hold that file and confirmed, while the instance was still running, that the blocks in the underlying zvol were still valid (they still bore the proper ELF header, etc).

I repeated testing like this several times manually, with multipathd working intermittently. Simultaneously, @gjcolombo spun up a test where multipathd was disabled in systemd, which seemed to mitigate the issue at hand.

For full coverage, I instrumented the admin command path (in addition to the probes I had in the nvme data path), and ran multipathd manually during a boot where it was not stricken by a corrupted page in its supporting library. I could see that as part of querying disks during startup, multipathd caused an Identify command to be issued to the admin queue on the root device. Printing the details of that Identify made the problem clear, after reading the emulation code:

ADMIN CMD: RawSubmission {
        cdw0: 807010310,
        nsid: 0,
        rsvd: 0,
        mptr: 0,
        prp1: 4483852544,
        prp2: 4484657152,
        cdw10: 1,
        cdw11: 0,
        cdw12: 0,
        cdw13: 0,
        cdw14: 0,
        cdw15: 0,
}
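
As a reading aid for the dump above, here is a minimal sketch of how the decimal cdw0 and cdw10 values decode, assuming the standard NVMe layout (opcode in CDW0 bits 7:0, command identifier in CDW0 bits 31:16, CNS in the low byte of CDW10). The DecodedCmd type and decode function are hypothetical helpers for illustration, not part of Propolis.

```rust
/// Fields of interest from an NVMe admin submission entry.
/// Hypothetical helper type for illustration; not a Propolis type.
#[derive(Debug, PartialEq)]
struct DecodedCmd {
    /// CDW0 bits 7:0.
    opcode: u8,
    /// CDW0 bits 31:16.
    command_id: u16,
    /// CDW10 bits 7:0 (only meaningful for Identify).
    cns: u8,
}

/// Pull the opcode, command identifier, and CNS out of the raw dwords.
fn decode(cdw0: u32, cdw10: u32) -> DecodedCmd {
    DecodedCmd {
        opcode: (cdw0 & 0xff) as u8,
        command_id: (cdw0 >> 16) as u16,
        cns: (cdw10 & 0xff) as u8,
    }
}
```

Calling decode(807010310, 1) gives opcode 0x06 (Identify) and CNS 1 (the Identify Controller data structure), which matches the Identify described above.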

Notable is that both prp1 and prp2 are non-zero. Converted to hex, the situation is clearer:

prp1: 10b422900
prp2: 10b4e7000
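
The page arithmetic behind those two values can be sketched as follows, assuming a 4 KiB memory page size (the "single page (4KB)" case discussed here). The helper names are hypothetical.

```rust
/// Assumed NVMe memory page size for this guest (4 KiB).
const PAGE_SIZE: u64 = 4096;

/// Byte offset of a guest-physical address within its page.
fn page_offset(addr: u64) -> u64 {
    addr & (PAGE_SIZE - 1)
}

/// Number of bytes from `addr` up to the end of the page containing it.
fn bytes_to_page_end(addr: u64) -> u64 {
    PAGE_SIZE - page_offset(addr)
}
```

For prp1 = 0x10b422900 the offset is 0x900, so only 0x700 bytes fit before the page boundary. prp2 = 0x10b4e7000 has an offset of zero, i.e. it is page-aligned.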

Even though the Identify command is expected to emit only a single page (4KB) of data, the first PRP is offset into its page (legal per the spec), so the data that would otherwise cross the page boundary must instead be copied into the second PRP (which must itself be page-aligned). If we look at how the emulation handles those copies, the problem becomes apparent:

https://github.com/oxidecomputer/propolis/blob/1598c84a7a9d7684ad14ecd4b83b581a3bc182a6/lib/propolis/src/hw/nvme/admin.rs#L211-L215

This code expects a single GuestRegion to be emitted from the PrpIter representing the command output buffer(s), and it writes the entire page-long response to that address. Because the guest, in this case, passed a buffer with an offset, we overflow into the subsequent page by whatever that offset is (0x900 bytes here).

This logic (and, as pointed out by @luqmana, the logic in GetLogPage) needs to be updated to handle output buffers that are split across more than one PRP.
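
To illustrate the shape of that change, here is a minimal sketch of splitting a response of at most one page across PRP1 and PRP2. It is not the Propolis implementation: Region, prp_regions, and write_split are hypothetical names, the guest-memory write is abstracted as a caller-supplied closure, and a 4 KiB page size is assumed.

```rust
/// Assumed NVMe memory page size (4 KiB).
const PAGE_SIZE: u64 = 4096;

/// A contiguous span of guest-physical memory.
/// Hypothetical stand-in for illustration; not Propolis's GuestRegion.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Region {
    addr: u64,
    len: u64,
}

/// Split a transfer of `len` bytes (at most one page) across PRP1 and,
/// if PRP1's in-page offset makes the data cross a page boundary, PRP2.
fn prp_regions(prp1: u64, prp2: u64, len: u64) -> Result<Vec<Region>, &'static str> {
    if len == 0 || len > PAGE_SIZE {
        return Err("this sketch only handles 1..=PAGE_SIZE bytes");
    }
    // Bytes that fit between PRP1 and the end of its page.
    let first = (PAGE_SIZE - (prp1 & (PAGE_SIZE - 1))).min(len);
    let mut regions = vec![Region { addr: prp1, len: first }];
    let rest = len - first;
    if rest > 0 {
        // The second PRP must point at the start of a page.
        if prp2 & (PAGE_SIZE - 1) != 0 {
            return Err("PRP2 must be page-aligned");
        }
        regions.push(Region { addr: prp2, len: rest });
    }
    Ok(regions)
}

/// Copy `data` out region by region instead of as one contiguous write.
/// `write` is a caller-supplied stand-in for a guest-memory write.
fn write_split(data: &[u8], regions: &[Region], mut write: impl FnMut(u64, &[u8])) {
    let mut off = 0usize;
    for r in regions {
        let end = off + r.len as usize;
        write(r.addr, &data[off..end]);
        off = end;
    }
}
```

With the PRPs from the Identify above, prp_regions(0x10b422900, 0x10b4e7000, 4096) yields 0x700 bytes at prp1 followed by 0x900 bytes at prp2, so nothing is written past the end of the first page.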