Open SmylerMC opened 8 months ago
Hi @SmylerMC. I think we've got two separate issues here.
Let's take the Debian custom kernel first. The issue here is that you also need the `virtio_console` module added to `/etc/initramfs-tools/modules`. I've tested with this change and your Dockerfile then works. (Please note that RunCVM recently changed from using serial to using `virtio_console`. The logic for using serial is still present but disabled, pending the addition of a new command-line option. Until then, if you change `CONSOLE_MONITOR="0"` to read `CONSOLE_MONITOR="1"` in `/opt/runcvm/scripts/runcvm-ctr-qemu`, RunCVM will use serial instead of `virtio_console`, and your original Dockerfile will work unchanged.)
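For anyone wanting to apply that workaround quickly, here is a hedged one-liner sketch (the path and variable name are those mentioned above; assumes GNU `sed`):

```shell
# Switch RunCVM back to the serial console (sketch; back up the script first)
SCRIPT=/opt/runcvm/scripts/runcvm-ctr-qemu   # path from the comment above
if [ -f "$SCRIPT" ]; then
  cp "$SCRIPT" "$SCRIPT.bak"
  sed -i 's/^CONSOLE_MONITOR="0"/CONSOLE_MONITOR="1"/' "$SCRIPT"
fi
```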
Now looking at the slow-start issue. This symptom is one I haven't seen and I don't yet know what's going on.
I agree it does look like QEMU taking nearly 2 minutes to load the kernel, but let's be sure.
The option `--env=RUNCVM_BREAK` can help us work out exactly where the delay is. Begin by adding `--env=RUNCVM_BREAK=preqemu`. This will print the intended QEMU command and then run a shell in the container before launching QEMU. (When you get to the shell, type CTRL-D or type `exit` to proceed with the QEMU launch.)
```
docker run --rm -it --runtime=runcvm --env=RUNCVM_BIOS_DEBUG=1 --env=RUNCVM_KERNEL_DEBUG=1 --env=RUNCVM_RUNTIME_DEBUG=1 --env=RUNCVM_BREAK=preqemu ubuntu:latest
```
Please run this and share another screen recording. If the delay is reproducible, whether the delay is before or after the shell will help us isolate the cause.
Here is what I get: https://asciinema.org/a/632558
By the way, how much RAM does your computer have?
Thank you for your swift response.

I had missed the change. I added `virtio_console` as you instructed, and I now get a different behavior: docker returns immediately, with a slow start on computer 1.
Computer 1 recording: https://asciinema.org/a/NzdZdtgL4N93oENyN2DkO15yG
Computer 2 recording: https://asciinema.org/a/NH2PK6gYIPzigmVyUsBsA4hxS
New Dockerfile:

```dockerfile
FROM debian:bullseye

ARG KERNEL=linux-image-5.10.0-27-amd64

RUN apt update && \
    apt install -y "$KERNEL" initramfs-tools && \
    echo "virtiofs" >> /etc/initramfs-tools/modules && \
    echo "virtio_console" >> /etc/initramfs-tools/modules && \
    update-initramfs -u
```

`CONSOLE_MONITOR=1` works fine as a workaround.
Computer 1 has 64GB of RAM. Computer 2 has 16GB.
It is indeed QEMU being very slow to start: https://asciinema.org/a/PSslboorUjZlNgS8NKeOxDmyq
I notice you've got 20 cores on this computer and are using the default memory allocation of 512M. I wonder if this isn't a happy combination. Let's try `--cpus 1` and `-m 2g`. If that works better, we can iterate on these.
Oh, I believe there may be an issue whereby Debian specifically doesn't like using `virtio_console` with `RUNCVM_KERNEL_DEBUG=1`. The aforementioned workaround of using serial instead is needed if you use `RUNCVM_KERNEL_DEBUG=1` with Debian. (Bizarrely, Ubuntu doesn't exhibit this issue, only Debian.)
> Oh I believe there may be an issue whereby debian specifically doesn't like using `virtio_console` with `RUNCVM_KERNEL_DEBUG=1`.

Good catch, removing the option works. Any idea what's causing this?
> Let's try `--cpus 1` and `-m 2g`.

No luck, it doesn't change a thing. The behavior was also the same when running in my Fedora VM, with 8 cores and 8GB. What's weird is that I use QEMU on a daily basis on this computer through libvirt and the CLI, and I have never experienced anything like this, except when using RunCVM. One of the options must be causing it. As a last resort, I can try starting the VM manually, removing options until I find the problematic one.
>> Oh I believe there may be an issue whereby debian specifically doesn't like using `virtio_console` with `RUNCVM_KERNEL_DEBUG=1`.
>
> Good catch, removing the option works. Any idea what's causing this?

No idea about the Debian/virtio_console issue, though logically I feel it has to be either a kernel build config or an initramfs start-up script, as Ubuntu doesn't exhibit the issue. There's an argument for reverting to serial by default, but I'm reluctant, as virtio_console seems the more modern and appropriate interface...
>> Let's try `--cpus 1` and `-m 2g`.
>
> No luck, doesn't change a thing. The behavior was also the same when running in my Fedora VM, with 8 cores and 8GB. What's weird is that I use Qemu on a daily basis on this computer through libvirt and the cli, and I have never experienced anything like this, except when using RuncVM. One of the options must be causing it. As a last resort, I can try starting the VM manually, removing options until I find the problematic one.
To be clear, does this slow start only affect vanilla Ubuntu or does it affect any other images? And if you build a custom Ubuntu image with a different kernel, does it affect that?
What does `top` show qemu doing during the slow start?

If you're feeling brave, you could try `strace -f` on the qemu process to see what system calls it is making.
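A rough sketch of how to inspect the busy QEMU process from the host (assumes the process name matches `qemu-system`; adjust the pattern as needed):

```shell
# Find the QEMU PID, snapshot its CPU usage, then attach strace to it
pid=$(pgrep -f qemu-system | head -n1)
if [ -n "$pid" ]; then
  top -b -n 1 -p "$pid"                    # batch-mode snapshot of CPU/memory
  sudo strace -f -p "$pid" -o qemu.strace  # follow threads; CTRL-C to detach
fi
```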
I agree that it's likely some QEMU command-line option could be triggering this, if you normally run QEMU without problems. An easy way to test this is to hack the `/opt/runcvm/scripts/runcvm-ctr-qemu` script. It should be reasonably clear how it composes the final QEMU command line, so you can chop it around.
Another idea: with `CONSOLE_MONITOR="1"` in `/opt/runcvm/scripts/runcvm-ctr-qemu`, try removing `--env=RUNCVM_BIOS_DEBUG=1` and adding `--env=RUNCVM_KERNEL_APPEND='earlyprintk=serial'`, i.e. running:

```
docker run --rm -it --runtime=runcvm --env=RUNCVM_KERNEL_DEBUG=1 --env=RUNCVM_RUNTIME_DEBUG=1 --env=RUNCVM_BREAK=preqemu --env=RUNCVM_KERNEL_APPEND='earlyprintk=serial' ubuntu:latest
```
This might print some kernel logs earlier, or it might not.
Also, with this mode, please copy and paste the kernel logs directly from the terminal rather than using asciinema, as I'm concerned asciinema may be omitting some output.
For comparison, here's the initial output of the above command, run on a Dell R620 running Debian Bullseye:
# docker run --rm -it --runtime=runcvm --env=RUNCVM_KERNEL_DEBUG=1 --env=RUNCVM_RUNTIME_DEBUG=1 --env=RUNCVM_BREAK=preqemu --env=RUNCVM_KERNEL_APPEND='earlyprintk=serial' ubuntu:latest
Preparing to run: '/.runcvm/guest/usr/bin/qemu-system-x86_64' '-no-user-config' '-nodefaults' '-no-reboot' '-action' 'panic=none' '-action' 'reboot=shutdown' '-enable-kvm' '-cpu' 'host,pmu=off' '-machine' 'q35,accel=kvm,usb=off,sata=off' '-device' 'isa-debug-exit' '-nographic' '-vga' 'none' '-fw_cfg' 'opt/org.seabios/etc/sercon-port,string=0' '-m' '512M' '-smp' '48,cores=1,threads=1,sockets=48,maxcpus=48' '-device' 'virtio-serial-pci,id=serial0' '-object' 'rng-random,id=rng0,filename=/dev/urandom' '-device' 'virtio-rng-pci,rng=rng0' '-numa' 'node,memdev=mem' '-object' 'memory-backend-file,id=mem,size=512M,mem-path=/dev/shm,share=on,prealloc=off' '-chardev' 'socket,id=virtiofs,path=/run/.virtiofs.sock' '-device' 'vhost-user-fs-pci,queue-size=1024,chardev=virtiofs,tag=runcvmfs,ats=off' '-netdev' 'tap,id=qemu0,ifname=tap-eth0,script=/.runcvm/guest/scripts/runcvm-ctr-qemu-ifup,downscript=/.runcvm/guest/scripts/runcvm-ctr-qemu-ifdown' '-device' 'virtio-net-pci,netdev=qemu0,mac=52:54:00:14:00:04,rombar=0' '-chardev' 'stdio,id=char0,mux=on,signal=off' '-serial' 'chardev:char0' '-mon' 'chardev=char0' '-echr' '20' '-chardev' 'socket,id=qemuguest0,path=/run/.qemu-guest-agent,server=on,wait=off' '-device' 'virtserialport,chardev=qemuguest0,name=org.qemu.guest_agent.0' '-monitor' 'unix:/run/.qemu-monitor-socket,server,nowait' '-kernel' '/.runcvm/guest/kernels/ubuntu/5.15.0-91-generic/vmlinuz' '-initrd' '/.runcvm/guest/kernels/ubuntu/5.15.0-91-generic/initrd' '-L' '/.runcvm/guest/usr/share/qemu' '-append' 'rootfstype=virtiofs root=runcvmfs noresume nomodeset net.ifnames=1 init=/.runcvm/guest/scripts/runcvm-vm-init rw ipv6.disable=1 panic=-1 scsi_mod.scan=none tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k cryptomgr.notests pci=lastbus=0 selinux=0 systemd.show_status=1 console=ttyS0 earlyprintk=serial'
root@6ead255967f7:/#
exit
[ 0.000000] Linux version 5.15.0-91-generic (buildd@lcy02-amd64-045) (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023 (Ubuntu 5.15.0-91.101-generic 5.15.131)
[ 0.000000] Command line: rootfstype=virtiofs root=runcvmfs noresume nomodeset net.ifnames=1 init=/.runcvm/guest/scripts/runcvm-vm-init rw ipv6.disable=1 panic=-1 scsi_mod.scan=none tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k cryptomgr.notests pci=lastbus=0 selinux=0 systemd.show_status=1 console=ttyS0 earlyprintk=serial
[ 0.000000] KERNEL supported cpus:
[ 0.000000] Intel GenuineIntel
[ 0.000000] AMD AuthenticAMD
[ 0.000000] Hygon HygonGenuine
[ 0.000000] Centaur CentaurHauls
[ 0.000000] zhaoxin Shanghai
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000001ffdefff] usable
[ 0.000000] BIOS-e820: [mem 0x000000001ffdf000-0x000000001fffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000b0000000-0x00000000bfffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ 0.000000] printk: bootconsole [earlyser0] enabled
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] SMBIOS 2.8 present.
[ 0.000000] DMI: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-20240107_171655-buildkitsandbox 04/01/2014
[ 0.000000] Hypervisor detected: KVM
[ 0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[ 0.000000] kvm-clock: cpu 0, msr 17601001, primary cpu clock
[ 0.000006] kvm-clock: using sched offset of 729021333 cycles
[ 0.000653] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[ 0.002394] tsc: Detected 2999.996 MHz processor
[ 0.003323] last_pfn = 0x1ffdf max_arch_pfn = 0x400000000
[ 0.003961] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
Memory KASLR using RDRAND RDTSC...
[ 0.020016] Using GB pages for direct mapping
[ 0.020772] RAMDISK: [mem 0x1c347000-0x1ffcffff]
[ 0.021496] ACPI: Early table checksum verification disabled
[ 0.022304] ACPI: RSDP 0x00000000000F5C00 000014 (v00 BOCHS )
[ 0.023133] ACPI: RSDT 0x000000001FFE2B4E 00003C (v01 BOCHS BXPC 00000001 BXPC 00000001)
[ 0.024230] ACPI: FACP 0x000000001FFE2426 0000F4 (v03 BOCHS BXPC 00000001 BXPC 00000001)
[ 0.025290] ACPI: DSDT 0x000000001FFDF2C0 003166 (v01 BOCHS BXPC 00000001 BXPC 00000001)
[ 0.026451] ACPI: FACS 0x000000001FFDF280 000040
[ 0.027183] ACPI: APIC 0x000000001FFE251A 0001F0 (v01 BOCHS BXPC 00000001 BXPC 00000001)
[ 0.028514] ACPI: HPET 0x000000001FFE270A 000038 (v01 BOCHS BXPC 00000001 BXPC 00000001)
[ 0.029829] ACPI: SRAT 0x000000001FFE2742 0003A8 (v01 BOCHS BXPC 00000001 BXPC 00000001)
[ 0.030775] ACPI: MCFG 0x000000001FFE2AEA 00003C (v01 BOCHS BXPC 00000001 BXPC 00000001)
[ 0.031884] ACPI: WAET 0x000000001FFE2B26 000028 (v01 BOCHS BXPC 00000001 BXPC 00000001)
[ 0.033035] ACPI: Reserving FACP table memory at [mem 0x1ffe2426-0x1ffe2519]
[ 0.033799] ACPI: Reserving DSDT table memory at [mem 0x1ffdf2c0-0x1ffe2425]
[ 0.034613] ACPI: Reserving FACS table memory at [mem 0x1ffdf280-0x1ffdf2bf]
[ 0.035609] ACPI: Reserving APIC table memory at [mem 0x1ffe251a-0x1ffe2709]
[ 0.036449] ACPI: Reserving HPET table memory at [mem 0x1ffe270a-0x1ffe2741]
[ 0.037324] ACPI: Reserving SRAT table memory at [mem 0x1ffe2742-0x1ffe2ae9]
[ 0.038196] ACPI: Reserving MCFG table memory at [mem 0x1ffe2aea-0x1ffe2b25]
[ 0.039290] ACPI: Reserving WAET table memory at [mem 0x1ffe2b26-0x1ffe2b4d]
[ 0.040310] SRAT: PXM 0 -> APIC 0x00 -> Node 0
[ 0.040959] SRAT: PXM 0 -> APIC 0x01 -> Node 0
[ 0.041489] SRAT: PXM 0 -> APIC 0x02 -> Node 0
[ 0.042072] SRAT: PXM 0 -> APIC 0x03 -> Node 0
[ 0.042745] SRAT: PXM 0 -> APIC 0x04 -> Node 0
[ 0.043452] SRAT: PXM 0 -> APIC 0x05 -> Node 0
[ 0.044112] SRAT: PXM 0 -> APIC 0x06 -> Node 0
[ 0.044818] SRAT: PXM 0 -> APIC 0x07 -> Node 0
[ 0.045441] SRAT: PXM 0 -> APIC 0x08 -> Node 0
[ 0.045913] SRAT: PXM 0 -> APIC 0x09 -> Node 0
[ 0.046384] SRAT: PXM 0 -> APIC 0x0a -> Node 0
[ 0.046901] SRAT: PXM 0 -> APIC 0x0b -> Node 0
[ 0.047477] SRAT: PXM 0 -> APIC 0x0c -> Node 0
[ 0.048004] SRAT: PXM 0 -> APIC 0x0d -> Node 0
[ 0.048573] SRAT: PXM 0 -> APIC 0x0e -> Node 0
[ 0.049250] SRAT: PXM 0 -> APIC 0x0f -> Node 0
[ 0.049868] SRAT: PXM 0 -> APIC 0x10 -> Node 0
[ 0.050469] SRAT: PXM 0 -> APIC 0x11 -> Node 0
[ 0.050947] SRAT: PXM 0 -> APIC 0x12 -> Node 0
[ 0.051424] SRAT: PXM 0 -> APIC 0x13 -> Node 0
[ 0.052006] SRAT: PXM 0 -> APIC 0x14 -> Node 0
[ 0.052605] SRAT: PXM 0 -> APIC 0x15 -> Node 0
[ 0.053098] SRAT: PXM 0 -> APIC 0x16 -> Node 0
[ 0.053707] SRAT: PXM 0 -> APIC 0x17 -> Node 0
[ 0.054309] SRAT: PXM 0 -> APIC 0x18 -> Node 0
[ 0.054895] SRAT: PXM 0 -> APIC 0x19 -> Node 0
[ 0.055560] SRAT: PXM 0 -> APIC 0x1a -> Node 0
[ 0.056159] SRAT: PXM 0 -> APIC 0x1b -> Node 0
[ 0.056811] SRAT: PXM 0 -> APIC 0x1c -> Node 0
[ 0.057411] SRAT: PXM 0 -> APIC 0x1d -> Node 0
[ 0.058067] SRAT: PXM 0 -> APIC 0x1e -> Node 0
[ 0.058719] SRAT: PXM 0 -> APIC 0x1f -> Node 0
[ 0.059217] SRAT: PXM 0 -> APIC 0x20 -> Node 0
[ 0.059772] SRAT: PXM 0 -> APIC 0x21 -> Node 0
[ 0.060362] SRAT: PXM 0 -> APIC 0x22 -> Node 0
[ 0.060993] SRAT: PXM 0 -> APIC 0x23 -> Node 0
[ 0.061575] SRAT: PXM 0 -> APIC 0x24 -> Node 0
[ 0.062185] SRAT: PXM 0 -> APIC 0x25 -> Node 0
[ 0.062808] SRAT: PXM 0 -> APIC 0x26 -> Node 0
[ 0.063437] SRAT: PXM 0 -> APIC 0x27 -> Node 0
[ 0.064094] SRAT: PXM 0 -> APIC 0x28 -> Node 0
[ 0.064707] SRAT: PXM 0 -> APIC 0x29 -> Node 0
[ 0.065348] SRAT: PXM 0 -> APIC 0x2a -> Node 0
[ 0.065989] SRAT: PXM 0 -> APIC 0x2b -> Node 0
[ 0.066617] SRAT: PXM 0 -> APIC 0x2c -> Node 0
[ 0.067264] SRAT: PXM 0 -> APIC 0x2d -> Node 0
[ 0.067896] SRAT: PXM 0 -> APIC 0x2e -> Node 0
[ 0.068582] SRAT: PXM 0 -> APIC 0x2f -> Node 0
[ 0.069233] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[ 0.070145] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0x1fffffff]
[ 0.071067] NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0x1ffdefff] -> [mem 0x00000000-0x1ffdefff]
[ 0.072529] NODE_DATA(0) allocated [mem 0x1c31d000-0x1c346fff]
[ 0.073947] Zone ranges:
[ 0.074330] DMA [mem 0x0000000000001000-0x0000000000ffffff]
[ 0.075233] DMA32 [mem 0x0000000001000000-0x000000001ffdefff]
[ 0.076120] Normal empty
[ 0.076472] Device empty
[ 0.076871] Movable zone start for each node
[ 0.077494] Early memory node ranges
[ 0.078022] node 0: [mem 0x0000000000001000-0x000000000009efff]
[ 0.078829] node 0: [mem 0x0000000000100000-0x000000001ffdefff]
[ 0.079627] Initmem setup node 0 [mem 0x0000000000001000-0x000000001ffdefff]
[ 0.080689] On node 0, zone DMA: 1 pages in unavailable ranges
[ 0.080912] On node 0, zone DMA: 97 pages in unavailable ranges
[ 0.088638] On node 0, zone DMA32: 33 pages in unavailable ranges
[ 0.089825] ACPI: PM-Timer IO Port: 0x608
[ 0.091135] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[ 0.091967] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
[ 0.092830] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.093690] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[ 0.094617] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.095605] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[ 0.096586] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[ 0.097548] ACPI: Using ACPI (MADT) for SMP configuration information
[ 0.098483] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[ 0.099216] TSC deadline timer available
[ 0.099691] smpboot: Allowing 48 CPUs, 0 hotplug CPUs
[ 0.100532] kvm-guest: KVM setup pv remote TLB flush
[ 0.101269] kvm-guest: setup PV sched yield
[ 0.101885] PM: hibernation: Registered nosave memory: [mem 0x00000000-0x00000fff]
[ 0.102907] PM: hibernation: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
[ 0.104000] PM: hibernation: Registered nosave memory: [mem 0x000a0000-0x000effff]
[ 0.105055] PM: hibernation: Registered nosave memory: [mem 0x000f0000-0x000fffff]
[ 0.106125] [mem 0x20000000-0xafffffff] available for PCI devices
[ 0.106950] Booting paravirtualized kernel on KVM
[ 0.107643] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 0.109169] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:48 nr_cpu_ids:48 nr_node_ids:1
[ 0.122131] percpu: Embedded 61 pages/cpu s212992 r8192 d28672 u262144
[ 0.123174] kvm-guest: setup async PF for cpu 0
[ 0.123763] kvm-guest: stealtime: cpu 0, msr 1ae33080
[ 0.124453] kvm-guest: PV spinlocks enabled
[ 0.125067] PV qspinlock hash table entries: 256 (order: 0, 4096 bytes, linear)
[ 0.126139] Built 1 zonelists, mobility grouping on. Total pages: 128735
[ 0.127120] Policy zone: DMA32
[ 0.127574] Kernel command line: rootfstype=virtiofs root=runcvmfs noresume nomodeset net.ifnames=1 init=/.runcvm/guest/scripts/runcvm-vm-init rw ipv6.disable=1 panic=-1 scsi_mod.scan=none tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k cryptomgr.notests pci=lastbus=0 selinux=0 systemd.show_status=1 console=ttyS0 earlyprintk=serial
[ 0.133037] You have booted with nomodeset. This means your GPU drivers are DISABLED
[ 0.134152] Any video related functionality will be severely degraded, and you may not even be able to suspend the system properly
[ 0.135921] Unless you actually understand what nomodeset does, you should reboot without enabling it
[ 0.137405] printk: log_buf_len individual max cpu contribution: 4096 bytes
[ 0.138359] printk: log_buf_len total cpu_extra contributions: 192512 bytes
[ 0.139356] printk: log_buf_len min size: 262144 bytes
[ 0.142180] printk: log_buf_len: 524288 bytes
[ 0.142835] printk: early log buf free: 251768(96%)
[ 0.143975] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
[ 0.145324] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes, linear)
[ 0.147705] mem auto-init: stack:off, heap alloc:on, heap free:off
[ 0.150641] Memory: 379888K/523764K available (16393K kernel code, 4393K rwdata, 10868K rodata, 3356K init, 18716K bss, 143616K reserved, 0K cma-reserved)
[ 0.152735] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=48, Nodes=1
[ 0.153502] Kernel/User page tables isolation: enabled
Poking KASLR using RDRAND RDTSC...
[ 0.154513] ftrace: allocating 50637 entries in 198 pages
[ 0.181295] ftrace: allocated 198 pages with 4 groups
[ 0.182578] rcu: Hierarchical RCU implementation.
[ 0.183082] rcu: RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=48.
[ 0.183803] All grace periods are expedited (rcu_expedited).
[ 0.184450] Rude variant of Tasks RCU enabled.
[ 0.184955] Tracing variant of Tasks RCU enabled.
[ 0.185459] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[ 0.186254] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=48
[ 0.191724] NR_IRQS: 524544, nr_irqs: 808, preallocated irqs: 16
[ 0.192660] random: crng init done
[ 0.193180] Console: colour *CGA 80x25
[ 0.193617] printk: console [ttyS0] enabled
[ 0.193617] printk: console [ttyS0] enabled
[ 0.194513] printk: bootconsole [earlyser0] disabled
[ 0.194513] printk: bootconsole [earlyser0] disabled
[ 0.195655] ACPI: Core revision 20210730
> What does `top` show qemu doing during the slow start?
It appears to be sitting with a single thread eating 100% of a CPU core's time.
> To be clear, does this slow start only affect vanilla Ubuntu or does it affect any other images? And if you build a custom Ubuntu image with a different kernel, does it affect that?
All images I have tried so far are affected, including ones with custom kernels.
> This might print some kernel logs earlier, or it might not.
It does not appear to. Early boot messages still appear after about one to two minutes.
I have managed to get qemu to start immediately by removing the network interface from the qemu command line (launching it by hand from the preqemu hook). However, I cannot find out why it causes an issue. Changing options on the network device doesn't cut the delay; only removing the interface has so far.
The options I removed are the following:

```shell
'-netdev' 'tap,id=qemu0,ifname=tap-eth0,script=/.runcvm/guest/scripts/runcvm-ctr-qemu-ifup,downscript=/.runcvm/guest/scripts/runcvm-ctr-qemu-ifdown' \
'-device' 'virtio-net-pci,netdev=qemu0,mac=52:54:00:11:00:03,rombar=0' \
```
@SmylerMC Thanks so much for persisting with this. At least it seems we're getting somewhere!
Please would you try removing the `script` and `downscript` options from `-netdev`. That will tell us whether it's the tap interface/virtio-net-pci device, or the scripts that set it up, that's causing the delay.

If the latter, we can debug the scripts by adding `set -x` to the beginning of the script.

If the former, then we could experiment with different `-device` drivers.
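The `set -x` approach might look like this at the top of the ifup/ifdown scripts (a sketch; the exact shebang in RunCVM's scripts may differ):

```shell
#!/bin/bash
set -x   # echo every command, with arguments expanded, to stderr
# ... rest of runcvm-ctr-qemu-ifup ...
```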
P.S. To make this change, just edit around line 97 of `runcvm-scripts/runcvm-ctr-qemu`:

```shell
IFACES+=(
  -netdev tap,id=qemu$id,ifname=tap-$DOCKER_IF,script=$QEMU_IFUP,downscript=$QEMU_IFDOWN
  -device virtio-net-pci,netdev=qemu$id,mac=$mac,rombar=$id
)
```
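If it helps, QEMU also accepts `script=no`/`downscript=no` to skip the ifup/ifdown scripts entirely, so the edited block might look like this (a sketch only; variable names as in the snippet above):

```shell
IFACES+=(
  -netdev tap,id=qemu$id,ifname=tap-$DOCKER_IF,script=no,downscript=no
  -device virtio-net-pci,netdev=qemu$id,mac=$mac,rombar=$id
)
```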
I'm away from the keyboard right now, so I will try again later, but I did try removing the `script` options earlier and I don't think it changed anything.

As for the device type, I did remove the `-device` line, which should have made it fall back to an e1000 adapter if I'm not mistaken. That did nothing either.
I think I will try reproducing RunCVM's configuration as closely as I can outside of RunC, to see whether the container's configuration has anything to do with it.
You could certainly change `virtio-net-pci` to `e1000` in the `-device` line. I've tested that RunCVM/QEMU boots with this change (although, as the RunCVM kernel initrds do not contain the e1000 module, guest Linux networking is not correctly configured).
Hi @SmylerMC, have you had a chance to look at this again? I'm very keen to get to the bottom of it (although at the same time I still haven't been able to reproduce the issue on any test platform).
I'm quite busy at the moment, but I'll probably come back to this in a week or so. I would really like to keep using RunCVM, and this issue is really annoying.
I just realized I never got back to you about the network scripts and switching the network card to `e1000`: neither made any difference.
I am using RunCVM to study kernel vulnerabilities by building vulnerable container images. I am noticing different startup times from one computer to another, often very long, and in some cases the container never finishes booting.
**Computer 1**

Containers take around 2 minutes to start. Most of that time is spent after the QEMU process is started but seemingly before the kernel starts. The QEMU process uses 100% of a CPU core during that time. A similar waiting time exists when the container exits, also maxing out a CPU core. The exact same behavior is reproduced when running RunCVM in a Fedora VM on that host (libvirt, with nested virtualization enabled and tested).

- CPU: Intel i7-12700H
- Host kernel: 6.6.10-1-MANJARO
- Docker version: 24.0.7, build afdd53b4e3
Asciinema recording: https://asciinema.org/a/iixER5qw6fSiLnslM1NW2z9d1
**Computer 2**

Running a vanilla Ubuntu container image works flawlessly.

Asciinema recording: https://asciinema.org/a/wc0nQL5sFQ4hNUXkQMjApGxsq

Trying to run a Debian image with a custom kernel hangs the shell forever. Trying to run a second container with the same image after that returns immediately. This may be a different issue. QEMU is running and doesn't appear to be using much CPU.

Dockerfile:

- CPU: Intel i7-5600U
- Host kernel: 6.5.0-kali3-amd64
- Docker version: 20.10.25+dfsg1, build b82b9f3