srd424 closed this issue 1 week ago
stgraber@dakara:~$ incus launch images:ubuntu/24.04 v1 --vm
Launching v1
stgraber@dakara:~$ incus config device add v1 etc disk source=/etc/ path=/mnt/etc
Device etc added to v1
stgraber@dakara:~$ incus exec v1 -- df -h /mnt/etc
Filesystem Size Used Avail Use% Mounted on
incus_etc 90G 25G 61G 29% /mnt/etc
stgraber@dakara:~$
Can you show a full incus config show --expanded ostree3?
Also, any chance you can test on an up to date version of Incus (6.0.1 for LTS, 6.3 for non-LTS)?
I'm a bit time- and resource-constrained at the moment but will have a go. Looking at the code, I did wonder if having other virtiofs shares already defined at startup was part of the problem?
I can't easily upgrade to 6.0.1 - I have some VMs running I really don't want to stop. I will look at spinning up another incus install somewhere though.
For the moment, I did create a fresh VM, started it, and added two virtiofs mounts OK. I shut it down and restarted it, added one more OK, then the fourth failed: Error: Failed to start device "test2": Failed to add the virtiofs device: Bus 'qemu_pcie9' not found
This might be a question of working backwards from the code to work out what the failing condition is.
Here's the PCI topology after the tests described above:
+-01.0-[01]--+-00.0 Red Hat, Inc. Virtio memory balloon [1af4:1045]
| +-00.1 Red Hat, Inc. Virtio RNG [1af4:1044]
| +-00.2 Red Hat, Inc. Virtio input [1af4:1052]
| +-00.3 Red Hat, Inc. Virtio input [1af4:1052]
| +-00.4 Red Hat, Inc. Virtio socket [1af4:1053]
| +-00.5 Red Hat, Inc. Virtio console [1af4:1043]
| \-00.6 Red Hat, Inc. QEMU XHCI Host Controller [1b36:000d]
+-01.1-[02]----00.0 Red Hat, Inc. Virtio SCSI [1af4:1048]
+-01.2-[03]--+-00.0 Red Hat, Inc. Virtio filesystem [1af4:1049]
| +-00.1 Red Hat, Inc. Virtio filesystem [1af4:1049]
| +-00.2 Red Hat, Inc. Virtio file system [1af4:105a]
| +-00.3 Red Hat, Inc. Virtio filesystem [1af4:1049]
| +-00.4 Red Hat, Inc. Virtio file system [1af4:105a]
| \-00.5 Red Hat, Inc. Virtio filesystem [1af4:1049]
+-01.3-[04]----00.0 Red Hat, Inc. Virtio GPU [1af4:1050]
+-01.4-[05]----00.0 Red Hat, Inc. Virtio network device [1af4:1041]
+-01.5-[06]--
+-01.6-[07]----00.0 Red Hat, Inc. Virtio file system [1af4:105a]
+-01.7-[08]--
+-02.0-[09]--
+-1f.0 Intel Corporation 82801IB (ICH9) LPC Interface Controller [8086:2918]
+-1f.2 Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] [8086:2922]
\-1f.3 Intel Corporation 82801I (ICH9 Family) SMBus Controller [8086:2930]
Can you show a full incus config show --expanded ostree3?
Oops, sorry, missed this. I assume you meant for the VM, not the (failed) virtiofs disk device:
architecture: x86_64
config:
limits.cpu: "4"
limits.memory: 8GiB
raw.qemu.conf: |
[machine]
kernel = /var/lib/incus/kernels/vchost-vmlinuz
initrd = /var/lib/incus/kernels/vchost-initrd.img
append = "root=LABEL=root rootflags=subvol=/rootA console=ttyS0 SYSTEMD_FSTAB=/config/fstab systemd.log_level=info systemd.hostname=vchost1 ip=192.168.160.161::192.168.128.1:255.255.128.0:vchost1:lan0:off debug=y"
volatile.cloud-init.instance-id: 95ccb25d-8eaf-495e-b32f-80a28f534d06
volatile.eth0.host_name: tapc34f7557
volatile.eth0.hwaddr: 00:16:3e:f4:99:73
volatile.last_state.power: RUNNING
volatile.last_state.ready: "false"
volatile.uuid: b9d00062-f6cb-4779-b7c5-1da216171475
volatile.uuid.generation: b9d00062-f6cb-4779-b7c5-1da216171475
volatile.vsock_id: "2799179731"
devices:
cachepermpool:
source: /dev/inthdd/cachepermpool
type: disk
cpool:
source: /dev/ssd/vchost1-cpool
type: disk
eth0:
nictype: bridged
parent: balbr0
type: nic
home-data-net:
source: /dev/inthdd/home-data-net
type: disk
iso:
source: /dev/inthdd/iso
type: disk
nixstore:
source: /dev/inthdd/nixstore-img
type: disk
ostree:
readonly: "true"
source: /dev/ssd/ostree
type: disk
pip-cache:
source: /dev/inthdd/pip-cache
type: disk
root:
path: /
pool: vchost
type: disk
rootimg:
readonly: "true"
source: /dev/inthdd/vchost-rootfs
type: disk
sd-dropbox:
source: /dev/ssd/sd-dropbox
type: disk
sharedconf:
path: /vol/sharedconf
source: /vol/clusterconf/shared
type: disk
user-cache:
source: /dev/inthdd/user-cache
type: disk
vcconfig:
path: /media/root-ro/config
source: /vol/clusterconf
type: disk
xu-build:
source: /dev/inthdd/xu-build
type: disk
xu-home:
source: /dev/inthdd/xu-home
type: disk
xu-home-ssd:
source: /dev/inthdd/xu-home-ssd
type: disk
xu-spool:
source: /dev/inthdd/xu-spool
type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""
I can't easily upgrade to 6.0.1 - I have some VMs running I really don't want to stop. I will look at spinning up another incus install somewhere though.
We don't restart workloads on upgrade, only the control plane (API) goes down during the upgrade.
Anyway, it's likely caused by the high-ish number of devices. We keep a reserve of, I believe, 8 hotplug slots, but it's supposed to be a sliding thing, basically allowing you to hotplug 8 more devices than whatever you started the VM with. The above suggests that this logic may not be behaving as intended and you're running out of slots somehow.
We don't restart workloads on upgrade, only the control plane (API) goes down during the upgrade.
Oh, worth knowing, thanks! I did wonder if that was the case but couldn't quickly turn up the right docs.
Anyway, it's likely caused by the high-ish number of devices. We keep a reserve of, I believe, 8 hotplug slots, but it's supposed to be a sliding thing, basically allowing you to hotplug 8 more devices than whatever you started the VM with. The above suggests that this logic may not be behaving as intended and you're running out of slots somehow.
My brain isn't working brilliantly at the moment, but poking around in the code, I did wonder if it should be trying to add devices to one of the existing buses rather than create a new bus? I also noticed the block devices don't seem to end up in qemu.conf - I assume they're now set up using the qemu monitor when the VM is started? I wondered if the virtiofs stuff could take the same approach - at the moment I think there are different code paths for hotplug vs pre-configured mounts? But I very much did get lost in the code...
BTW, can confirm this does happen on 6.0.1 too.
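For context, the monitor-driven disk setup speculated about here is, in generic QEMU terms, a two-step QMP exchange: blockdev-add creates the backing block node, then device_add plugs a device using it into a PCIe root port. A rough sketch, not Incus's actual calls (the node names and ids are made up; the bus name is the one from the error above):

```json
{"execute": "blockdev-add", "arguments": {
  "driver": "raw", "node-name": "incus_disk04",
  "file": {"driver": "file", "filename": "/tmp/test.img"}}}
{"execute": "device_add", "arguments": {
  "driver": "nvme", "drive": "incus_disk04", "id": "dev-incus_disk04",
  "serial": "disk04", "bus": "qemu_pcie9"}}
```

The device_add step is where "Bus 'qemu_pcie9' not found" would surface: the named root port was never created at VM startup, so QEMU has nothing to plug into.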
My brain isn't working brilliantly at the moment, but poking around in the code, I did wonder if it should be trying to add devices to one of the existing buses rather than create a new bus? I also noticed the block devices don't seem to end up in qemu.conf - I assume they're now set up using the qemu monitor when the VM is started? I wondered if the virtiofs stuff could take the same approach - at the moment I think there are different code paths for hotplug vs pre-configured mounts? But I very much did get lost in the code...
I'll need to look at the logic again, but I thought we made all the disks be hotplug as we want the ability to add/remove them.
The way things are supposed to work is that at startup we allocate PCIe root addresses for all the stuff that we're going to hotplug through QMP. Then we allocate an additional 8 PCIe root addresses to allow for things to be added later on.
We can't alter the PCIe root complex once the VM is running, so given that limited hotplug/hot-remove works, the core of the logic seems fine. I suspect we just have an issue where we're somehow not properly pre-allocating some slots, basically making your boot-time disks use up the "spare" slots, at which point you run out of slots and get the error.
So basically a few different things:
Looking into this one now
Starting with an empty VM and attempting to add 10 disks which require PCIe address (using io.bus=nvme to force that), I'm getting:
stgraber@castiana:~$ for i in $(seq -w 10); do incus config device add v1 disk$i disk source=/tmp/test.img io.bus=nvme; done
Device disk01 added to v1
Device disk02 added to v1
Device disk03 added to v1
Error: Failed to start device "disk04": Failed to call monitor hook for block device: Failed adding block device for disk device "disk04": Failed adding device: Bus 'qemu_pcie9' not found
Error: Failed to start device "disk05": Failed to call monitor hook for block device: Failed adding block device for disk device "disk05": Failed adding device: Bus 'qemu_pcie9' not found
Error: Failed to start device "disk06": Failed to call monitor hook for block device: Failed adding block device for disk device "disk06": Failed adding device: Bus 'qemu_pcie9' not found
Error: Failed to start device "disk07": Failed to call monitor hook for block device: Failed adding block device for disk device "disk07": Failed adding device: Bus 'qemu_pcie9' not found
Error: Failed to start device "disk08": Failed to call monitor hook for block device: Failed adding block device for disk device "disk08": Failed adding device: Bus 'qemu_pcie9' not found
Error: Failed to start device "disk09": Failed to call monitor hook for block device: Failed adding block device for disk device "disk09": Failed adding device: Bus 'qemu_pcie9' not found
Error: Failed to start device "disk10": Failed to call monitor hook for block device: Failed adding block device for disk device "disk10": Failed adding device: Bus 'qemu_pcie9' not found
stgraber@castiana:~$ incus restart v1
stgraber@castiana:~$ for i in $(seq -w 10); do incus config device add v1 disk$i disk source=/tmp/test.img io.bus=nvme; done
Error: The device already exists
Error: The device already exists
Error: The device already exists
Device disk04 added to v1
Device disk05 added to v1
Device disk06 added to v1
Error: Failed to start device "disk07": Failed to call monitor hook for block device: Failed adding block device for disk device "disk07": Failed adding device: Bus 'qemu_pcie12' not found
Error: Failed to start device "disk08": Failed to call monitor hook for block device: Failed adding block device for disk device "disk08": Failed adding device: Bus 'qemu_pcie12' not found
Error: Failed to start device "disk09": Failed to call monitor hook for block device: Failed adding block device for disk device "disk09": Failed adding device: Bus 'qemu_pcie12' not found
Error: Failed to start device "disk10": Failed to call monitor hook for block device: Failed adding block device for disk device "disk10": Failed adding device: Bus 'qemu_pcie12' not found
stgraber@castiana:~$ incus restart v1
stgraber@castiana:~$ for i in $(seq -w 10); do incus config device add v1 disk$i disk source=/tmp/test.img io.bus=nvme; done
Error: The device already exists
Error: The device already exists
Error: The device already exists
Error: The device already exists
Error: The device already exists
Error: The device already exists
Device disk07 added to v1
Device disk08 added to v1
Device disk09 added to v1
Error: Failed to start device "disk10": Failed to call monitor hook for block device: Failed adding block device for disk device "disk10": Failed adding device: Bus 'qemu_pcie15' not found
stgraber@castiana:~$ incus restart v1
stgraber@castiana:~$ for i in $(seq -w 10); do incus config device add v1 disk$i disk source=/tmp/test.img io.bus=nvme; done
Error: The device already exists
Error: The device already exists
Error: The device already exists
Error: The device already exists
Error: The device already exists
Error: The device already exists
Error: The device already exists
Error: The device already exists
Error: The device already exists
Device disk10 added to v1
stgraber@castiana:~$
So we can see that we can add at most 3 additional devices before running out of slots.
I'll have to tweak things a bit because the number is supposed to be 4, not 3, and we definitely want a much nicer error when hitting the limit of remaining hotplug slots.
Doing a quick test here after doubling the number of hotplug slots to 8, we can see that there's something off in the logic as the first hotplug slot isn't used:
root@v1:~# lspci -tnnnvvv
-[0000:00]-+-00.0 Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller [8086:29c0]
+-01.0-[01]--+-00.0 Red Hat, Inc. Virtio 1.0 memory balloon [1af4:1045]
| +-00.1 Red Hat, Inc. Virtio 1.0 RNG [1af4:1044]
| +-00.2 Red Hat, Inc. Virtio 1.0 input [1af4:1052]
| +-00.3 Red Hat, Inc. Virtio 1.0 input [1af4:1052]
| +-00.4 Red Hat, Inc. Virtio 1.0 socket [1af4:1053]
| +-00.5 Red Hat, Inc. Virtio 1.0 console [1af4:1043]
| \-00.6 Red Hat, Inc. QEMU XHCI Host Controller [1b36:000d]
+-01.1-[02]----00.0 Red Hat, Inc. Virtio 1.0 SCSI [1af4:1048]
+-01.2-[03]--+-00.0 Red Hat, Inc. Virtio 1.0 filesystem [1af4:1049]
| \-00.1 Red Hat, Inc. Virtio 1.0 filesystem [1af4:1049]
+-01.3-[04]----00.0 Red Hat, Inc. Virtio 1.0 GPU [1af4:1050]
+-01.4-[05]----00.0 Red Hat, Inc. Virtio 1.0 network device [1af4:1041]
+-01.5-[06]--
+-01.6-[07]----00.0 Red Hat, Inc. QEMU NVM Express Controller [1b36:0010]
+-01.7-[08]----00.0 Red Hat, Inc. QEMU NVM Express Controller [1b36:0010]
+-02.0-[09]----00.0 Red Hat, Inc. QEMU NVM Express Controller [1b36:0010]
+-02.1-[0a]----00.0 Red Hat, Inc. QEMU NVM Express Controller [1b36:0010]
+-02.2-[0b]----00.0 Red Hat, Inc. QEMU NVM Express Controller [1b36:0010]
+-02.3-[0c]----00.0 Red Hat, Inc. QEMU NVM Express Controller [1b36:0010]
+-02.4-[0d]----00.0 Red Hat, Inc. QEMU NVM Express Controller [1b36:0010]
+-1f.0 Intel Corporation 82801IB (ICH9) LPC Interface Controller [8086:2918]
+-1f.2 Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] [8086:2922]
\-1f.3 Intel Corporation 82801I (ICH9 Family) SMBus Controller [8086:2930]
root@v1:~#
Okay, so that's where we get to what you were pointing out earlier: the logic is limited in that it assumes every device in the devices list uses a PCIe slot.
It doesn't consider the fact that devices that were present at boot time don't count towards the hotplug quota nor that a number of devices simply don't need a PCIe address at all.
The cleanest option would be to fetch a list of addresses from QEMU directly; I'm going to look at what query-pci may be able to get us in that regard.
Interesting: we actually do have a mapping for query-pci already, just not using it anywhere.
Got a reliable way to handle things which is also much simpler than the current logic. Win-win.
Required information
The output of "incus info" - incus-info.txt
Issue description
virtiofs hotplug seems to fail whenever I try to use it:
I'm wondering if this is a logic error in the code here:
https://github.com/lxc/incus/blob/3ce8af031db73cba39b9af465ebadd0dbc3c7cff/internal/server/instance/drivers/driver_qemu.go#L2311-L2332
I don't fully understand it, but if I look in the raw qemu config for this VM, the existing entries for all the virtiofs/9p devices show the bus as "qemu_pcie2", which makes me think this code should be doing... something else!