Open mingww64 opened 1 year ago
Hi @felicia-wen: Could you please help me to see how is this report related to #11? The cited issue talks about 14.0-CURRENT and the use of bhyve+
. Are you using bhyve+
by accident?
Based on the stack trace you included here, this might be a bug in the base system. It might also make sense to take a chance with 13-STABLE. The logs show a system with an AMD CPU which may have not received that much attention as for the Intel CPUs.
The contents of wifibox.log
seems to be scrambled: the timestamps are not linear and there are missing sections.
Hi @felicia-wen: Could you please help me to see how is this report related to #11? The cited issue talks about 14.0-CURRENT and the use of
bhyve+
. Are you usingbhyve+
by accident?
I just upgraded the system to 14.0-CURRENT without bhyve+ installed, according to #11
When I resume my laptop, the system works at first, but hangs after a few seconds. I found that the problem arises when the vmm kernel module is unloaded. I did a ton of experiments and found that vmm will cause a KP when unloaded if there are active ppt devices. The solution is to use devctl detach to detach the ppt0 device before unloading vmm. Please consider adding this to the resume command. This one really got me a lot of headache...
When I run kldunload vmm
after wifibox stop guest
, the system freezes suddenly. It seems to relate to the log, which indicates ppt0
isn't detached properly. I also tried devctl detach pci1:0:0
, but it also returned Device not configured
.
2023-08-11T22:46:35+0800 INFO Destroying guest wifibox
2023-08-11T22:46:39+0800 INFO Destroying bhyve PPT device: pci1:0:0
2023-08-11T22:46:40+0800 DEBUG [devctl] devctl: Failed to clear pci1:0:0 driver: Device not configured
2023-08-11T22:46:40+0800 WARN The PPT device could not be destroyed
The contents of
wifibox.log
seems to be scrambled: the timestamps are not linear and there are missing sections.
Sorry for the inconvenience. This is the new log between a normal boot and a restart after wifibox restart vmm
, on 14.0-CURRENT
2023-08-13T05:47:39+0800 INFO Begin: wifibox start
2023-08-13T05:47:39+0800 DEBUG start=GN
2023-08-13T05:47:39+0800 INFO Creating bridge interface: wifibox0
2023-08-13T05:47:39+0800 DEBUG [ifconfig] wifibox0
2023-08-13T05:47:39+0800 INFO Linking tap interface to wifibox0: tap0
2023-08-13T05:47:39+0800 DEBUG assert loaded: kmod=vmm, kmod_file=/boot/kernel/vmm.ko
2023-08-13T05:47:39+0800 DEBUG Kernel module vmm is not loaded
2023-08-13T05:47:39+0800 DEBUG Kernel module vmm was loaded successfully
2023-08-13T05:47:39+0800 DEBUG assert hardware support present: iommu=1, amdvi=1
2023-08-13T05:47:40+0800 DEBUG Backends reported by bhyve:
2023-08-13T05:47:40+0800 DEBUG [bhyve] ahci
2023-08-13T05:47:40+0800 DEBUG [bhyve] ahci-hd
2023-08-13T05:47:40+0800 DEBUG [bhyve] ahci-cd
2023-08-13T05:47:40+0800 DEBUG [bhyve] e1000
2023-08-13T05:47:40+0800 DEBUG [bhyve] dummy
2023-08-13T05:47:40+0800 DEBUG [bhyve] fbuf
2023-08-13T05:47:40+0800 DEBUG [bhyve] hda
2023-08-13T05:47:40+0800 DEBUG [bhyve] amd_hostbridge
2023-08-13T05:47:40+0800 DEBUG [bhyve] hostbridge
2023-08-13T05:47:40+0800 DEBUG [bhyve] lpc
2023-08-13T05:47:40+0800 DEBUG [bhyve] nvme
2023-08-13T05:47:40+0800 DEBUG [bhyve] passthru
2023-08-13T05:47:40+0800 DEBUG [bhyve] uart
2023-08-13T05:47:40+0800 DEBUG [bhyve] virtio-9p
2023-08-13T05:47:40+0800 DEBUG [bhyve] virtio-blk
2023-08-13T05:47:40+0800 DEBUG [bhyve] virtio-console
2023-08-13T05:47:40+0800 DEBUG [bhyve] virtio-input
2023-08-13T05:47:40+0800 DEBUG [bhyve] virtio-net
2023-08-13T05:47:40+0800 DEBUG [bhyve] virtio-rnd
2023-08-13T05:47:40+0800 DEBUG [bhyve] virtio-scsi
2023-08-13T05:47:40+0800 DEBUG [bhyve] xhci
2023-08-13T05:47:40+0800 INFO Bringing up null-modem devices for console connection
2023-08-13T05:47:40+0800 DEBUG assert loaded: kmod=nmdm, kmod_file=nmdm
2023-08-13T05:47:40+0800 DEBUG Kernel module nmdm is not loaded
2023-08-13T05:47:40+0800 DEBUG Kernel module nmdm was loaded successfully
2023-08-13T05:47:40+0800 INFO Null-modem devices are available
2023-08-13T05:47:40+0800 INFO Waiting for bhyve to start up
2023-08-13T05:47:40+0800 DEBUG Program started as /usr/local/sbin/wifibox, with arguments: _manage_vm
2023-08-13T05:47:40+0800 DEBUG assert daemonized: parent=daemon
2023-08-13T05:47:40+0800 INFO VM manager launched
2023-08-13T05:47:40+0800 INFO Gathering necessary configuration files for launching the guest
2023-08-13T05:47:40+0800 INFO Pulling bhyve options from configuration file
2023-08-13T05:47:40+0800 DEBUG cpus=1, memory=128M, passthru=1/0/0, console=yes
2023-08-13T05:47:40+0800 INFO Guest console is configured to use
2023-08-13T05:47:40+0800 INFO Passthru device is configured: 1/0/0
2023-08-13T05:47:40+0800 INFO PPT driver is configured for pci1:0:0 device
2023-08-13T05:47:40+0800 INFO tap interface is configured: tap0
2023-08-13T05:47:40+0800 INFO Launching guest wifibox from /usr/local/share/wifibox with grub-bhyve
2023-08-13T05:47:40+0800 INFO Application config is found at /usr/local/etc/wifibox/wpa_supplicant
2023-08-13T05:47:40+0800 INFO Application config will be mounted writeable
2023-08-13T05:47:40+0800 DEBUG Devices: virtio-blk,/usr/local/share/wifibox/disk.img virtio-9p,config=/usr/local/etc/wifibox/appliance,ro virtio-9p,var=/var/run/wifibox/appliance virtio-9p,app_config=/usr/local/etc/wifibox/wpa_supplicant
2023-08-13T05:47:40+0800 DEBUG Arguments: -S -M 128M -r host -c /dev/nmdm-wifibox.1A -d /usr/local/share/wifibox wifibox
2023-08-13T05:47:40+0800 INFO Launching guest wifibox from /usr/local/share/wifibox with bhyve
2023-08-13T05:47:40+0800 DEBUG Arguments: -c 1 -m 128M -AHP -u -S -l com1,/dev/nmdm-wifibox.1A -s 0,hostbridge -s 31,lpc -s 4:0,virtio-blk,/usr/local/share/wifibox/disk.img -s 4:1,virtio-9p,config=/usr/local/etc/wifibox/appliance,ro -s 4:2,virtio-9p,var=/var/run/wifibox/appliance -s 4:3,virtio-9p,app_config=/usr/local/etc/wifibox/wpa_supplicant -s 5:0,e1000,tap0 -s 6:0,passthru,1/0/0 wifibox
2023-08-13T05:47:40+0800 DEBUG [bhyve] wrmsr to register 0xc0011029(0x3) on vcpu 0
2023-08-13T05:47:40+0800 DEBUG [bhyve] rdmsr to register 0xc00000e9 on vcpu 0
2023-08-13T05:47:41+0800 INFO Guest wifibox has started up
2023-08-13T05:47:41+0800 INFO Guest is managed by PID 388
2023-08-13T05:47:41+0800 INFO Bringing up Unix Domain Socket pass-through
2023-08-13T05:47:41+0800 INFO Found guest IP address: 10.0.0.1
2023-08-13T05:47:41+0800 INFO Configured sockets: [path=/var/run/wpa_supplicant/wlan0,user=root,group=0,mode=770,port=1200
path=/var/run/wpa_supplicant/p2p-dev-wlan0,user=root,group=0,mode=770,port=1201]
2023-08-13T05:47:41+0800 INFO Hooking up 10.0.0.1:1200 as /var/run/wpa_supplicant/wlan0 (root:0@770)
2023-08-13T05:47:41+0800 INFO Hooking up 10.0.0.1:1201 as /var/run/wpa_supplicant/p2p-dev-wlan0 (root:0@770)
2023-08-13T05:47:41+0800 INFO End: wifibox start
2023-08-12T21:49:24+0800 DEBUG Program started as /usr/local/sbin/wifibox, with arguments: restart vmm
2023-08-12T21:49:24+0800 INFO Begin: wifibox restart
2023-08-12T21:49:24+0800 DEBUG restart=GV
2023-08-12T21:49:24+0800 INFO Tearing down Unix Domain Socket pass-through
2023-08-12T21:49:24+0800 INFO Daemonized socat processes found: [566
575]
2023-08-12T21:49:24+0800 INFO Stopping guest wifibox, managed by PID 388
2023-08-12T21:49:24+0800 INFO Waiting 3 seconds for the guest to stop
2023-08-13T05:50:52+0800 DEBUG Program started as /usr/local/sbin/wifibox, with arguments: start
2023-08-13T05:50:52+0800 INFO Begin: wifibox start
2023-08-13T05:50:52+0800 DEBUG start=GN
2023-08-13T05:50:52+0800 INFO Creating bridge interface: wifibox0
2023-08-13T05:50:52+0800 DEBUG [ifconfig] wifibox0
2023-08-13T05:50:52+0800 INFO Linking tap interface to wifibox0: tap0
2023-08-13T05:50:52+0800 DEBUG assert loaded: kmod=vmm, kmod_file=/boot/kernel/vmm.ko
2023-08-13T05:50:52+0800 DEBUG Kernel module vmm is not loaded
2023-08-13T05:50:53+0800 DEBUG Kernel module vmm was loaded successfully
2023-08-13T05:50:53+0800 DEBUG assert hardware support present: iommu=1, amdvi=1
2023-08-13T05:50:53+0800 DEBUG Backends reported by bhyve:
2023-08-13T05:50:53+0800 DEBUG [bhyve] ahci
2023-08-13T05:50:53+0800 DEBUG [bhyve] ahci-hd
2023-08-13T05:50:53+0800 DEBUG [bhyve] ahci-cd
2023-08-13T05:50:53+0800 DEBUG [bhyve] e1000
2023-08-13T05:50:53+0800 DEBUG [bhyve] dummy
2023-08-13T05:50:53+0800 DEBUG [bhyve] fbuf
2023-08-13T05:50:53+0800 DEBUG [bhyve] hda
2023-08-13T05:50:53+0800 DEBUG [bhyve] amd_hostbridge
2023-08-13T05:50:53+0800 DEBUG [bhyve] hostbridge
2023-08-13T05:50:53+0800 DEBUG [bhyve] lpc
2023-08-13T05:50:53+0800 DEBUG [bhyve] nvme
2023-08-13T05:50:53+0800 DEBUG [bhyve] passthru
2023-08-13T05:50:53+0800 DEBUG [bhyve] uart
2023-08-13T05:50:53+0800 DEBUG [bhyve] virtio-9p
2023-08-13T05:50:53+0800 DEBUG [bhyve] virtio-blk
2023-08-13T05:50:53+0800 DEBUG [bhyve] virtio-console
2023-08-13T05:50:53+0800 DEBUG [bhyve] virtio-input
2023-08-13T05:50:53+0800 DEBUG [bhyve] virtio-net
2023-08-13T05:50:53+0800 DEBUG [bhyve] virtio-rnd
2023-08-13T05:50:53+0800 DEBUG [bhyve] virtio-scsi
2023-08-13T05:50:53+0800 DEBUG [bhyve] xhci
2023-08-13T05:50:53+0800 INFO Bringing up null-modem devices for console connection
2023-08-13T05:50:53+0800 DEBUG assert loaded: kmod=nmdm, kmod_file=nmdm
2023-08-13T05:50:53+0800 DEBUG Kernel module nmdm is not loaded
2023-08-13T05:50:53+0800 DEBUG Kernel module nmdm was loaded successfully
2023-08-13T05:50:53+0800 INFO Null-modem devices are available
2023-08-13T05:50:53+0800 INFO Waiting for bhyve to start up
2023-08-13T05:50:53+0800 DEBUG Program started as /usr/local/sbin/wifibox, with arguments: _manage_vm
2023-08-13T05:50:53+0800 DEBUG assert daemonized: parent=daemon
2023-08-13T05:50:53+0800 INFO VM manager launched
2023-08-13T05:50:53+0800 INFO Gathering necessary configuration files for launching the guest
2023-08-13T05:50:53+0800 INFO Pulling bhyve options from configuration file
2023-08-13T05:50:53+0800 DEBUG cpus=1, memory=128M, passthru=1/0/0, console=yes
2023-08-13T05:50:53+0800 INFO Guest console is configured to use
2023-08-13T05:50:53+0800 INFO Passthru device is configured: 1/0/0
2023-08-13T05:50:53+0800 INFO PPT driver is configured for pci1:0:0 device
2023-08-13T05:50:53+0800 INFO tap interface is configured: tap0
2023-08-13T05:50:53+0800 INFO Launching guest wifibox from /usr/local/share/wifibox with grub-bhyve
2023-08-13T05:50:53+0800 INFO Application config is found at /usr/local/etc/wifibox/wpa_supplicant
2023-08-13T05:50:53+0800 INFO Application config will be mounted writeable
2023-08-13T05:50:53+0800 DEBUG Devices: virtio-blk,/usr/local/share/wifibox/disk.img virtio-9p,config=/usr/local/etc/wifibox/appliance,ro virtio-9p,var=/var/run/wifibox/appliance virtio-9p,app_config=/usr/local/etc/wifibox/wpa_supplicant
2023-08-13T05:50:53+0800 DEBUG Arguments: -S -M 128M -r host -c /dev/nmdm-wifibox.1A -d /usr/local/share/wifibox wifibox
2023-08-13T05:50:53+0800 INFO Launching guest wifibox from /usr/local/share/wifibox with bhyve
2023-08-13T05:50:53+0800 DEBUG Arguments: -c 1 -m 128M -AHP -u -S -l com1,/dev/nmdm-wifibox.1A -s 0,hostbridge -s 31,lpc -s 4:0,virtio-blk,/usr/local/share/wifibox/disk.img -s 4:1,virtio-9p,config=/usr/local/etc/wifibox/appliance,ro -s 4:2,virtio-9p,var=/var/run/wifibox/appliance -s 4:3,virtio-9p,app_config=/usr/local/etc/wifibox/wpa_supplicant -s 5:0,e1000,tap0 -s 6:0,passthru,1/0/0 wifibox
2023-08-13T05:50:53+0800 DEBUG [bhyve] wrmsr to register 0xc0011029(0x3) on vcpu 0
2023-08-13T05:50:53+0800 DEBUG [bhyve] rdmsr to register 0xc00000e9 on vcpu 0
2023-08-13T05:50:54+0800 INFO Guest wifibox has started up
2023-08-13T05:50:54+0800 INFO Guest is managed by PID 386
2023-08-13T05:50:54+0800 INFO Bringing up Unix Domain Socket pass-through
2023-08-13T05:50:54+0800 INFO Found guest IP address: 10.0.0.1
2023-08-13T05:50:54+0800 INFO Configured sockets: [path=/var/run/wpa_supplicant/wlan0,user=root,group=0,mode=770,port=1200
path=/var/run/wpa_supplicant/p2p-dev-wlan0,user=root,group=0,mode=770,port=1201]
2023-08-13T05:50:54+0800 INFO Hooking up 10.0.0.1:1200 as /var/run/wpa_supplicant/wlan0 (root:0@770)
2023-08-13T05:50:54+0800 INFO Hooking up 10.0.0.1:1201 as /var/run/wpa_supplicant/p2p-dev-wlan0 (root:0@770)
2023-08-13T05:50:54+0800 INFO End: wifibox start
I just upgraded the system to 14.0-CURRENT without bhyve+ installed, according to #11
I do not think that #11 is relevant in this case. As you can see in the last comment, that issue was resolved by switching to the bhyve
instance provided by the FreeBSD base system. All the problems that the user reported were originated from the way how to custom-built bhyve
version (i.e. bhyve+
, which included the vmm
kernel module) interacted with the rest of the base system.
When I run
kldunload vmm
afterwifibox stop guest
, the system freezes suddenly. It seems to relate to the log, which indicatesppt0
isn't detached properly. I also trieddevctl detach pci1:0:0
, but it also returnedDevice not configured
.
Wifibox attempts to clear the ppt
driver on stop
but there is no guarantee for the success. I am still inclined to think that it is a problem with the FreeBSD base system.
I'm seeing similar behavior on my system. I would believe that indeed, it is either a problem with the FreeBSD base system or with the wifi device itself (BCM43602 in my case).
I'm also seeing that "The PPT device could not be destroyed" warning. But at least I can explain that. Take for example the attaching,detatching,clearing, of the driver:
user@evilco:~$ sudo devctl set driver -f pci0:3:0:0 ppt
ppt0 mem 0xa0800000-0xa0807fff,0xa0400000-0xa07fffff at device 0.0 on pci3
user@evilco:~$ sudo pciconf -a pci0:3:0:0
pci0:3:0:0: attached
user@evilco:~$ sudo devctl clear driver pci0:3:0:0
devctl: Failed to clear pci0:3:0:0 driver: Device busy
user@evilco:~$ sudo devctl detach -f pci0:3:0:0
ppt0: detached
user@evilco:~$ sudo pciconf -a pci0:3:0:0
pci0:3:0:0: not attached
user@evilco:~$ pciconf -lv | grep ppt
ppt0@pci0:3:0:0: class=0x028000 rev=0x01 hdr=0x00 vendor=0x14e4 device=0x43ba subvendor=0x106b subdevice=0x0152
user@evilco:~$ sudo devctl clear driver pci0:3:0:0
pci3: <network> at device 0.0 (no driver attached)
devctl: Failed to clear pci0:3:0:0 driver: Device not configured
user@evilco:~$ echo $?
1
user@evilco:~$ pciconf -lv | grep ppt
none0@pci0:3:0:0: class=0x028000 rev=0x01 hdr=0x00 vendor=0x14e4 device=0x43ba subvendor=0x106b subdevice=0x0152
user@evilco:~/dev/freebsd-wifibox-port$ sudo devctl clear driver pci0:3:0:0
user@evilco:~/dev/freebsd-wifibox-port$ echo $?
0
The message devctl: Failed to clear pci0:3:0:0 driver: Device not configured
is misleading: the device is cleared. wifibox
could attempt to clear the driver twice here, but I'm not sure if it's really wifibox's fault that some devices are broken.
@mingww64 and @pgj: The issue is either in how bhyve destroys the VM, or in how FreeBSD's pci kernel code handles destruction. Possibly also the firmware itself (highly likely).
Anyways, You can successfully suspend if you connect to the VM work out the in-vm pci number (lspci
)) and run:
echo 1 | tee /sys/bus/pci/devices/0000\:00\:06.0/0000\:03\:00.0/remove
Where 06 is the pci bus number.
Then you can stop the wifibox service and unload vmm successfully.
Three weeks of debugging FreeBSD's kernel and all it took was a one-liner😬
I'll open a proper bug report soon, and I would imagine a proper bug report to FreeBSD-src soon. I would imagine a solution for wifibox would be to include an option to automatically "remove" the device from the VM when wifibox is stopped. That way, the sequence can be suspend->remove from vm->stop wifibox->unload vmm->resume->start wifibox.
I've written-up the issue here: https://joshua.hu/brcmfmac-bcm43602-suspension-shutdown-hanging-freeze-linux-freebsd-wifi-bug-pci-passthru
The tl;dr is that the upon shutdown of the VM, the pcie device is not properly reset, leaving it in an inconsistent state. I will submit a patch to Linux with the very big patch:
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
@@ -2711,6 +2711,7 @@ static struct pci_driver brcmf_pciedrvr = {
.id_table = brcmf_pcie_devid_table,
.probe = brcmf_pcie_probe,
.remove = brcmf_pcie_remove,
+ .shutdown = brcmf_pcie_remove,
#ifdef CONFIG_PM
.driver.pm = &brcmf_pciedrvr_pm,
#endif
Nice work @MegaManSec, thank you for tracking down this issue!
Since we can control all the packages for Wifibox/Alpine, the guest image that is shipped for Wifibox by default, I could include your patch to the Linux kernel until it is merged upstream. Do I understand it right that if this is done, we would not need to do anything else on the Wifibox (script) side?
Yes, the patch should be sufficient in of itself.
The only issue I can see is if the "kill -SIGTERM" to the VM does not complete before the VM is forcefully stopped. If we want extra protection, when the VM is stopped, we could somehow signal to the VM to directly "remove" the pci device from inside the Linux VM (rather than it be part of the shutdown sequence).
Version 20240823
of net/wifibox-alpine
contains both fixes for the problem: Joshua's kernel patch and a remove-wireless
OpenRC script that is run on shutdown and it implements the logic described above. I tried to make that script fit that it is going to be run relatively early on the SIGTERM
-induced ACPI power-off event. At the moment, I do not have a better (cleaner) idea to notify the VM about the shutdown.
You can test this version by checking out the chore/net/wifibox-alpine/20240823
branch:
https://github.com/pgj/freebsd-wifibox-port/tree/chore/net/wifibox-alpine/20240823
It works with some changes. From bhyve manpage: SIGTERM Trigger ACPI poweroff for a VM
However, vm_stop() kills the daemon
PID with sigterm. If we look at pstree, we can see what really happens:
|-+= 77437 root daemon: wifibox-vm-manager[77584] (daemon)
| \-+- 77584 root /bin/sh /usr/local/sbin/wifibox _manage_vm
| |--- 98958 root bhyve: wifibox (bhyve)
| \--- 99018 root /bin/sh /usr/local/sbin/wifibox _manage_vm
vm_stop kills daemon
, which forwards the signal to 77584 root /bin/sh /usr/local/sbin/wifibox _manage_vm
.
But the shell script does not forward the signal to bhyve.
And the only case I can tell where the fix does not work, is if the VM is killed while it is starting up -- it seems that Linux will not perform an ACPI poweroff while it is starting up. I assume there's nothing we can do about that, though.
Yes, that is an acknowledged risk.
Description
As described in https://github.com/pgj/freebsd-wifibox/issues/11#issue-1118701979, the ppt device must be detached before unloading
vmm
. However, more is needed.Steps to reproduce:
Suspend and resume
or
wifibox restart vmm
or
wifibox stop guest
klmunload vmm
Host operating system
Wireless NIC
Wifibox version
Disk image type and version
The kind of VM image in use, e.g. Wifibox/Alpine, and its version.
Changes to the default configuration files
Logs
Additional context
Crash dump
Have you tried to turn it on and off?