pgj / freebsd-wifibox

wifibox: Use Linux to drive your wireless card on FreeBSD
BSD 2-Clause "Simplified" License
159 stars 12 forks source link

Kernel Panic after resuming from suspension #60

Open mingww64 opened 1 year ago

mingww64 commented 1 year ago

Description

As described in https://github.com/pgj/freebsd-wifibox/issues/11#issue-1118701979, the ppt device must be detached before unloading vmm. However, more is needed.

2023-08-11T22:46:39+0800 INFO  Destroying bhyve PPT device: pci1:0:0
2023-08-11T22:46:40+0800 DEBUG [devctl] devctl: Failed to clear pci1:0:0 driver: Device not configured
2023-08-11T22:46:40+0800 WARN  The PPT device could not be destroyed

Steps to reproduce:

Suspend and resume

or

wifibox restart vmm

or

wifibox stop guest klmunload vmm

Host operating system

FreeBSD laptop 13.2-RELEASE-p2 FreeBSD 13.2-RELEASE-p2 GENERIC amd64
Also reproducible on 13.1

Wireless NIC

ppt0@pci0:1:0:0:        class=0x028000 rev=0x1a hdr=0x00 vendor=0x8086 device=0x2723 subvendor=0x8086 subdevice=0x0084
    vendor     = 'Intel Corporation'
    device     = 'Wi-Fi 6 AX200'
    class      = network

Wifibox version

wifibox version 0.11.0

Disk image type and version

The kind of VM image in use, e.g. Wifibox/Alpine, and its version.

Changes to the default configuration files

Include relevant sections from all the configuration files that contain local changes and may help to reproduce the problem.

Logs

Please copy and paste any relevant log output from the following sources:

- /var/log/wifibox.log (with DEBUG logging verbosity)

2023-08-11T22:49:45+0800 DEBUG Program started as /usr/local/sbin/wifibox, with arguments: start
2023-08-11T22:49:45+0800 INFO  Begin: wifibox start
2023-08-11T22:49:45+0800 DEBUG start=GN
2023-08-11T22:49:45+0800 INFO  Creating bridge interface: wifibox0
2023-08-11T22:49:45+0800 DEBUG [ifconfig] wifibox0
2023-08-11T22:49:45+0800 INFO  Linking tap interface to wifibox0: tap0
2023-08-11T22:49:45+0800 DEBUG assert loaded: kmod=vmm, kmod_file=/boot/kernel/vmm.ko
2023-08-11T22:49:45+0800 DEBUG Kernel module vmm is not loaded
2023-08-11T22:49:45+0800 DEBUG Kernel module vmm was loaded successfully
2023-08-11T22:49:45+0800 DEBUG assert hardware support present: iommu=1, amdvi=1
2023-08-11T22:49:45+0800 DEBUG Backends reported by bhyve:
2023-08-11T22:49:45+0800 DEBUG [bhyve] ahci
2023-08-11T22:49:45+0800 DEBUG [bhyve] ahci-hd
2023-08-11T22:49:45+0800 DEBUG [bhyve] ahci-cd
2023-08-11T22:49:45+0800 DEBUG [bhyve] e1000
2023-08-11T22:49:45+0800 DEBUG [bhyve] dummy
app_config=/usr/local/etc/wifibox/wpa_supplicant
2023-08-11T22:31:12+0800 DEBUG Arguments:  -S -M 128M -r host -c /dev/nmdm-wifibox.1A -d /usr/local/share/wifibox wifibox
2023-08-11T22:31:12+0800 INFO  Launching guest wifibox from /usr/local/share/wifibox with bhyve
2023-08-11T22:31:12+0800 DEBUG Arguments:  -c 1 -m 128M -AHP -u -S -l com1,/dev/nmdm-wifibox.1A -s 0,hostbridge -s 31,lpc -s 4:0,virtio-blk,/usr/local/share/wifibox/disk.img -s 4:1,virtio-9
p,config=/usr/local/etc/wifibox/appliance,ro -s 4:2,virtio-9p,var=/var/run/wifibox/appliance -s 4:3,virtio-9p,app_config=/usr/local/etc/wifibox/wpa_supplicant -s 5:0,e1000,tap0 -s 6:0,passt
hru,1/0/0 wifibox
2023-08-11T22:31:15+0800 DEBUG [bhyve] wrmsr to register 0xc0011029(0x3) on vcpu 0
2023-08-11T22:31:15+0800 DEBUG [bhyve] rdmsr to register 0xc00000e9 on vcpu 0
2023-08-11T22:46:32+0800 DEBUG Program started as /usr/local/sbin/wifibox, with arguments: stop guest
2023-08-11T22:46:32+0800 INFO  Begin: wifibox stop
2023-08-11T22:46:32+0800 DEBUG stop=G
2023-08-11T22:46:32+0800 INFO  Tearing down Unix Domain Socket pass-through
2023-08-11T22:46:32+0800 INFO  Daemonized socat processes found: [10509
10518]
2023-08-11T22:46:32+0800 INFO  Stopping guest wifibox, managed by PID 10416
2023-08-11T22:46:32+0800 INFO  Waiting 3 seconds for the guest to stop
2023-08-11T22:46:35+0800 INFO  Forcing shutdown of guest wifibox
2023-08-11T22:46:35+0800 INFO  Destroying guest wifibox
2023-08-11T22:46:39+0800 INFO  Destroying bhyve PPT device: pci1:0:0
2023-08-11T22:31:12+0800 INFO  Launching guest wifibox from /usr/local/share/wifibox with bhyve
2023-08-11T22:31:12+0800 DEBUG Arguments:  -c 1 -m 128M -AHP -u -S -l com1,/dev/nmdm-wifibox.1A -s 0,hostbridge -s 31,lpc -s 4:0,virtio-blk,/usr/local/share/wifibox/disk.img -s 4:1,virtio-9
p,config=/usr/local/etc/wifibox/appliance,ro -s 4:2,virtio-9p,var=/var/run/wifibox/appliance -s 4:3,virtio-9p,app_config=/usr/local/etc/wifibox/wpa_supplicant -s 5:0,e1000,tap0 -s 6:0,passt
hru,1/0/0 wifibox
2023-08-11T22:31:15+0800 DEBUG [bhyve] wrmsr to register 0xc0011029(0x3) on vcpu 0
2023-08-11T22:31:15+0800 DEBUG [bhyve] rdmsr to register 0xc00000e9 on vcpu 0
2023-08-11T22:46:32+0800 DEBUG Program started as /usr/local/sbin/wifibox, with arguments: stop guest
2023-08-11T22:46:32+0800 INFO  Begin: wifibox stop
2023-08-11T22:46:32+0800 DEBUG stop=G
2023-08-11T22:46:32+0800 INFO  Tearing down Unix Domain Socket pass-through
2023-08-11T22:46:32+0800 INFO  Daemonized socat processes found: [10509
10518]
2023-08-11T22:46:32+0800 INFO  Stopping guest wifibox, managed by PID 10416
2023-08-11T22:46:32+0800 INFO  Waiting 3 seconds for the guest to stop
2023-08-11T22:46:35+0800 INFO  Forcing shutdown of guest wifibox
2023-08-11T22:46:35+0800 INFO  Destroying guest wifibox
2023-08-11T22:46:39+0800 INFO  Destroying bhyve PPT device: pci1:0:0
2023-08-11T22:46:40+0800 DEBUG [devctl] devctl: Failed to clear pci1:0:0 driver: Device not configured
2023-08-11T22:46:40+0800 WARN  The PPT device could not be destroyed
2023-08-11T22:46:40+0800 INFO  Removing null-modem devices
2023-08-11T22:46:40+0800 INFO  Null-modem devices are removed
2023-08-11T22:46:40+0800 INFO  End: wifibox stop

Additional context

Crash dump

Unread portion of the kernel message buffer:
panic: page fault
cpuid = 2
time = 1691765364
KDB: stack backtrace:
#0 0xffffffff80c53d95 at kdb_backtrace+0x65
#1 0xffffffff80c06711 at vpanic+0x151
#2 0xffffffff80c065b3 at panic+0x43
#3 0xffffffff810b1fa7 at trap_fatal+0x387
#4 0xffffffff810b1fff at trap_pfault+0x4f
#5 0xffffffff81088e28 at calltrap+0x8
#6 0xffffffff83a0b1d7 at iommu_cleanup+0x87
#7 0xffffffff83a0462c at vmm_handler+0xac
#8 0xffffffff80be2698 at module_unload+0x38
#9 0xffffffff80bd3bea at linker_file_unload+0x1ea
#10 0xffffffff80bd530d at kern_kldunload+0x16d
#11 0xffffffff810b289c at amd64_syscall+0x10c
#12 0xffffffff8108973b at fast_syscall_common+0xf8
Uptime: 1h3m58s
Dumping 1607 out of 15749 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55      /usr/src/sys/amd64/include/pcpu_aux.h: No such file or directory.

Have you tried to turn it on and off?

pgj commented 1 year ago

Hi @felicia-wen: Could you please help me to see how is this report related to #11? The cited issue talks about 14.0-CURRENT and the use of bhyve+. Are you using bhyve+ by accident?

Based on the stack trace you included here, this might be a bug in the base system. It might also make sense to take a chance with 13-STABLE. The logs show a system with an AMD CPU which may have not received that much attention as for the Intel CPUs.

pgj commented 1 year ago

The contents of wifibox.log seems to be scrambled: the timestamps are not linear and there are missing sections.

mingww64 commented 1 year ago

Hi @felicia-wen: Could you please help me to see how is this report related to #11? The cited issue talks about 14.0-CURRENT and the use of bhyve+. Are you using bhyve+ by accident?

I just upgraded the system to 14.0-CURRENT without bhyve+ installed, according to #11

When I resume my laptop, the system works at first, but hangs after a few seconds. I found that the problem arises when the vmm kernel module is unloaded. I did a ton of experiments and found that vmm will cause a KP when unloaded if there are active ppt devices. The solution is to use devctl detach to detach the ppt0 device before unloading vmm. Please consider adding this to the resume command. This one really got me a lot of headache...

When I run kldunload vmm after wifibox stop guest, the system freezes suddenly. It seems to relate to the log, which indicates ppt0 isn't detached properly. I also tried devctl detach pci1:0:0, but it also returned Device not configured.

2023-08-11T22:46:35+0800 INFO  Destroying guest wifibox
2023-08-11T22:46:39+0800 INFO  Destroying bhyve PPT device: pci1:0:0
2023-08-11T22:46:40+0800 DEBUG [devctl] devctl: Failed to clear pci1:0:0 driver: Device not configured
2023-08-11T22:46:40+0800 WARN  The PPT device could not be destroyed
mingww64 commented 1 year ago

The contents of wifibox.log seems to be scrambled: the timestamps are not linear and there are missing sections.

Sorry for the inconvenience. This is the new log between a normal boot and a restart after wifibox restart vmm, on 14.0-CURRENT

2023-08-13T05:47:39+0800 INFO  Begin: wifibox start
2023-08-13T05:47:39+0800 DEBUG start=GN
2023-08-13T05:47:39+0800 INFO  Creating bridge interface: wifibox0
2023-08-13T05:47:39+0800 DEBUG [ifconfig] wifibox0
2023-08-13T05:47:39+0800 INFO  Linking tap interface to wifibox0: tap0
2023-08-13T05:47:39+0800 DEBUG assert loaded: kmod=vmm, kmod_file=/boot/kernel/vmm.ko
2023-08-13T05:47:39+0800 DEBUG Kernel module vmm is not loaded
2023-08-13T05:47:39+0800 DEBUG Kernel module vmm was loaded successfully
2023-08-13T05:47:39+0800 DEBUG assert hardware support present: iommu=1, amdvi=1
2023-08-13T05:47:40+0800 DEBUG Backends reported by bhyve:
2023-08-13T05:47:40+0800 DEBUG [bhyve] ahci
2023-08-13T05:47:40+0800 DEBUG [bhyve] ahci-hd
2023-08-13T05:47:40+0800 DEBUG [bhyve] ahci-cd
2023-08-13T05:47:40+0800 DEBUG [bhyve] e1000
2023-08-13T05:47:40+0800 DEBUG [bhyve] dummy
2023-08-13T05:47:40+0800 DEBUG [bhyve] fbuf
2023-08-13T05:47:40+0800 DEBUG [bhyve] hda
2023-08-13T05:47:40+0800 DEBUG [bhyve] amd_hostbridge
2023-08-13T05:47:40+0800 DEBUG [bhyve] hostbridge
2023-08-13T05:47:40+0800 DEBUG [bhyve] lpc
2023-08-13T05:47:40+0800 DEBUG [bhyve] nvme
2023-08-13T05:47:40+0800 DEBUG [bhyve] passthru
2023-08-13T05:47:40+0800 DEBUG [bhyve] uart
2023-08-13T05:47:40+0800 DEBUG [bhyve] virtio-9p
2023-08-13T05:47:40+0800 DEBUG [bhyve] virtio-blk
2023-08-13T05:47:40+0800 DEBUG [bhyve] virtio-console
2023-08-13T05:47:40+0800 DEBUG [bhyve] virtio-input
2023-08-13T05:47:40+0800 DEBUG [bhyve] virtio-net
2023-08-13T05:47:40+0800 DEBUG [bhyve] virtio-rnd
2023-08-13T05:47:40+0800 DEBUG [bhyve] virtio-scsi
2023-08-13T05:47:40+0800 DEBUG [bhyve] xhci
2023-08-13T05:47:40+0800 INFO  Bringing up null-modem devices for console connection
2023-08-13T05:47:40+0800 DEBUG assert loaded: kmod=nmdm, kmod_file=nmdm
2023-08-13T05:47:40+0800 DEBUG Kernel module nmdm is not loaded
2023-08-13T05:47:40+0800 DEBUG Kernel module nmdm was loaded successfully
2023-08-13T05:47:40+0800 INFO  Null-modem devices are available
2023-08-13T05:47:40+0800 INFO  Waiting for bhyve to start up
2023-08-13T05:47:40+0800 DEBUG Program started as /usr/local/sbin/wifibox, with arguments: _manage_vm
2023-08-13T05:47:40+0800 DEBUG assert daemonized: parent=daemon
2023-08-13T05:47:40+0800 INFO  VM manager launched
2023-08-13T05:47:40+0800 INFO  Gathering necessary configuration files for launching the guest
2023-08-13T05:47:40+0800 INFO  Pulling bhyve options from configuration file
2023-08-13T05:47:40+0800 DEBUG cpus=1, memory=128M, passthru=1/0/0, console=yes
2023-08-13T05:47:40+0800 INFO  Guest console is configured to use
2023-08-13T05:47:40+0800 INFO  Passthru device is configured: 1/0/0
2023-08-13T05:47:40+0800 INFO  PPT driver is configured for pci1:0:0 device
2023-08-13T05:47:40+0800 INFO  tap interface is configured: tap0
2023-08-13T05:47:40+0800 INFO  Launching guest wifibox from /usr/local/share/wifibox with grub-bhyve
2023-08-13T05:47:40+0800 INFO  Application config is found at /usr/local/etc/wifibox/wpa_supplicant
2023-08-13T05:47:40+0800 INFO  Application config will be mounted writeable
2023-08-13T05:47:40+0800 DEBUG Devices: virtio-blk,/usr/local/share/wifibox/disk.img virtio-9p,config=/usr/local/etc/wifibox/appliance,ro virtio-9p,var=/var/run/wifibox/appliance virtio-9p,app_config=/usr/local/etc/wifibox/wpa_supplicant
2023-08-13T05:47:40+0800 DEBUG Arguments:  -S -M 128M -r host -c /dev/nmdm-wifibox.1A -d /usr/local/share/wifibox wifibox
2023-08-13T05:47:40+0800 INFO  Launching guest wifibox from /usr/local/share/wifibox with bhyve
2023-08-13T05:47:40+0800 DEBUG Arguments:  -c 1 -m 128M -AHP -u -S -l com1,/dev/nmdm-wifibox.1A -s 0,hostbridge -s 31,lpc -s 4:0,virtio-blk,/usr/local/share/wifibox/disk.img -s 4:1,virtio-9p,config=/usr/local/etc/wifibox/appliance,ro -s 4:2,virtio-9p,var=/var/run/wifibox/appliance -s 4:3,virtio-9p,app_config=/usr/local/etc/wifibox/wpa_supplicant -s 5:0,e1000,tap0 -s 6:0,passthru,1/0/0 wifibox
2023-08-13T05:47:40+0800 DEBUG [bhyve] wrmsr to register 0xc0011029(0x3) on vcpu 0
2023-08-13T05:47:40+0800 DEBUG [bhyve] rdmsr to register 0xc00000e9 on vcpu 0
2023-08-13T05:47:41+0800 INFO  Guest wifibox has started up
2023-08-13T05:47:41+0800 INFO  Guest is managed by PID 388
2023-08-13T05:47:41+0800 INFO  Bringing up Unix Domain Socket pass-through
2023-08-13T05:47:41+0800 INFO  Found guest IP address: 10.0.0.1
2023-08-13T05:47:41+0800 INFO  Configured sockets: [path=/var/run/wpa_supplicant/wlan0,user=root,group=0,mode=770,port=1200
path=/var/run/wpa_supplicant/p2p-dev-wlan0,user=root,group=0,mode=770,port=1201]
2023-08-13T05:47:41+0800 INFO  Hooking up 10.0.0.1:1200 as /var/run/wpa_supplicant/wlan0 (root:0@770)
2023-08-13T05:47:41+0800 INFO  Hooking up 10.0.0.1:1201 as /var/run/wpa_supplicant/p2p-dev-wlan0 (root:0@770)
2023-08-13T05:47:41+0800 INFO  End: wifibox start
2023-08-12T21:49:24+0800 DEBUG Program started as /usr/local/sbin/wifibox, with arguments: restart vmm
2023-08-12T21:49:24+0800 INFO  Begin: wifibox restart
2023-08-12T21:49:24+0800 DEBUG restart=GV
2023-08-12T21:49:24+0800 INFO  Tearing down Unix Domain Socket pass-through
2023-08-12T21:49:24+0800 INFO  Daemonized socat processes found: [566
575]
2023-08-12T21:49:24+0800 INFO  Stopping guest wifibox, managed by PID 388
2023-08-12T21:49:24+0800 INFO  Waiting 3 seconds for the guest to stop
2023-08-13T05:50:52+0800 DEBUG Program started as /usr/local/sbin/wifibox, with arguments: start
2023-08-13T05:50:52+0800 INFO  Begin: wifibox start
2023-08-13T05:50:52+0800 DEBUG start=GN
2023-08-13T05:50:52+0800 INFO  Creating bridge interface: wifibox0
2023-08-13T05:50:52+0800 DEBUG [ifconfig] wifibox0
2023-08-13T05:50:52+0800 INFO  Linking tap interface to wifibox0: tap0
2023-08-13T05:50:52+0800 DEBUG assert loaded: kmod=vmm, kmod_file=/boot/kernel/vmm.ko
2023-08-13T05:50:52+0800 DEBUG Kernel module vmm is not loaded
2023-08-13T05:50:53+0800 DEBUG Kernel module vmm was loaded successfully
2023-08-13T05:50:53+0800 DEBUG assert hardware support present: iommu=1, amdvi=1
2023-08-13T05:50:53+0800 DEBUG Backends reported by bhyve:
2023-08-13T05:50:53+0800 DEBUG [bhyve] ahci
2023-08-13T05:50:53+0800 DEBUG [bhyve] ahci-hd
2023-08-13T05:50:53+0800 DEBUG [bhyve] ahci-cd
2023-08-13T05:50:53+0800 DEBUG [bhyve] e1000
2023-08-13T05:50:53+0800 DEBUG [bhyve] dummy
2023-08-13T05:50:53+0800 DEBUG [bhyve] fbuf
2023-08-13T05:50:53+0800 DEBUG [bhyve] hda
2023-08-13T05:50:53+0800 DEBUG [bhyve] amd_hostbridge
2023-08-13T05:50:53+0800 DEBUG [bhyve] hostbridge
2023-08-13T05:50:53+0800 DEBUG [bhyve] lpc
2023-08-13T05:50:53+0800 DEBUG [bhyve] nvme
2023-08-13T05:50:53+0800 DEBUG [bhyve] passthru
2023-08-13T05:50:53+0800 DEBUG [bhyve] uart
2023-08-13T05:50:53+0800 DEBUG [bhyve] virtio-9p
2023-08-13T05:50:53+0800 DEBUG [bhyve] virtio-blk
2023-08-13T05:50:53+0800 DEBUG [bhyve] virtio-console
2023-08-13T05:50:53+0800 DEBUG [bhyve] virtio-input
2023-08-13T05:50:53+0800 DEBUG [bhyve] virtio-net
2023-08-13T05:50:53+0800 DEBUG [bhyve] virtio-rnd
2023-08-13T05:50:53+0800 DEBUG [bhyve] virtio-scsi
2023-08-13T05:50:53+0800 DEBUG [bhyve] xhci
2023-08-13T05:50:53+0800 INFO  Bringing up null-modem devices for console connection
2023-08-13T05:50:53+0800 DEBUG assert loaded: kmod=nmdm, kmod_file=nmdm
2023-08-13T05:50:53+0800 DEBUG Kernel module nmdm is not loaded
2023-08-13T05:50:53+0800 DEBUG Kernel module nmdm was loaded successfully
2023-08-13T05:50:53+0800 INFO  Null-modem devices are available
2023-08-13T05:50:53+0800 INFO  Waiting for bhyve to start up
2023-08-13T05:50:53+0800 DEBUG Program started as /usr/local/sbin/wifibox, with arguments: _manage_vm
2023-08-13T05:50:53+0800 DEBUG assert daemonized: parent=daemon
2023-08-13T05:50:53+0800 INFO  VM manager launched
2023-08-13T05:50:53+0800 INFO  Gathering necessary configuration files for launching the guest
2023-08-13T05:50:53+0800 INFO  Pulling bhyve options from configuration file
2023-08-13T05:50:53+0800 DEBUG cpus=1, memory=128M, passthru=1/0/0, console=yes
2023-08-13T05:50:53+0800 INFO  Guest console is configured to use
2023-08-13T05:50:53+0800 INFO  Passthru device is configured: 1/0/0
2023-08-13T05:50:53+0800 INFO  PPT driver is configured for pci1:0:0 device
2023-08-13T05:50:53+0800 INFO  tap interface is configured: tap0
2023-08-13T05:50:53+0800 INFO  Launching guest wifibox from /usr/local/share/wifibox with grub-bhyve
2023-08-13T05:50:53+0800 INFO  Application config is found at /usr/local/etc/wifibox/wpa_supplicant
2023-08-13T05:50:53+0800 INFO  Application config will be mounted writeable
2023-08-13T05:50:53+0800 DEBUG Devices: virtio-blk,/usr/local/share/wifibox/disk.img virtio-9p,config=/usr/local/etc/wifibox/appliance,ro virtio-9p,var=/var/run/wifibox/appliance virtio-9p,app_config=/usr/local/etc/wifibox/wpa_supplicant
2023-08-13T05:50:53+0800 DEBUG Arguments:  -S -M 128M -r host -c /dev/nmdm-wifibox.1A -d /usr/local/share/wifibox wifibox
2023-08-13T05:50:53+0800 INFO  Launching guest wifibox from /usr/local/share/wifibox with bhyve
2023-08-13T05:50:53+0800 DEBUG Arguments:  -c 1 -m 128M -AHP -u -S -l com1,/dev/nmdm-wifibox.1A -s 0,hostbridge -s 31,lpc -s 4:0,virtio-blk,/usr/local/share/wifibox/disk.img -s 4:1,virtio-9p,config=/usr/local/etc/wifibox/appliance,ro -s 4:2,virtio-9p,var=/var/run/wifibox/appliance -s 4:3,virtio-9p,app_config=/usr/local/etc/wifibox/wpa_supplicant -s 5:0,e1000,tap0 -s 6:0,passthru,1/0/0 wifibox
2023-08-13T05:50:53+0800 DEBUG [bhyve] wrmsr to register 0xc0011029(0x3) on vcpu 0
2023-08-13T05:50:53+0800 DEBUG [bhyve] rdmsr to register 0xc00000e9 on vcpu 0
2023-08-13T05:50:54+0800 INFO  Guest wifibox has started up
2023-08-13T05:50:54+0800 INFO  Guest is managed by PID 386
2023-08-13T05:50:54+0800 INFO  Bringing up Unix Domain Socket pass-through
2023-08-13T05:50:54+0800 INFO  Found guest IP address: 10.0.0.1
2023-08-13T05:50:54+0800 INFO  Configured sockets: [path=/var/run/wpa_supplicant/wlan0,user=root,group=0,mode=770,port=1200
path=/var/run/wpa_supplicant/p2p-dev-wlan0,user=root,group=0,mode=770,port=1201]
2023-08-13T05:50:54+0800 INFO  Hooking up 10.0.0.1:1200 as /var/run/wpa_supplicant/wlan0 (root:0@770)
2023-08-13T05:50:54+0800 INFO  Hooking up 10.0.0.1:1201 as /var/run/wpa_supplicant/p2p-dev-wlan0 (root:0@770)
2023-08-13T05:50:54+0800 INFO  End: wifibox start
pgj commented 1 year ago

I just upgraded the system to 14.0-CURRENT without bhyve+ installed, according to #11

I do not think that #11 is relevant in this case. As you can see in the last comment, that issue was resolved by switching to the bhyve instance provided by the FreeBSD base system. All the problems that the user reported were originated from the way how to custom-built bhyve version (i.e. bhyve+, which included the vmm kernel module) interacted with the rest of the base system.

When I run kldunload vmm after wifibox stop guest, the system freezes suddenly. It seems to relate to the log, which indicates ppt0 isn't detached properly. I also tried devctl detach pci1:0:0, but it also returned Device not configured.

Wifibox attempts to clear the ppt driver on stop but there is no guarantee for the success. I am still inclined to think that it is a problem with the FreeBSD base system.

MegaManSec commented 2 months ago

I'm seeing similar behavior on my system. I would believe that indeed, it is either a problem with the FreeBSD base system or with the wifi device itself (BCM43602 in my case).

I'm also seeing that "The PPT device could not be destroyed" warning. But at least I can explain that. Take for example the attaching,detatching,clearing, of the driver:

user@evilco:~$ sudo devctl set driver -f pci0:3:0:0 ppt
ppt0 mem 0xa0800000-0xa0807fff,0xa0400000-0xa07fffff at device 0.0 on pci3
user@evilco:~$ sudo pciconf -a pci0:3:0:0
pci0:3:0:0: attached
user@evilco:~$ sudo devctl clear driver pci0:3:0:0
devctl: Failed to clear pci0:3:0:0 driver: Device busy
user@evilco:~$ sudo devctl detach -f pci0:3:0:0
ppt0: detached
user@evilco:~$ sudo pciconf -a pci0:3:0:0
pci0:3:0:0: not attached
user@evilco:~$ pciconf -lv | grep ppt
ppt0@pci0:3:0:0:        class=0x028000 rev=0x01 hdr=0x00 vendor=0x14e4 device=0x43ba subvendor=0x106b subdevice=0x0152
user@evilco:~$ sudo devctl clear driver pci0:3:0:0
pci3: <network> at device 0.0 (no driver attached)
devctl: Failed to clear pci0:3:0:0 driver: Device not configured
user@evilco:~$ echo $?
1                  
user@evilco:~$ pciconf -lv | grep ppt
none0@pci0:3:0:0:        class=0x028000 rev=0x01 hdr=0x00 vendor=0x14e4 device=0x43ba subvendor=0x106b subdevice=0x0152
user@evilco:~/dev/freebsd-wifibox-port$ sudo devctl clear driver pci0:3:0:0
user@evilco:~/dev/freebsd-wifibox-port$ echo $?
0

The message devctl: Failed to clear pci0:3:0:0 driver: Device not configured is misleading: the device is cleared. wifibox could attempt to clear the driver twice here, but I'm not sure if it's really wifibox's fault that some devices are broken.

MegaManSec commented 1 month ago

@mingww64 and @pgj: The issue is either in how bhyve destroys the VM, or in how FreeBSD's pci kernel code handles destruction. Possibly also the firmware itself (highly likely).

Anyways, You can successfully suspend if you connect to the VM work out the in-vm pci number (lspci)) and run: echo 1 | tee /sys/bus/pci/devices/0000\:00\:06.0/0000\:03\:00.0/remove Where 06 is the pci bus number. Then you can stop the wifibox service and unload vmm successfully.

Three weeks of debugging FreeBSD's kernel and all it took was a one-liner😬

I'll open a proper bug report soon, and I would imagine a proper bug report to FreeBSD-src soon. I would imagine a solution for wifibox would be to include an option to automatically "remove" the device from the VM when wifibox is stopped. That way, the sequence can be suspend->remove from vm->stop wifibox->unload vmm->resume->start wifibox.

MegaManSec commented 1 month ago

I've written-up the issue here: https://joshua.hu/brcmfmac-bcm43602-suspension-shutdown-hanging-freeze-linux-freebsd-wifi-bug-pci-passthru

The tl;dr is that the upon shutdown of the VM, the pcie device is not properly reset, leaving it in an inconsistent state. I will submit a patch to Linux with the very big patch:

--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
@@ -2711,6 +2711,7 @@ static struct pci_driver brcmf_pciedrvr = {
        .id_table = brcmf_pcie_devid_table,
        .probe = brcmf_pcie_probe,
        .remove = brcmf_pcie_remove,
+       .shutdown = brcmf_pcie_remove,
 #ifdef CONFIG_PM
        .driver.pm = &brcmf_pciedrvr_pm,
 #endif
pgj commented 1 month ago

Nice work @MegaManSec, thank you for tracking down this issue!

Since we can control all the packages for Wifibox/Alpine, the guest image that is shipped for Wifibox by default, I could include your patch to the Linux kernel until it is merged upstream. Do I understand it right that if this is done, we would not need to do anything else on the Wifibox (script) side?

MegaManSec commented 1 month ago

Yes, the patch should be sufficient in of itself.

The only issue I can see is if the "kill -SIGTERM" to the VM does not complete before the VM is forcefully stopped. If we want extra protection, when the VM is stopped, we could somehow signal to the VM to directly "remove" the pci device from inside the Linux VM (rather than it be part of the shutdown sequence).

pgj commented 1 month ago

Version 20240823 of net/wifibox-alpine contains both fixes for the problem: Joshua's kernel patch and a remove-wireless OpenRC script that is run on shutdown and it implements the logic described above. I tried to make that script fit that it is going to be run relatively early on the SIGTERM-induced ACPI power-off event. At the moment, I do not have a better (cleaner) idea to notify the VM about the shutdown.

You can test this version by checking out the chore/net/wifibox-alpine/20240823 branch:

https://github.com/pgj/freebsd-wifibox-port/tree/chore/net/wifibox-alpine/20240823

MegaManSec commented 1 month ago

It works with some changes. From bhyve manpage: SIGTERM Trigger ACPI poweroff for a VM

However, vm_stop() kills the daemon PID with sigterm. If we look at pstree, we can see what really happens:

 |-+= 77437 root daemon: wifibox-vm-manager[77584] (daemon)
 | \-+- 77584 root /bin/sh /usr/local/sbin/wifibox _manage_vm
 |   |--- 98958 root bhyve: wifibox (bhyve)
 |   \--- 99018 root /bin/sh /usr/local/sbin/wifibox _manage_vm

vm_stop kills daemon, which forwards the signal to 77584 root /bin/sh /usr/local/sbin/wifibox _manage_vm.

But the shell script does not forward the signal to bhyve.

Fixed in https://github.com/pgj/freebsd-wifibox/pull/110

MegaManSec commented 1 month ago

And the only case I can tell where the fix does not work, is if the VM is killed while it is starting up -- it seems that Linux will not perform an ACPI poweroff while it is starting up. I assume there's nothing we can do about that, though.

pgj commented 1 month ago

Yes, that is an acknowledged risk.