I started to experiment with wifibox yesterday and it works pretty well upon a fresh boot. Once I start suspend/resuming (around 2-3 times), the internal "wlan0" interface in the VM no longer comes back up. I've done a few experiments to see what's going on. I don't think this is a bug with wifibox, but rather either bhyve's PCI passthrough code (and FLR). Restarting 'vmm' has no effect once my machine enters this state. Only a full reboot fixes it. I also did make sure that the devd suspend/resume workaround was being triggered by adding a "logger" statement and checking /var/log/messages afterwards. Reporting here for more exposure in this project. We could escalate it up to FreeBSD as needed.
Running On (but probably will also happen on 14-CURRENT):
jon@leslie:~ $ uname -a
FreeBSD leslie 13.1-STABLE FreeBSD 13.1-STABLE #0 stable/13-n251926-488f9d85278: Mon Jul 25 21:01:52 EDT 2022 root@leslie:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
root@leslie:~ # pkg info | grep wifibox
wifibox-1.1.1 Wireless card driver via virtualized Linux
wifibox-alpine-20220712 Wifibox guest based on Alpine Linux
wifibox-core-0.10.0 Wifibox core functionality
This is the card information if I were to use the 'if_iwlwifi' module:
Jul 26 10:28:16 leslie kernel: Intel(R) Wireless WiFi based driver for FreeBSD
Jul 26 10:28:16 leslie kernel: iwlwifi0: <iwlwifi> mem 0xea238000-0xea23bfff at device 20.3 on pci0
Jul 26 10:28:16 leslie kernel: iwlwifi0: could not load firmware image 'iwlwifi-QuZ-a0-jf-b0-72.ucode'
Jul 26 10:28:16 leslie kernel: iwlwifi0: File size way too small!
Jul 26 10:28:16 leslie kernel: iwlwifi0: successfully loaded firmware image 'iwlwifi-QuZ-a0-jf-b0-71.ucode'
Jul 26 10:28:16 leslie kernel: iwlwifi0: api flags index 2 larger than supported by driver
Jul 26 10:28:16 leslie kernel: iwlwifi0: TLV_FW_FSEQ_VERSION: FSEQ Version: 89.3.35.37
Jul 26 10:28:16 leslie kernel: iwlwifi0: loaded firmware version 71.058653f6.0 QuZ-a0-jf-b0-71.ucode op_mode iwlmvm
Jul 26 10:28:16 leslie kernel: iwlwifi0: Detected Intel(R) Wireless-AC 9560 160MHz, REV=0x351
Jul 26 10:28:42 leslie kernel: iwlwifi0: Detected RF JF, rfid=0x105110
Jul 26 10:28:42 leslie kernel: iwlwifi0: base HW address: f8:e4:e3:eb:35:02
I've noticed that there is a higher chance for this situation to happen once I start getting FLR transaction messages:
Jul 26 11:16:08 leslie kernel: bridge0: Ethernet address: 58:9c:fc:10:ff:99
Jul 26 11:16:08 leslie kernel: bridge0: changing name to 'wifibox0'
Jul 26 11:16:08 leslie kernel: tap0: Ethernet address: 58:9c:fc:10:c0:3f
Jul 26 11:16:08 leslie kernel: tap0: promiscuous mode enabled
Jul 26 11:16:08 leslie kernel: wifibox0: link state changed to DOWN
Jul 26 11:16:08 leslie kernel: ppt0 mem 0xea238000-0xea23bfff at device 20.3 on pci0
Jul 26 11:16:08 leslie kernel: tap0: link state changed to UP
Jul 26 11:16:08 leslie kernel: wifibox0: link state changed to UP
Jul 26 11:16:08 leslie kernel: pci0:0:20:3: Resetting with transactions pending after 50 ms
Jul 26 11:16:08 leslie kernel: pci0:0:20:3: Transactions pending after FLR!
Host results (At this point this layer is correct but doesn't matter because the wlan0 inside the VM is gone.)
root@leslie:~ # ifconfig
em0: flags=8862<BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=481049b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LRO,VLAN_HWFILTER,NOMAP>
ether f8:75:a4:ef:9d:81
media: Ethernet autoselect
status: no carrier
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
inet 127.0.0.1 netmask 0xff000000
groups: lo
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
wifibox0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
ether 58:9c:fc:10:ff:99
inet 10.1.0.2 netmask 0xffffff00 broadcast 10.1.0.255
id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
member: tap0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
ifmaxaddr 0 port 4 priority 128 path cost 2000000
groups: bridge
nd6 options=9<PERFORMNUD,IFDISABLED>
tap0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=80000<LINKSTATE>
ether 58:9c:fc:10:c0:3f
groups: tap
media: Ethernet autoselect
status: active
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
Opened by PID 14824
Attempting to bring up the wlan0 interface in the VM.
wifibox:~# service networking restart
* WARNING: you are stopping a boot service
* WARNING: you are stopping a boot service
* Stopping Unix Domain Socket pass-through ...
[ ok ]
* Stopping busybox udhcpd ...
[ ok ]
* Stopping networking ...
* lo ...
[ ok ]
* eth0 ...
[ ok ]
* Starting networking ...
* lo ...
[ ok ]
* eth0 ...
[ ok ]
* wlan0 ...
ip: SIOCGIFFLAGS: No such device
udhcpc: SIOCGIFINDEX: No such device
ifup: failed to change interface wlan0 state to 'up'
[ !! ]
wifibox:~# * Starting Unix Domain Socket pass-through ...
* Starting busybox udhcpd ...
[ ok ]
[ ok ]
Check if ppt0 is attached to the pci device after we started wifibox (As we can see, it is and that's correct).
Shutting down the VM should clear the ppt driver from the wifi card. "The PPT device could not be destroyed" message always pops up even if the resume didn't break the connection. It usually takes 2-3 times before it breaks.
root@leslie:~ # service wifibox stop
Stopping wifibox....WARNING: /var/log/wifibox.log is not writeable, messages could not be logged.
WARNING: The PPT device could not be destroyed.
...OK
root@leslie:~ # pciconf -lv
none3@pci0:0:20:3: class=0x028000 rev=0x00 hdr=0x00 vendor=0x8086 device=0x02f0 subvendor=0x8086 subdevice=0x0030
vendor = 'Intel Corporation'
device = 'Comet Lake PCH-LP CNVi WiFi'
class = network
There was also an instance where I was able to load the 'if_iwlwifi' module after the wlan0 broke in the VM, and was able to use wireless on the host directly. However, this isn't guaranteed and eventually I'll start getting "ifconfig: SIOCIFCREATE2: Device not configured" messages, preventing the wlan0 interface from even starting up.
Attached is part of my /var/log/messages file that may help:
Hello,
I started to experiment with wifibox yesterday and it works pretty well upon a fresh boot. Once I start suspend/resuming (around 2-3 times), the internal "wlan0" interface in the VM no longer comes back up. I've done a few experiments to see what's going on. I don't think this is a bug with wifibox, but rather either bhyve's PCI passthrough code (and FLR). Restarting 'vmm' has no effect once my machine enters this state. Only a full reboot fixes it. I also did make sure that the devd suspend/resume workaround was being triggered by adding a "logger" statement and checking /var/log/messages afterwards. Reporting here for more exposure in this project. We could escalate it up to FreeBSD as needed.
Running On (but probably will also happen on 14-CURRENT):
This is the card information if I were to use the 'if_iwlwifi' module:
I've noticed that there is a higher chance for this situation to happen once I start getting FLR transaction messages:
Host results (At this point this layer is correct but doesn't matter because the wlan0 inside the VM is gone.)
VM networking results upon a bad resume
Attempting to bring up the wlan0 interface in the VM.
Check if ppt0 is attached to the pci device after we started wifibox (As we can see, it is and that's correct).
Shutting down the VM should clear the ppt driver from the wifi card. "The PPT device could not be destroyed" message always pops up even if the resume didn't break the connection. It usually takes 2-3 times before it breaks.
There was also an instance where I was able to load the 'if_iwlwifi' module after the wlan0 broke in the VM, and was able to use wireless on the host directly. However, this isn't guaranteed and eventually I'll start getting "ifconfig: SIOCIFCREATE2: Device not configured" messages, preventing the wlan0 interface from even starting up.
Attached is part of my /var/log/messages file that may help:
report.log