pgj / freebsd-wifibox

wifibox: Use Linux to drive your wireless card on FreeBSD
BSD 2-Clause "Simplified" License

wlan0 interface inside VM no longer comes up after a few times doing suspend/resume #31

Closed (fearedbliss closed this 1 year ago)

fearedbliss commented 2 years ago

Hello,

I started to experiment with wifibox yesterday and it works pretty well upon a fresh boot. Once I start suspending/resuming (around 2-3 times), the internal "wlan0" interface in the VM no longer comes back up. I've done a few experiments to see what's going on. I don't think this is a bug in wifibox, but rather in bhyve's PCI passthrough code (and FLR). Restarting 'vmm' has no effect once my machine enters this state. Only a full reboot fixes it. I also made sure that the devd suspend/resume workaround was being triggered by adding a "logger" statement and checking /var/log/messages afterwards. I'm reporting it here for more exposure in this project. We could escalate it to FreeBSD as needed.

Running On (but probably will also happen on 14-CURRENT):

jon@leslie:~ $ uname -a
FreeBSD leslie 13.1-STABLE FreeBSD 13.1-STABLE #0 stable/13-n251926-488f9d85278: Mon Jul 25 21:01:52 EDT 2022     root@leslie:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

root@leslie:~ # pkg info | grep wifibox
wifibox-1.1.1                  Wireless card driver via virtualized Linux
wifibox-alpine-20220712        Wifibox guest based on Alpine Linux
wifibox-core-0.10.0            Wifibox core functionality

This is the card information if I were to use the 'if_iwlwifi' module:

Jul 26 10:28:16 leslie kernel: Intel(R) Wireless WiFi based driver for FreeBSD
Jul 26 10:28:16 leslie kernel: iwlwifi0: <iwlwifi> mem 0xea238000-0xea23bfff at device 20.3 on pci0
Jul 26 10:28:16 leslie kernel: iwlwifi0: could not load firmware image 'iwlwifi-QuZ-a0-jf-b0-72.ucode'
Jul 26 10:28:16 leslie kernel: iwlwifi0: File size way too small!
Jul 26 10:28:16 leslie kernel: iwlwifi0: successfully loaded firmware image 'iwlwifi-QuZ-a0-jf-b0-71.ucode'
Jul 26 10:28:16 leslie kernel: iwlwifi0: api flags index 2 larger than supported by driver
Jul 26 10:28:16 leslie kernel: iwlwifi0: TLV_FW_FSEQ_VERSION: FSEQ Version: 89.3.35.37
Jul 26 10:28:16 leslie kernel: iwlwifi0: loaded firmware version 71.058653f6.0 QuZ-a0-jf-b0-71.ucode op_mode iwlmvm
Jul 26 10:28:16 leslie kernel: iwlwifi0: Detected Intel(R) Wireless-AC 9560 160MHz, REV=0x351
Jul 26 10:28:42 leslie kernel: iwlwifi0: Detected RF JF, rfid=0x105110
Jul 26 10:28:42 leslie kernel: iwlwifi0: base HW address: f8:e4:e3:eb:35:02

I've noticed that there is a higher chance for this situation to happen once I start getting FLR transaction messages:

Jul 26 11:16:08 leslie kernel: bridge0: Ethernet address: 58:9c:fc:10:ff:99
Jul 26 11:16:08 leslie kernel: bridge0: changing name to 'wifibox0'
Jul 26 11:16:08 leslie kernel: tap0: Ethernet address: 58:9c:fc:10:c0:3f
Jul 26 11:16:08 leslie kernel: tap0: promiscuous mode enabled
Jul 26 11:16:08 leslie kernel: wifibox0: link state changed to DOWN
Jul 26 11:16:08 leslie kernel: ppt0 mem 0xea238000-0xea23bfff at device 20.3 on pci0
Jul 26 11:16:08 leslie kernel: tap0: link state changed to UP
Jul 26 11:16:08 leslie kernel: wifibox0: link state changed to UP
Jul 26 11:16:08 leslie kernel: pci0:0:20:3: Resetting with transactions pending after 50 ms
Jul 26 11:16:08 leslie kernel: pci0:0:20:3: Transactions pending after FLR!
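
To see how often the FLR warning shows up across suspend/resume cycles, the log can be filtered for it. This is only a minimal sketch: the function name count_flr_warnings is made up for illustration, and it counts nothing but lines containing the exact message text shown above.

```shell
#!/bin/sh
# count_flr_warnings (hypothetical helper): read kernel log lines on stdin
# and print how many of them report transactions still pending after a
# Function Level Reset (FLR).
count_flr_warnings() {
    # grep -c prints the count; "|| true" keeps the exit status at 0
    # even when the count is zero.
    grep -c 'Transactions pending after FLR' || true
}

# Example use on the host (not executed here):
#   count_flr_warnings < /var/log/messages
```

A count that grows with each suspend/resume cycle would fit the observation above, though it would not by itself show that the FLR failure causes the missing device.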

Host results (At this point this layer is correct but doesn't matter because the wlan0 inside the VM is gone.)

root@leslie:~ # ifconfig
em0: flags=8862<BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=481049b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LRO,VLAN_HWFILTER,NOMAP>
    ether f8:75:a4:ef:9d:81
    media: Ethernet autoselect
    status: no carrier
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
    options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
    inet6 ::1 prefixlen 128
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
    inet 127.0.0.1 netmask 0xff000000
    groups: lo
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
wifibox0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    ether 58:9c:fc:10:ff:99
    inet 10.1.0.2 netmask 0xffffff00 broadcast 10.1.0.255
    id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
    maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
    root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
    member: tap0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
            ifmaxaddr 0 port 4 priority 128 path cost 2000000
    groups: bridge
    nd6 options=9<PERFORMNUD,IFDISABLED>
tap0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=80000<LINKSTATE>
    ether 58:9c:fc:10:c0:3f
    groups: tap
    media: Ethernet autoselect
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
    Opened by PID 14824

VM networking results upon a bad resume

wifibox:~# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:A0:98:8A:05:71  
          inet addr:10.1.0.1  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::2a0:98ff:fe8a:571/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:30 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:2428 (2.3 KiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

Attempting to bring up the wlan0 interface in the VM.

wifibox:~# service networking restart
 * WARNING: you are stopping a boot service
 * WARNING: you are stopping a boot service
 * Stopping Unix Domain Socket pass-through ...
 [ ok ]
 * Stopping busybox udhcpd ...
 [ ok ]
 * Stopping networking ...
 *   lo ...
 [ ok ]
 *   eth0 ...
 [ ok ]
 * Starting networking ...
 *   lo ...
 [ ok ]
 *   eth0 ...
 [ ok ]
 *   wlan0 ...
ip: SIOCGIFFLAGS: No such device
udhcpc: SIOCGIFINDEX: No such device
ifup: failed to change interface wlan0 state to 'up'
 [ !! ]
wifibox:~#  * Starting Unix Domain Socket pass-through ...
 * Starting busybox udhcpd ...
 [ ok ]
 [ ok ]

Check if ppt0 is attached to the pci device after we started wifibox (As we can see, it is and that's correct).

root@leslie:~ # pciconf -lv
ppt0@pci0:0:20:3:       class=0x028000 rev=0x00 hdr=0x00 vendor=0x8086 device=0x02f0 subvendor=0x8086 subdevice=0x0030
    vendor     = 'Intel Corporation'
    device     = 'Comet Lake PCH-LP CNVi WiFi'
    class      = network

Shutting down the VM should clear the ppt driver from the wifi card. "The PPT device could not be destroyed" message always pops up even if the resume didn't break the connection. It usually takes 2-3 times before it breaks.

root@leslie:~ # service wifibox stop
Stopping wifibox....WARNING: /var/log/wifibox.log is not writeable, messages could not be logged.
WARNING: The PPT device could not be destroyed.
...OK

root@leslie:~ # pciconf -lv
none3@pci0:0:20:3:      class=0x028000 rev=0x00 hdr=0x00 vendor=0x8086 device=0x02f0 subvendor=0x8086 subdevice=0x0030
    vendor     = 'Intel Corporation'
    device     = 'Comet Lake PCH-LP CNVi WiFi'
    class      = network
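
The two pciconf listings differ only in the name before the "@": ppt0 while the passthrough driver holds the card, none3 once nothing is attached. A small sketch for pulling that name out of `pciconf -l` output follows. The function name attached_driver is an assumption made for illustration, and the selector is the one from this report.

```shell
#!/bin/sh
# attached_driver SELECTOR (hypothetical helper): read `pciconf -l` output
# on stdin and print the driver instance attached at SELECTOR, for example
# "ppt0" or "none3" for selector pci0:0:20:3.
attached_driver() {
    awk -v sel="$1" '
        {
            at = index($1, "@")
            if (at == 0) next            # skip the indented detail lines
            s = substr($1, at + 1)       # e.g. "pci0:0:20:3:"
            sub(/:$/, "", s)             # drop the trailing colon
            if (s == sel) { print substr($1, 1, at - 1); exit }
        }'
}

# Example use on the host (not executed here):
#   pciconf -l | attached_driver pci0:0:20:3
```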

There was also an instance where I was able to load the 'if_iwlwifi' module after the wlan0 broke in the VM, and was able to use wireless on the host directly. However, this isn't guaranteed and eventually I'll start getting "ifconfig: SIOCIFCREATE2: Device not configured" messages, preventing the wlan0 interface from even starting up.

Attached is part of my /var/log/messages file that may help:

report.log

pgj commented 2 years ago

Hi @fearedbliss thanks for the detailed report. When running Wifibox, is the if_iwlwifi module loaded in parallel? If yes that might cause some interference between the PCI pass-through and the driver's code.

I would be curious to see the dmesg from the guest on a bad resume. I would like to see the error coming from the iwlwifi kernel module; hopefully there is at least something like that. When the wlan0 interface is missing, that means the iwlwifi module could not be loaded for some reason.

Also, it would be useful to learn if a vanilla 13.1-RELEASE does the same. The 13-STABLE branch may contain experimental changes that may destabilize bhyve and we might be observing a regression like that. Unfortunately, I have not seen such problems before and I have a different type of card, so I find it hard to reproduce this on my side. In my experience, it is possible to boot a system from a FreeBSD memory stick image, set up a minimal system on a tmpfs and move there with a chroot, configure the suspend/resume and then verify if the problem comes up. I could also take a chance with a 13-STABLE image on my machine.

fearedbliss commented 2 years ago

Hey @pgj,

Thanks for the reply :). I'm not home at the moment so I can't provide the dmesg output (but will when available). However, I wanted to say that if_iwlwifi and if_iwm are both on my devmatch_blacklist in /etc/rc.conf, so they are blocked in order to eliminate any conflicts or attachments of those drivers to the wifi PCI slot. Only wifibox can use it ;).
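
For reference, an /etc/rc.conf entry of that kind might look like the sketch below. The exact line on the reporter's machine is not shown in this thread, so the value is an assumption built from the two module names mentioned.

```
# /etc/rc.conf (sketch): keep devmatch(8) from auto-loading the native
# Intel wireless drivers so that only the passthrough driver claims the card.
devmatch_blacklist="if_iwlwifi if_iwm"
```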

It's possible that there is a regression in STABLE for sure. This is on my Thinkpad X1 Carbon Gen 7 if that makes any difference.

Also to clarify the above, they are normally in my Blocklist but for the purposes of my experiments I manually loaded the driver during one of my trials 🗡️.

fearedbliss commented 2 years ago

Attached is the dmesg output from the Alpine VM.

dmesg-alpine.txt

pgj commented 2 years ago

Thanks for the VM's dmesg output. Looks like the device is completely missing so the VM does not even know about it, that is, the PCI pass-through is broken. Could you please help me to verify this by sharing the output of the lspci command too?

fearedbliss commented 2 years ago

@pgj Yup that seems correct, I see that the 00:06.0 Class 0280: 8086:02f0 is completely missing after I did a zzz (just one time and then I waited like a minute. I've noticed if I resume too quickly it doesn't break immediately, but this could have just been a non-deterministic scenario):

Before

wifibox:~# lspci
00:1f.0 Class 0601: 8086:7000
00:04.2 Class 0100: 1af4:1009
00:04.0 Class 0100: 1af4:1001
00:00.0 Class 0600: 1275:1275
00:04.3 Class 0100: 1af4:1009
00:06.0 Class 0280: 8086:02f0
00:04.1 Class 0100: 1af4:1009
00:05.0 Class 0200: 8086:100f

wifibox:~# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:A0:98:8A:05:71  
          inet addr:10.1.0.1  Bcast:0.0.0.0  Mask:255.0.0.0
          inet6 addr: fe80::2a0:98ff:fe8a:571/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:20396 errors:0 dropped:0 overruns:0 frame:0
          TX packets:33557 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:4197622 (4.0 MiB)  TX bytes:27342203 (26.0 MiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

wlan0     Link encap:Ethernet  HWaddr F8:E4:E3:EB:35:02  
          inet addr:192.168.1.139  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::fae4:e3ff:feeb:3502/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:34620 errors:0 dropped:0 overruns:0 frame:0
          TX packets:20133 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:27437272 (26.1 MiB)  TX bytes:4448962 (4.2 MiB)

wifibox:~# dmesg | grep iwlwifi
[    0.869316] iwlwifi 0000:00:06.0: can't derive routing for PCI INT A
[    0.869318] iwlwifi 0000:00:06.0: PCI INT A: not connected
[    0.870101] iwlwifi 0000:00:06.0: Failed to set affinity mask for IRQ 41
[    0.926465] iwlwifi 0000:00:06.0: api flags index 2 larger than supported by driver
[    0.926473] iwlwifi 0000:00:06.0: TLV_FW_FSEQ_VERSION: FSEQ Version: 89.3.35.37
[    0.926601] iwlwifi 0000:00:06.0: loaded firmware version 66.f1c864e0.0 QuZ-a0-jf-b0-66.ucode op_mode iwlmvm
[    0.946738] iwlwifi 0000:00:06.0: Detected Intel(R) Wireless-AC 9560 160MHz, REV=0x354
[    1.061975] iwlwifi 0000:00:06.0: Detected RF JF, rfid=0x105110
[    1.118696] iwlwifi 0000:00:06.0: base HW address: f8:e4:e3:eb:35:02
[    5.207325] iwlwifi 0000:00:06.0: Unhandled alg: 0x3f0707

After

wifibox:~# lspci
00:1f.0 Class 0601: 8086:7000
00:04.2 Class 0100: 1af4:1009
00:04.0 Class 0100: 1af4:1001
00:00.0 Class 0600: 1275:1275
00:04.3 Class 0100: 1af4:1009
00:04.1 Class 0100: 1af4:1009
00:05.0 Class 0200: 8086:100f

wifibox:~# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:A0:98:8A:05:71  
          inet addr:10.1.0.1  Bcast:0.0.0.0  Mask:255.0.0.0
          inet6 addr: fe80::2a0:98ff:fe8a:571/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:145 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:143406 (140.0 KiB)  TX bytes:1594 (1.5 KiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

wifibox:~# dmesg | grep iwlwifi
<nothing>

pgj commented 2 years ago

All right, that confirms that this is a bhyve PCI pass-through problem. The next step would be to tell whether this is a regression or whether it has always behaved like that on your configuration. I will try to get a recent 13.1-STABLE system and reproduce the problem there. Based on your description, it should not be that hard.
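
The before/after lspci comparison from the previous comment can also be done mechanically. The following is a minimal sketch, assuming the two lspci outputs were saved to files; the function name missing_devices and the file names in the usage comment are made up for illustration.

```shell
#!/bin/sh
# missing_devices BEFORE AFTER (hypothetical helper): print every line of
# file BEFORE that does not appear verbatim in file AFTER, i.e. the PCI
# devices that disappeared between the two lspci captures.
missing_devices() {
    # -F fixed strings, -x whole-line match, -f patterns from file,
    # -v invert; "|| true" keeps the status at 0 when nothing is missing.
    grep -vxFf "$2" "$1" || true
}

# Example use inside the guest (not executed here):
#   lspci > /tmp/lspci.before     # before suspending the host
#   lspci > /tmp/lspci.after      # after a bad resume
#   missing_devices /tmp/lspci.before /tmp/lspci.after
```

With the listings posted above, the only line that should be reported is the 00:06.0 entry for the passed-through wireless card.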

fearedbliss commented 2 years ago

@pgj Sounds good. I guess I should hope it fails for you too on STABLE and it isn't just me 😂.

And yea, you could check out 488f9d85278 (n251926) on the stable/13 branch.

fearedbliss commented 2 years ago

@pgj I compiled 13.1-RELEASE from source (releng/13.1) and re-tested my steps. I was still able to reproduce it on:

FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC amd64

For a second I thought that there may have been some weird behavior happening between my Anker Dongle (HDMI/Ethernet/USB Ports/SD/Micro SD/Power all over Type C) and my Thinkpad (maybe something about going to sleep and resuming with all of that attached). However, after about 5-6 consecutive working sleep/resume cycles, it finally dropped the 00:06.0 Class 0280: 8086:02f0 slot from the VM. So it seems the same issue has been there all along and I don't think there is a regression, just a bug somewhere.

I've attached a longer /var/log/messages below (contains -STABLE and then -RELEASE logs). Things related to ppt0 mem 0xea238000-0xea23bfff at device 20.3 on pci0 seem interesting.

messages-1.txt

fearedbliss commented 2 years ago

@pgj I've noticed (still need more testing though) a possible workaround to this issue. It seems that if I stop wifibox before I put the computer to sleep, the machine can resume and still keep the PCI interface in the VM. This makes me believe that some resources are not being handled cleanly when a zzz operation is triggered with an actively running bhyve VM using a PCI passthrough device.

I've also updated to FreeBSD 13.1-STABLE #0 stable/13-n251972-fb8e858c69f, no major changes but just wanted to keep you up to date.

This is the latest dmesg output (from suspending/resuming while manually stopping wifibox before I run zzz):

messages-2.txt

These may be of interest (in a correct flow):

bridge0: Ethernet address: 58:9c:fc:10:ff:99
bridge0: changing name to 'wifibox0'
tap0: Ethernet address: 58:9c:fc:10:c0:3f
tap0: promiscuous mode enabled
wifibox0: link state changed to DOWN
ppt0 mem 0xea238000-0xea23bfff at device 20.3 on pci0
tap0: link state changed to UP
wifibox0: link state changed to UP
lo0: link state changed to UP

...

tap0: link state changed to DOWN
wifibox0: link state changed to DOWN
ppt0: detached
pci0: <network> at device 20.3 (no driver attached)
ue0: link state changed to UP
tap0: promiscuous mode disabled

...

ppt0: detached
pci0: <network> at device 20.3 (no driver attached)
ppt0 mem 0xea238000-0xea23bfff at device 20.3 on pci0
bridge0: Ethernet address: 58:9c:fc:10:ff:99
bridge0: changing name to 'wifibox0'
tap0: Ethernet address: 58:9c:fc:10:c0:3f
tap0: promiscuous mode enabled
wifibox0: link state changed to DOWN
tap0: link state changed to UP
wifibox0: link state changed to UP
tap0: link state changed to DOWN
wifibox0: link state changed to DOWN
ppt0: detached
pci0: <network> at device 20.3 (no driver attached)
ppt0 mem 0xea238000-0xea23bfff at device 20.3 on pci0
tap0: link state changed to UP
wifibox0: link state changed to UP
tap0: link state changed to DOWN
wifibox0: link state changed to DOWN
ppt0: detached
pci0: <network> at device 20.3 (no driver attached)
tap0: promiscuous mode disabled

I'll be testing this for a few days to see what happens and report back. I'll probably add a little bit of automation (for example, when I suspend my screen via shortcuts on i3, I'll do a wifibox stop && zzz command, and then on resume I'll have the devd resume event do a wifibox start && dhclient wifibox0). I'll let you know what happens.

fearedbliss commented 2 years ago

I've automated the suspend/resume for stopping/starting/assigning ip of wifibox and it seems to be working (I'll need more time to test of course). I added the suspend section (not sure if the notify number is correct but it seems to be functioning correctly when doing multiple suspend/resumes):

root@leslie:~ # cat /usr/local/etc/devd/wifibox.conf

# This is a `devd(8)` configuration file to run the resume action of
# wifibox on the ACPI resume event.  Review the contents and create a
# copy of it without the `.sample` extension to use it.  Restart the
# `devd` service once the file has been created.

notify 11 {
        match "system"          "ACPI";
        match "subsystem"       "Suspend";
        action "logger 'Stopping wifibox before suspend' && /usr/local/sbin/wifibox stop && /etc/rc.suspend acpi $notify";
};

notify 11 {
        match "system"          "ACPI";
        match "subsystem"       "Resume";
        action "/etc/rc.resume acpi $notify && logger 'Starting wifibox after resume and getting IP via DHCP' && /usr/local/sbin/wifibox start && /sbin/dhclient wifibox0";
};

and this is what the /var/log/messages | grep wifibox output looks like after a suspend/resume cycle:

Jul 28 14:14:08 leslie root[3910]: Stopping wifibox before suspend
Jul 28 14:14:11 leslie kernel: wifibox0: link state changed to DOWN
Jul 28 14:14:12 leslie dhclient[3177]: receive_packet failed on wifibox0: Device not configured
Jul 28 14:14:12 leslie dhclient[3177]: ioctl(SIOCGIFFLAGS) on wifibox0: Operation not permitted
Jul 28 14:14:12 leslie dhclient[3177]: Interface wifibox0 no longer appears valid.
Jul 28 14:14:38 leslie root[4098]: Starting wifibox after resume and getting IP via DHCP
Jul 28 14:14:38 leslie kernel: bridge0: changing name to 'wifibox0'
Jul 28 14:14:38 leslie kernel: wifibox0: link state changed to DOWN
Jul 28 14:14:38 leslie kernel: wifibox0: link state changed to UP
Jul 28 14:14:55 leslie dhclient[4412]: New IP Address (wifibox0): 10.1.0.2
Jul 28 14:14:55 leslie dhclient[4416]: New Subnet Mask (wifibox0): 255.255.255.0
Jul 28 14:14:55 leslie dhclient[4420]: New Broadcast Address (wifibox0): 10.1.0.255
Jul 28 14:14:55 leslie dhclient[4424]: New Routers (wifibox0): 10.1.0.1

Notice that the /etc/rc.suspend acpi $notify call happens at the end of our action event. I originally tried it at the beginning, but when I resumed, the PCI passthrough issue immediately popped up. I'm guessing this is because, since I called suspend first, the system didn't have enough time to finish stopping the wifibox VM, so we got the call but not a clean shutdown of the VM, which fits my theory that the PCI devices are not cleaned up successfully when the VM is still running. When I flipped it and put the /etc/rc.suspend call at the end, I noticed that my sleep took a little bit longer, but that would just mean that it was cleanly shutting down the VM before sleeping. It also didn't matter if I ran zzz as root or as the user.

Lastly, there was one time I resumed where one of the machine lights lit up but it actually never resumed (The power button still was flashing as if it was sleeping, but the F1 (Mute Key Light) was on - which only happens when the machine is powered back on). I don't necessarily think this is related to the above changes since I've experienced these sort of things on different machines before. It may just be because I'm sleeping and resuming multiple times in relatively quick succession (10 seconds ~ maybe more maybe less)... but I wanted to mention this for the record just in case.

Let me know what you think :).

pgj commented 2 years ago

Thanks for all the useful bits of information and the investigation! It is curious that you have to use the zzz command for the suspend: on my system, it is configured to be linked with the lid switch event. This can be set by changing the value of the hw.acpi.lid_switch_state sysctl(8) variable to S3. On closing the notebook (Lenovo ThinkPad X270) it goes to sleep without calling zzz and resumes when I open it. I am not sure this makes any difference, I just noticed it.
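
As a sketch of the lid switch setting mentioned above, the variable can be made persistent through /etc/sysctl.conf. The variable name and value are the ones given in this comment; placing it in sysctl.conf is an assumption about how one would keep it across reboots.

```
# /etc/sysctl.conf (sketch): suspend to S3 when the lid is closed.
hw.acpi.lid_switch_state=S3
```

The same value can be applied to the running system as root with `sysctl hw.acpi.lid_switch_state=S3`.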

I think the proposed workaround can be okay, although deleting the linked networking interface (that wifibox stop does) adds some extra delay on resume because it has to be re-created from scratch and dhclient(8) has to be explicitly called again to get back the previously allocated IP address. That is why I implemented wifibox restart guest to allow for keeping the interface around, which of course cannot be used in this situation, but something similar can be added. Something like wifibox stop guest and wifibox start guest. And this has to be documented on the manual page and in the sample devd.conf configuration file.

In the long term, this bug has to be fixed in bhyve and shall be reported. Either as a regular FreeBSD bug or by starting a discussion on the freebsd-virtualization mailing list. I think it is better if you do this directly (I do not have any experience with bhyve internals though I am open to learn about it) but I can help with this. I would definitely follow the related conversations.

fearedbliss commented 2 years ago

Haha. By default FreeBSD doesn't do anything in regards to suspending on lid close, and yea that's the variable you can set. I don't use it though since for my work flow sometimes I want to be able to just close the lid without shutting the machine (maybe compiling something over night but I don't want the monitor light just shining all the time haha). The machine does resume automatically (IIRC) if I did manually zzz and then I open the lid.

Regarding the wifibox restart guest, yea my solution adds a little bit of a delay when you sleep and resume because exactly as you said, it needs to re-do the whole thing. But in this case that's what I want due to the bug.

And yea I agree we need to get this fixed at the base. I wanted to get your feedback on all of this before a bug report is opened, but if we have enough info here I can go ahead and open up a bug report (and I can also message them in that mailing list). I have no experience with bhyve, it's my first time using it, but I've used different sorts of virtualization technologies before (qemu comes to mind). In the name of laziness, I usually just use VirtualBox for light testing of stuff.

pgj commented 2 years ago

In my opinion, the networking interface does not necessarily have to be deleted. All you want is to spin down the guest to release the PCI device before going to sleep. The "fake" wifibox0 interface is a bridge where the VM's tap device is added on startup and can be re-added once the VM is up again on resume. It could then keep the assigned IP address and there is no need for any DHCP request.

I think we are set for submitting the bug report, there are very specific details on how to reproduce the issue. Perhaps a more minimal test case could be better without requiring Wifibox, but the current description can also work. Then the developers will just follow up and ask for more information or for testing patches.

fearedbliss commented 2 years ago

I'm writing the bug report as we speak ;D. I'll also experiment with just stopping and starting the guest; if we don't have to run dhclient that would be great, not to mention that we would save a little bit of time upon sleep/resume. I'll post another comment once I submit the report.

fearedbliss commented 2 years ago

I've opened up this ticket and emailed the freebsd-virtualization mailing list.

I'll be experimenting with your suggestions regarding restarting only the guest and not needing to run dhclient.

fearedbliss commented 2 years ago

Haha I just noticed that you said you still would need to implement something like wifibox start/stop guest. If you do end up doing that I would be happy to test it. I'm not sure if you wanna close this issue then or if you want to keep it open as a ticket for the start/stop guest implementation.

Thanks for the help and suggestions :).

pgj commented 2 years ago

We can keep this ticket open for the start/stop guest feature. I definitely want this problem documented on the manual page, which would need another commit.

fearedbliss commented 2 years ago

Sounds good, thank you. I feel since the new commands are related to the documentation they could both be part of the same commit.

Also let me know if you need help explaining anything in the documentation.

pgj commented 2 years ago

By the way, another workaround could be to use static IP address and routing for the guest, see the manual page for the details. Then there is no need to make DHCP requests since all the configuration parameters are known in advance, although it is a less flexible method. Just as a hint.

fearedbliss commented 2 years ago

I don't mind doing that since I sometimes use static IPs as well. Something I already did was switch the IP space to 10.1.0.0/24 since my wireguard interface is on 10.0.0.1. However I can't ping my box since iptables or something is blocking it. I'm trying to figure that part out now.

fearedbliss commented 2 years ago

Small update on the wireguard routing situation. My initial configuration was correct (regarding slightly shifting the wifibox addresses from their default of 10.0.0.0/24 -> 10.1.0.0/24), however, I had forgotten that on my server, when I re-architected it to use jails, I locked down the host ListenAddress to 192.168.1.100, so that meant the previous 10.0.0.1 address that is the IP for that machine on my wireguard network, was no longer listened to. I added ListenAddress 10.0.0.1 in addition to the 192.168.1.100 and it's all working. Wifibox has no issues communicating with any of my 192.168.1.X services and doesn't have issues communicating with 10.0.0.X wireguard VPN services as well :D.
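
The reason for shifting the wifibox range can be illustrated with a quick check for whether two addresses fall into the same /24. This is only a sketch for that one prefix length; the function name same_slash24 is made up for illustration.

```shell
#!/bin/sh
# same_slash24 A B (hypothetical helper): succeed if the IPv4 addresses A
# and B share their first three octets, i.e. lie in the same /24 network.
same_slash24() {
    # ${1%.*} strips the last dot and everything after it (the last octet).
    [ "${1%.*}" = "${2%.*}" ]
}

# Example use (not executed here):
#   same_slash24 10.0.0.1 10.0.0.2 && echo "clash with the default range"
#   same_slash24 10.0.0.1 10.1.0.2 || echo "no clash after the shift"
```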

fearedbliss commented 2 years ago

I've switched from my Thinkpad X1C7 back to my Framework Laptop (Batch 6). I was able to reproduce the same issue on this machine, and my workaround worked there as well.

pgj commented 2 years ago

@fearedbliss I implemented the guest parameter for the start and stop command (together with some documentation, as we discussed above). Here is a version of the net/wifibox-core port that you can use to install the code and try it: https://github.com/pgj/freebsd-wifibox-port/tree/c0451e21bb9498dbc377532350ed42737b035316 .
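
With the guest parameter available, the devd configuration posted earlier in this thread could be adapted to stop and start only the guest, so that the wifibox0 interface and its address stay in place. The fragment below is an untested sketch derived from that earlier configuration. It is not the sample file shipped with the port, and the ordering of the calls is carried over from the reporter's workaround.

```
# devd(8) sketch: spin down only the guest before suspend and bring it
# back after resume, keeping the wifibox0 bridge and its address in place.
notify 11 {
        match "system"          "ACPI";
        match "subsystem"       "Suspend";
        action "/usr/local/sbin/wifibox stop guest && /etc/rc.suspend acpi $notify";
};

notify 11 {
        match "system"          "ACPI";
        match "subsystem"       "Resume";
        action "/etc/rc.resume acpi $notify && /usr/local/sbin/wifibox start guest";
};
```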

fearedbliss commented 2 years ago

Hey @pgj, I no longer have FreeBSD installed on the laptop (only on server), but if the changes are tested, definitely merge it in.

pgj commented 2 years ago

I tested the change on my machine and it looks fine, I have merged it. With this, I consider this ticket closed.

fearedbliss commented 2 years ago

Thanks @pgj !

fearedbliss commented 2 years ago

Hey @pgj, I'm back and have FreeBSD (stable/13-n252783-ef2aa775301) installed on my laptop again so I can continue helping with testing. I know this is already merged in and I see your other commits as well. At the moment I have the normal wifibox-iwlwifi-1.1.1 installed:

wifibox version 0.10.0
Disk image checksum: 1a6e223fab27869faff129778612d3db792c23b8bd8c7692454c3ef06856930

but will see about cloning HEAD and setting that up. Any recommended steps for isolating HEAD vs the one installed on my system? or just do it however I want (I want to make sure I can have some clean repro steps for you if anything arises).

pgj commented 2 years ago

What do you mean by HEAD here? Is this the main branch of Wifibox?

If you clone the pgj/freebsd-wifibox-port repository you can easily move back and forth between the different versions by the tags and branches. What you should be careful about is to fetch all the distfiles and then you can rebuild the different versions without any working Internet connection. Use the make FLAVOR=iwlwifi (as normal user), make FLAVOR=iwlwifi deinstall (as root), make FLAVOR=iwlwifi reinstall clean combo (as root) in each case. That is what I usually do at least.

fearedbliss commented 2 years ago

@pgj That's correct, HEAD here refers to main. Primarily since there isn't an official released port containing the new changes, I'll want to clone it and then do my testing from that side.

pgj commented 2 years ago

Yeah, sorry about not making a new release. I have less time to work on this project and I did not want to release something that is not complete. The Wifibox bits are mostly done but the guest still has some issues to be fixed before cutting a new version.

fearedbliss commented 2 years ago

@pgj That's understandable and not a problem. I posted some recent progress regarding some issues on the native iwlwifi driver regarding DHCPOFFERs not being given (which is probably a bug that has other effects). I shared that to the freebsd-wireless mailing list yesterday, so hopefully that will allow me to continue using the native driver and trying to provide more feedback for it ;). I'm happy to have wifibox as a trusted fallback for sure.

fearedbliss commented 1 year ago

Hey @pgj,

I just got around to testing the latest 0.11.0 release that included these changes. Since there is no wifibox-core-0.11.0 update (still at 0.10.0), I manually installed it with the following steps:

  1. Check out the freebsd-wifibox repo.
  2. Install dependencies manually: wifibox-alpine-iwlwifi and grub2-bhyve.
  3. Go into freebsd-wifibox directory and run make install PREFIX=/usr/local
  4. (Use same configuration files as 0.10.0).

Wifibox works as usual, but it still breaks on resume. I also tried using the RECOVERY_METHOD=suspend_guest and that also didn't work. Restarting the guest manually via wifibox restart guest also yielded nothing. I still see that the wifibox0 interface has an IP after resume, but going into the console shows that the wireless interface inside the linux VM is completely missing. This leads me to believe that something is still off with the fix.

My laptop is currently on stable/13-n253383-b83b87f53e73

fearedbliss commented 1 year ago

I was able to run a few more tests and got some interesting results.

So after using 0.11.0 in my earlier tests, I decided to bring back my old workaround of having a resume function in devd. I tested both stop/start guest and a regular stop/start (with the ACPI suspend/resume notify happening first). In both cases I wasn't able to get the card to come back up inside the Linux VM. The messages are below.

good (initial boot, haven't suspended yet)

[    0.818744] Intel(R) Wireless WiFi driver for Linux
[    0.819087] iwlwifi 0000:00:06.0: can't derive routing for PCI INT A
[    0.819088] iwlwifi 0000:00:06.0: PCI INT A: not connected
[    0.821335] iwlwifi 0000:00:06.0: Failed to set affinity mask for IRQ 41
[    0.873280] iwlwifi 0000:00:06.0: api flags index 2 larger than supported by driver
[    0.873292] iwlwifi 0000:00:06.0: TLV_FW_FSEQ_VERSION: FSEQ Version: 0.63.2.2
[    0.873417] iwlwifi 0000:00:06.0: loaded firmware version 66.f1c864e0.0 ty-a0-gf-a0-66.ucode op_mode iwlmvm
[    0.891888] iwlwifi 0000:00:06.0: Detected Intel(R) Wi-Fi 6 AX210 160MHz, REV=0x420
[    0.903410] thermal thermal_zone0: failed to read out thermal zone (-61)
[    1.065478] iwlwifi 0000:00:06.0: loaded PNVM version 0x881c99e1
[    1.080481] iwlwifi 0000:00:06.0: Detected RF GF, rfid=0x10d000
[    1.149806] iwlwifi 0000:00:06.0: base HW address: c4:bd:e5:1b:47:02
[    1.289130] random: wpa_supplicant: uninitialized urandom read (1027 bytes read)
[    1.584877] ACPI: \: failed to evaluate _DSM (0x1001)
[    1.584881] ACPI: \: failed to evaluate _DSM (0x1001)
[    1.584882] ACPI: \: failed to evaluate _DSM (0x1001)
[    1.584883] ACPI: \: failed to evaluate _DSM (0x1001)
[    1.655641] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[    1.656030] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[    2.140181] random: crng init done
[    5.358072] wlan0: authenticate with 30:23:03:df:b8:b2
[    5.364964] wlan0: send auth to 30:23:03:df:b8:b2 (try 1/3)
[    5.395388] wlan0: authenticated
[    5.396262] wlan0: associate with 30:23:03:df:b8:b2 (try 1/3)
[    5.415443] wlan0: RX AssocResp from 30:23:03:df:b8:b2 (capab=0x11 status=0 aid=1)
[    5.430659] wlan0: associated
[    5.442430] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready

after sleep/resume (before sleep: /usr/local/sbin/wifibox stop guest, after start guest)

[    0.818297] Intel(R) Wireless WiFi driver for Linux
[    0.818762] iwlwifi 0000:00:06.0: can't derive routing for PCI INT A
[    0.818763] iwlwifi 0000:00:06.0: PCI INT A: not connected
[    0.821402] iwlwifi 0000:00:06.0: Failed to set affinity mask for IRQ 41
[    0.875204] iwlwifi 0000:00:06.0: api flags index 2 larger than supported by driver
[    0.875217] iwlwifi 0000:00:06.0: TLV_FW_FSEQ_VERSION: FSEQ Version: 0.63.2.2
[    0.875374] iwlwifi 0000:00:06.0: loaded firmware version 66.f1c864e0.0 ty-a0-gf-a0-66.ucode op_mode iwlmvm
[    0.894169] iwlwifi 0000:00:06.0: Detected Intel(R) Wi-Fi 6 AX210 160MHz, REV=0x420
[    0.905765] thermal thermal_zone0: failed to read out thermal zone (-61)
[    1.944623] iwlwifi 0000:00:06.0: SecBoot CPU1 Status: 0x0, CPU2 Status: 0x3010801
[    1.944732] iwlwifi 0000:00:06.0: UMAC PC: 0xc00c06d2
[    1.944791] iwlwifi 0000:00:06.0: LMAC PC: 0x0
[    1.944793] iwlwifi 0000:00:06.0: WRT: Collecting data: ini trigger 13 fired (delay=0ms).
[    1.945600] iwlwifi 0000:00:06.0: Start IWL Error Log Dump:
[    1.945601] iwlwifi 0000:00:06.0: Transport status: 0x00000042, valid: 466844287
[    1.945602] iwlwifi 0000:00:06.0: Loaded firmware version: 66.f1c864e0.0 ty-a0-gf-a0-66.ucode
[    1.945603] iwlwifi 0000:00:06.0: 0x57F33F7B | ADVANCED_SYSASSERT
[    1.945604] iwlwifi 0000:00:06.0: 0xAD15CBEA | trm_hw_status0
[    1.945604] iwlwifi 0000:00:06.0: 0x5EA7FBC6 | trm_hw_status1
[    1.945605] iwlwifi 0000:00:06.0: 0xB2DFD03B | branchlink2
[    1.945605] iwlwifi 0000:00:06.0: 0x87799835 | interruptlink1
[    1.945606] iwlwifi 0000:00:06.0: 0xF6EDAD9D | interruptlink2
[    1.945606] iwlwifi 0000:00:06.0: 0xE7D9FFBE | data1
[    1.945607] iwlwifi 0000:00:06.0: 0x53F189FE | data2
[    1.945607] iwlwifi 0000:00:06.0: 0xA677E6BA | data3
[    1.945607] iwlwifi 0000:00:06.0: 0xD17FD949 | beacon time
[    1.945608] iwlwifi 0000:00:06.0: 0xF77375E7 | tsf low
[    1.945608] iwlwifi 0000:00:06.0: 0xDF13259B | tsf hi
[    1.945609] iwlwifi 0000:00:06.0: 0x61ABD0E6 | time gp1
[    1.945609] iwlwifi 0000:00:06.0: 0x66CB07F2 | time gp2
[    1.945610] iwlwifi 0000:00:06.0: 0x7EFE60FE | uCode revision type
[    1.945610] iwlwifi 0000:00:06.0: 0xFDF6F3E0 | uCode version major
[    1.945611] iwlwifi 0000:00:06.0: 0x6DD62EDF | uCode version minor
[    1.945611] iwlwifi 0000:00:06.0: 0xC8330654 | hw version
[    1.945611] iwlwifi 0000:00:06.0: 0x382ABC1B | board version
[    1.945612] iwlwifi 0000:00:06.0: 0x5A06FBE8 | hcmd
[    1.945612] iwlwifi 0000:00:06.0: 0xD48241E6 | isr0
[    1.945613] iwlwifi 0000:00:06.0: 0x85F8274C | isr1
[    1.945613] iwlwifi 0000:00:06.0: 0xCD87698A | isr2
[    1.945614] iwlwifi 0000:00:06.0: 0xD22D548A | isr3
[    1.945614] iwlwifi 0000:00:06.0: 0xF166A3EB | isr4
[    1.945614] iwlwifi 0000:00:06.0: 0xAB280903 | last cmd Id
[    1.945615] iwlwifi 0000:00:06.0: 0x25BE1535 | wait_event
[    1.945615] iwlwifi 0000:00:06.0: 0xDD84C1C2 | l2p_control
[    1.945616] iwlwifi 0000:00:06.0: 0x82EC42D2 | l2p_duration
[    1.945616] iwlwifi 0000:00:06.0: 0x773A314B | l2p_mhvalid
[    1.945616] iwlwifi 0000:00:06.0: 0x935AD823 | l2p_addr_match
[    1.945617] iwlwifi 0000:00:06.0: 0x59D20723 | lmpm_pmg_sel
[    1.945617] iwlwifi 0000:00:06.0: 0x28399109 | timestamp
[    1.945618] iwlwifi 0000:00:06.0: 0xF329D2A8 | flow_handler
[    1.945837] iwlwifi 0000:00:06.0: Start IWL Error Log Dump:
[    1.945837] iwlwifi 0000:00:06.0: Transport status: 0x00000042, valid: 2136667387
[    1.945838] iwlwifi 0000:00:06.0: 0x19FCFFDF | ADVANCED_SYSASSERT
[    1.945838] iwlwifi 0000:00:06.0: 0xEE140038 | umac branchlink1
[    1.945839] iwlwifi 0000:00:06.0: 0xF4FFDDB7 | umac branchlink2
[    1.945839] iwlwifi 0000:00:06.0: 0xBB86AB48 | umac interruptlink1
[    1.945840] iwlwifi 0000:00:06.0: 0x9FFF988F | umac interruptlink2
[    1.945840] iwlwifi 0000:00:06.0: 0xF2E6D3EF | umac data1
[    1.945841] iwlwifi 0000:00:06.0: 0xD650F824 | umac data2
[    1.945841] iwlwifi 0000:00:06.0: 0xDF04FE10 | umac data3
[    1.945842] iwlwifi 0000:00:06.0: 0x7A6FFDEA | umac major
[    1.945842] iwlwifi 0000:00:06.0: 0x23B9D0B1 | umac minor
[    1.945842] iwlwifi 0000:00:06.0: 0xDE35D67E | frame pointer
[    1.945843] iwlwifi 0000:00:06.0: 0x09127045 | stack pointer
[    1.945843] iwlwifi 0000:00:06.0: 0x5AFF4F17 | last host cmd
[    1.945844] iwlwifi 0000:00:06.0: 0xA9542D71 | isr status reg
[    1.946023] iwlwifi 0000:00:06.0: IML/ROM dump:
[    1.946024] iwlwifi 0000:00:06.0: 0x0301 | IML/ROM SYSASSERT
[    1.946024] iwlwifi 0000:00:06.0: 0x03010801 | IML/ROM error/state
[    1.946083] iwlwifi 0000:00:06.0: 0x00000000 | IML/ROM data1
[    1.946142] iwlwifi 0000:00:06.0: 0x00000080 | IML/ROM WFPM_AUTH_KEY_0
[    1.946197] iwlwifi 0000:00:06.0: Fseq Registers:
[    1.946200] iwlwifi 0000:00:06.0: 0x60000100 | FSEQ_ERROR_CODE
[    1.946204] iwlwifi 0000:00:06.0: 0x80440005 | FSEQ_TOP_INIT_VERSION
[    1.946207] iwlwifi 0000:00:06.0: 0x00080009 | FSEQ_CNVIO_INIT_VERSION
[    1.946210] iwlwifi 0000:00:06.0: 0x0000A652 | FSEQ_OTP_VERSION
[    1.946213] iwlwifi 0000:00:06.0: 0x00000002 | FSEQ_TOP_CONTENT_VERSION
[    1.946216] iwlwifi 0000:00:06.0: 0x4552414E | FSEQ_ALIVE_TOKEN
[    1.946219] iwlwifi 0000:00:06.0: 0x00400410 | FSEQ_CNVI_ID
[    1.946222] iwlwifi 0000:00:06.0: 0x00400410 | FSEQ_CNVR_ID
[    1.946226] iwlwifi 0000:00:06.0: 0x00400410 | CNVI_AUX_MISC_CHIP
[    1.946231] iwlwifi 0000:00:06.0: 0x00400410 | CNVR_AUX_MISC_CHIP
[    1.946236] iwlwifi 0000:00:06.0: 0x00009061 | CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM
[    1.946241] iwlwifi 0000:00:06.0: 0x00000061 | CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR
[    1.946244] iwlwifi 0000:00:06.0: Failed to start RT ucode: -110
[    1.946245] iwlwifi 0000:00:06.0: WRT: Collecting data: ini trigger 13 fired (delay=0ms).
[    6.110729] iwlwifi 0000:00:06.0: Failed to run INIT ucode: -110
[    6.123528] iwlwifi 0000:00:06.0: retry init count 0
[    6.123628] iwlwifi 0000:00:06.0: Detected Intel(R) Wi-Fi 6 AX210 160MHz, REV=0x420
[    6.133272] thermal thermal_zone0: failed to read out thermal zone (-61)
[    6.239914] random: wpa_supplicant: uninitialized urandom read (1027 bytes read)
[    6.271275] random: wpa_supplicant: uninitialized urandom read (1027 bytes read)
[    6.273011] random: wpa_supplicant: uninitialized urandom read (1027 bytes read)
[    7.194990] iwlwifi 0000:00:06.0: SecBoot CPU1 Status: 0x0, CPU2 Status: 0x3010801
[    7.195222] iwlwifi 0000:00:06.0: UMAC PC: 0xc00c06d2
[    7.195671] iwlwifi 0000:00:06.0: LMAC PC: 0x0
[    7.195673] iwlwifi 0000:00:06.0: WRT: Collecting data: ini trigger 13 fired (delay=0ms).
[    7.196802] iwlwifi 0000:00:06.0: Start IWL Error Log Dump:
[    7.196804] iwlwifi 0000:00:06.0: Transport status: 0x00000042, valid: 469465855
[    7.196805] iwlwifi 0000:00:06.0: Loaded firmware version: 66.f1c864e0.0 ty-a0-gf-a0-66.ucode
[    7.196807] iwlwifi 0000:00:06.0: 0x57730D7B | ADVANCED_SYSASSERT
[    7.196809] iwlwifi 0000:00:06.0: 0xAD15C3EA | trm_hw_status0
[    7.196810] iwlwifi 0000:00:06.0: 0x5EA7F9C6 | trm_hw_status1
[    7.196810] iwlwifi 0000:00:06.0: 0xF2DDD13B | branchlink2
[    7.196811] iwlwifi 0000:00:06.0: 0x877B8835 | interruptlink1
[    7.196812] iwlwifi 0000:00:06.0: 0xF6EDAD99 | interruptlink2
[    7.196813] iwlwifi 0000:00:06.0: 0xC699FDBF | data1
[    7.196814] iwlwifi 0000:00:06.0: 0x72F1C9FE | data2
[    7.196815] iwlwifi 0000:00:06.0: 0xA6736F38 | data3
[    7.196816] iwlwifi 0000:00:06.0: 0xD57FD959 | beacon time
[    7.196817] iwlwifi 0000:00:06.0: 0xF27774E7 | tsf low
[    7.196818] iwlwifi 0000:00:06.0: 0xDB1325BF | tsf hi
[    7.196819] iwlwifi 0000:00:06.0: 0x618B90E6 | time gp1
[    7.196820] iwlwifi 0000:00:06.0: 0x678B0BB2 | time gp2
[    7.196821] iwlwifi 0000:00:06.0: 0x7EFE40FE | uCode revision type
[    7.196821] iwlwifi 0000:00:06.0: 0xFDF6F3E0 | uCode version major
[    7.196822] iwlwifi 0000:00:06.0: 0x6DD62EDF | uCode version minor
[    7.196823] iwlwifi 0000:00:06.0: 0xEA331614 | hw version
[    7.196824] iwlwifi 0000:00:06.0: 0x382A3C1B | board version
[    7.196825] iwlwifi 0000:00:06.0: 0x5A06FBE8 | hcmd
[    7.196826] iwlwifi 0000:00:06.0: 0xD48241E6 | isr0
[    7.196827] iwlwifi 0000:00:06.0: 0x95F8037D | isr1
[    7.196828] iwlwifi 0000:00:06.0: 0xCC8769AE | isr2
[    7.196828] iwlwifi 0000:00:06.0: 0xD22D508C | isr3
[    7.196829] iwlwifi 0000:00:06.0: 0xF162ABEB | isr4
[    7.196830] iwlwifi 0000:00:06.0: 0xAB280123 | last cmd Id
[    7.196831] iwlwifi 0000:00:06.0: 0x25BE15BD | wait_event
[    7.196832] iwlwifi 0000:00:06.0: 0xDD86C586 | l2p_control
[    7.196833] iwlwifi 0000:00:06.0: 0x92EC42F2 | l2p_duration
[    7.196834] iwlwifi 0000:00:06.0: 0x673A3C5B | l2p_mhvalid
[    7.196835] iwlwifi 0000:00:06.0: 0x931ADC23 | l2p_addr_match
[    7.196835] iwlwifi 0000:00:06.0: 0x59F20DA3 | lmpm_pmg_sel
[    7.196836] iwlwifi 0000:00:06.0: 0x68391189 | timestamp
[    7.196837] iwlwifi 0000:00:06.0: 0xD369D629 | flow_handler
[    7.197059] iwlwifi 0000:00:06.0: Start IWL Error Log Dump:
[    7.197060] iwlwifi 0000:00:06.0: Transport status: 0x00000042, valid: 2119857403
[    7.197061] iwlwifi 0000:00:06.0: 0x59FCFFDF | ADVANCED_SYSASSERT
[    7.197062] iwlwifi 0000:00:06.0: 0xEE140038 | umac branchlink1
[    7.197063] iwlwifi 0000:00:06.0: 0xF4FFDDB7 | umac branchlink2
[    7.197064] iwlwifi 0000:00:06.0: 0xBB86ABC8 | umac interruptlink1
[    7.197065] iwlwifi 0000:00:06.0: 0x1FFD988F | umac interruptlink2
[    7.197066] iwlwifi 0000:00:06.0: 0xF2E6D3EF | umac data1
[    7.197067] iwlwifi 0000:00:06.0: 0xD670F824 | umac data2
[    7.197068] iwlwifi 0000:00:06.0: 0xDB04DE00 | umac data3
[    7.197069] iwlwifi 0000:00:06.0: 0x7AEFFDEA | umac major
[    7.197069] iwlwifi 0000:00:06.0: 0x23B9D0B1 | umac minor
[    7.197070] iwlwifi 0000:00:06.0: 0xDE35D67E | frame pointer
[    7.197071] iwlwifi 0000:00:06.0: 0x09127045 | stack pointer
[    7.197072] iwlwifi 0000:00:06.0: 0x5AFF4F17 | last host cmd
[    7.197073] iwlwifi 0000:00:06.0: 0xA9142D71 | isr status reg
[    7.197267] iwlwifi 0000:00:06.0: IML/ROM dump:
[    7.197267] iwlwifi 0000:00:06.0: 0x0301 | IML/ROM SYSASSERT
[    7.197268] iwlwifi 0000:00:06.0: 0x03010801 | IML/ROM error/state
[    7.197766] iwlwifi 0000:00:06.0: 0x00000000 | IML/ROM data1
[    7.197994] iwlwifi 0000:00:06.0: 0x00000080 | IML/ROM WFPM_AUTH_KEY_0
[    7.198126] iwlwifi 0000:00:06.0: Fseq Registers:
[    7.198176] iwlwifi 0000:00:06.0: 0x60000100 | FSEQ_ERROR_CODE
[    7.198227] iwlwifi 0000:00:06.0: 0x80440005 | FSEQ_TOP_INIT_VERSION
[    7.198274] iwlwifi 0000:00:06.0: 0x00080009 | FSEQ_CNVIO_INIT_VERSION
[    7.198331] iwlwifi 0000:00:06.0: 0x0000A652 | FSEQ_OTP_VERSION
[    7.198502] iwlwifi 0000:00:06.0: 0x00000002 | FSEQ_TOP_CONTENT_VERSION
[    7.198647] iwlwifi 0000:00:06.0: 0x4552414E | FSEQ_ALIVE_TOKEN
[    7.198789] iwlwifi 0000:00:06.0: 0x00400410 | FSEQ_CNVI_ID
[    7.198840] iwlwifi 0000:00:06.0: 0x00400410 | FSEQ_CNVR_ID
[    7.198886] iwlwifi 0000:00:06.0: 0x00400410 | CNVI_AUX_MISC_CHIP
[    7.198934] iwlwifi 0000:00:06.0: 0x00400410 | CNVR_AUX_MISC_CHIP
[    7.198980] iwlwifi 0000:00:06.0: 0x00009061 | CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM
[    7.199027] iwlwifi 0000:00:06.0: 0x00000061 | CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR
[    7.199074] iwlwifi 0000:00:06.0: Failed to start RT ucode: -110
[    7.199076] iwlwifi 0000:00:06.0: WRT: Collecting data: ini trigger 13 fired (delay=0ms).
[    9.295714] iwlwifi 0000:00:06.0: Failed to run INIT ucode: -110
[    9.309464] iwlwifi 0000:00:06.0: retry init count 1
[    9.309574] iwlwifi 0000:00:06.0: Detected Intel(R) Wi-Fi 6 AX210 160MHz, REV=0x420
[    9.319305] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[    9.320479] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[    9.321281] thermal thermal_zone0: failed to read out thermal zone (-61)
[   10.328988] iwlwifi 0000:00:06.0: SecBoot CPU1 Status: 0x0, CPU2 Status: 0x3010801
[   10.329227] iwlwifi 0000:00:06.0: UMAC PC: 0xc00c06d2
[   10.329686] iwlwifi 0000:00:06.0: LMAC PC: 0x0
[   10.329688] iwlwifi 0000:00:06.0: WRT: Collecting data: ini trigger 13 fired (delay=0ms).
[   10.330783] iwlwifi 0000:00:06.0: Start IWL Error Log Dump:
[   10.330784] iwlwifi 0000:00:06.0: Transport status: 0x00000042, valid: 469445247
[   10.330787] iwlwifi 0000:00:06.0: Loaded firmware version: 66.f1c864e0.0 ty-a0-gf-a0-66.ucode
[   10.330788] iwlwifi 0000:00:06.0: 0x57F3AF7B | ADVANCED_SYSASSERT
[   10.330790] iwlwifi 0000:00:06.0: 0xAD15CBEA | trm_hw_status0
[   10.330791] iwlwifi 0000:00:06.0: 0x5EA7F9C6 | trm_hw_status1
[   10.330792] iwlwifi 0000:00:06.0: 0xB2DFD43F | branchlink2
[   10.330793] iwlwifi 0000:00:06.0: 0x17799834 | interruptlink1
[   10.330794] iwlwifi 0000:00:06.0: 0x7CEDAD99 | interruptlink2
[   10.330795] iwlwifi 0000:00:06.0: 0xE799FDBE | data1
[   10.330796] iwlwifi 0000:00:06.0: 0x53F1C8FE | data2
[   10.330796] iwlwifi 0000:00:06.0: 0xA673EE38 | data3
[   10.330797] iwlwifi 0000:00:06.0: 0x517FDD49 | beacon time
[   10.330798] iwlwifi 0000:00:06.0: 0xF2F777E7 | tsf low
[   10.330799] iwlwifi 0000:00:06.0: 0xDF132597 | tsf hi
[   10.330800] iwlwifi 0000:00:06.0: 0x618B90E6 | time gp1
[   10.330801] iwlwifi 0000:00:06.0: 0x66CB07B2 | time gp2
[   10.330802] iwlwifi 0000:00:06.0: 0x7EFE40F6 | uCode revision type
[   10.330803] iwlwifi 0000:00:06.0: 0xFDF6F3A0 | uCode version major
[   10.330804] iwlwifi 0000:00:06.0: 0x6DD62EDE | uCode version minor
[   10.330805] iwlwifi 0000:00:06.0: 0xC8330654 | hw version
[   10.330806] iwlwifi 0000:00:06.0: 0x382A3E1B | board version
[   10.330807] iwlwifi 0000:00:06.0: 0x5A06FBF8 | hcmd
[   10.330808] iwlwifi 0000:00:06.0: 0xD48241E6 | isr0
[   10.330809] iwlwifi 0000:00:06.0: 0x97F8235D | isr1
[   10.330809] iwlwifi 0000:00:06.0: 0xCCC3E98A | isr2
[   10.330810] iwlwifi 0000:00:06.0: 0xD22D5088 | isr3
[   10.330811] iwlwifi 0000:00:06.0: 0xB16282EB | isr4
[   10.330812] iwlwifi 0000:00:06.0: 0xAB280103 | last cmd Id
[   10.330813] iwlwifi 0000:00:06.0: 0x2DFA15FD | wait_event
[   10.330814] iwlwifi 0000:00:06.0: 0xDD86C4C2 | l2p_control
[   10.330815] iwlwifi 0000:00:06.0: 0x80E842D2 | l2p_duration
[   10.330816] iwlwifi 0000:00:06.0: 0x677A305B | l2p_mhvalid
[   10.330817] iwlwifi 0000:00:06.0: 0x935AFC23 | l2p_addr_match
[   10.330818] iwlwifi 0000:00:06.0: 0x59D20F23 | lmpm_pmg_sel
[   10.330818] iwlwifi 0000:00:06.0: 0x6C39B109 | timestamp
[   10.330819] iwlwifi 0000:00:06.0: 0xD328D629 | flow_handler
[   10.331039] iwlwifi 0000:00:06.0: Start IWL Error Log Dump:
[   10.331040] iwlwifi 0000:00:06.0: Transport status: 0x00000042, valid: 2054878459
[   10.331042] iwlwifi 0000:00:06.0: 0x59FCFFDF | ADVANCED_SYSASSERT
[   10.331043] iwlwifi 0000:00:06.0: 0xE614003A | umac branchlink1
[   10.331044] iwlwifi 0000:00:06.0: 0xF4FFDDB7 | umac branchlink2
[   10.331045] iwlwifi 0000:00:06.0: 0xBB86AB48 | umac interruptlink1
[   10.331046] iwlwifi 0000:00:06.0: 0x1FFF988F | umac interruptlink2
[   10.331047] iwlwifi 0000:00:06.0: 0xF2E6D3EF | umac data1
[   10.331048] iwlwifi 0000:00:06.0: 0xDE70F824 | umac data2
[   10.331049] iwlwifi 0000:00:06.0: 0xDB04DE00 | umac data3
[   10.331050] iwlwifi 0000:00:06.0: 0x7AEFFDE2 | umac major
[   10.331051] iwlwifi 0000:00:06.0: 0x23B9D0B1 | umac minor
[   10.331051] iwlwifi 0000:00:06.0: 0xDE35D67E | frame pointer
[   10.331052] iwlwifi 0000:00:06.0: 0x09127045 | stack pointer
[   10.331054] iwlwifi 0000:00:06.0: 0x5AFD4F17 | last host cmd
[   10.331055] iwlwifi 0000:00:06.0: 0x89142F71 | isr status reg
[   10.331248] iwlwifi 0000:00:06.0: IML/ROM dump:
[   10.331249] iwlwifi 0000:00:06.0: 0x0301 | IML/ROM SYSASSERT
[   10.331250] iwlwifi 0000:00:06.0: 0x03010801 | IML/ROM error/state
[   10.331753] iwlwifi 0000:00:06.0: 0x00000000 | IML/ROM data1
[   10.331987] iwlwifi 0000:00:06.0: 0x00000080 | IML/ROM WFPM_AUTH_KEY_0
[   10.332124] iwlwifi 0000:00:06.0: Fseq Registers:
[   10.332175] iwlwifi 0000:00:06.0: 0x60000100 | FSEQ_ERROR_CODE
[   10.332227] iwlwifi 0000:00:06.0: 0x80440005 | FSEQ_TOP_INIT_VERSION
[   10.332278] iwlwifi 0000:00:06.0: 0x00080009 | FSEQ_CNVIO_INIT_VERSION
[   10.332335] iwlwifi 0000:00:06.0: 0x0000A652 | FSEQ_OTP_VERSION
[   10.332511] iwlwifi 0000:00:06.0: 0x00000002 | FSEQ_TOP_CONTENT_VERSION
[   10.332652] iwlwifi 0000:00:06.0: 0x4552414E | FSEQ_ALIVE_TOKEN
[   10.332703] iwlwifi 0000:00:06.0: 0x00400410 | FSEQ_CNVI_ID
[   10.332754] iwlwifi 0000:00:06.0: 0x00400410 | FSEQ_CNVR_ID
[   10.332801] iwlwifi 0000:00:06.0: 0x00400410 | CNVI_AUX_MISC_CHIP
[   10.332855] iwlwifi 0000:00:06.0: 0x00400410 | CNVR_AUX_MISC_CHIP
[   10.332901] iwlwifi 0000:00:06.0: 0x00009061 | CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM
[   10.332955] iwlwifi 0000:00:06.0: 0x00000061 | CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR
[   10.333007] iwlwifi 0000:00:06.0: Failed to start RT ucode: -110
[   10.333008] iwlwifi 0000:00:06.0: WRT: Collecting data: ini trigger 13 fired (delay=0ms).
[   12.425742] iwlwifi 0000:00:06.0: Failed to run INIT ucode: -110
[   12.439826] iwlwifi 0000:00:06.0: retry init count 2
[   12.486334] random: crng init done
[   12.486335] random: 8 urandom warning(s) missed due to ratelimiting

and also (in another test run):

[    0.806339] Intel(R) Wireless WiFi driver for Linux
[    0.806410] iwlwifi 0000:00:06.0: can't derive routing for PCI INT A
[    0.806412] iwlwifi 0000:00:06.0: PCI INT A: not connected
[    0.807037] iwlwifi 0000:00:06.0: Failed to set affinity mask for IRQ 41
[    0.859023] iwlwifi 0000:00:06.0: api flags index 2 larger than supported by driver
[    0.859034] iwlwifi 0000:00:06.0: TLV_FW_FSEQ_VERSION: FSEQ Version: 0.63.2.2
[    0.859153] iwlwifi 0000:00:06.0: loaded firmware version 66.f1c864e0.0 ty-a0-gf-a0-66.ucode op_mode iwlmvm
[    0.877460] iwlwifi 0000:00:06.0: Detected Intel(R) Wi-Fi 6 AX210 160MHz, REV=0xA5AC
[    1.056867] random: wpa_supplicant: uninitialized urandom read (1027 bytes read)
[    1.089859] random: wpa_supplicant: uninitialized urandom read (1027 bytes read)
[    1.091377] random: wpa_supplicant: uninitialized urandom read (1027 bytes read)
[    1.114266] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[    1.114674] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[    1.160116] random: crng init done
[    1.160117] random: 8 urandom warning(s) missed due to ratelimiting
[    2.127218] iwlwifi 0000:00:06.0: Couldn't prepare the card
[    2.127222] iwlwifi 0000:00:06.0: Error while preparing HW: -110
[    2.139824] iwlwifi 0000:00:06.0: retry init count 0
[    2.139841] iwlwifi 0000:00:06.0: Detected Intel(R) Wi-Fi 6 AX210 160MHz, REV=0xA5AC
[    3.389117] iwlwifi 0000:00:06.0: Couldn't prepare the card
[    3.389122] iwlwifi 0000:00:06.0: Error while preparing HW: -110
[    3.401758] iwlwifi 0000:00:06.0: retry init count 1
[    3.401774] iwlwifi 0000:00:06.0: Detected Intel(R) Wi-Fi 6 AX210 160MHz, REV=0xA5AC
[    4.653944] iwlwifi 0000:00:06.0: Couldn't prepare the card
[    4.653948] iwlwifi 0000:00:06.0: Error while preparing HW: -110
[    4.666576] iwlwifi 0000:00:06.0: retry init count 2

attempt 1: adding resume hook back into devd, and stop/restarting guest while keeping the acpi notifys happening first (FAILED)

notify 11 {
        match "system"          "ACPI";
        match "subsystem"       "Suspend";
        action "/etc/rc.suspend acpi $notify && logger 'stopping wifibox guest ...' && /usr/local/sbin/wifibox stop guest";
};

notify 11 {
        match "system"          "ACPI";
        match "subsystem"       "Resume";
        action "/etc/rc.resume acpi $notify && logger 'starting wifibox guest ...' && /usr/local/sbin/wifibox start guest";
};

The log confirms that the hooks above were triggered:

11952 Jan 13 08:16:09 leslie root[2261]: stopping wifibox guest ...
12013 Jan 13 08:16:13 leslie root[2351]: starting wifibox guest ...

attempt 2: starting/stopping the entire thing and keeping acpi notifys first (FAILED)

notify 11 {
        match "system"          "ACPI";
        match "subsystem"       "Suspend";
        action "/etc/rc.suspend acpi $notify && logger 'stopping wifibox ...' && /usr/local/sbin/wifibox stop";
};

notify 11 {
        match "system"          "ACPI";
        match "subsystem"       "Resume";
        action "/etc/rc.resume acpi $notify && logger 'starting wifibox ...' && /usr/local/sbin/wifibox start && dhclient wifibox0";
};

12846 Jan 13 08:36:42 leslie root[2246]: stopping wifibox ...
12915 Jan 13 08:36:46 leslie root[2349]: starting wifibox ...

attempt 3: use my original solution of stopping entire thing first and then sending the acpi suspend notify, and for resume, send resume notify first, and then start up the system / get IP (FAILED on 0.11.0):

notify 11 {
        match "system"          "ACPI";
        match "subsystem"       "Suspend";
        action "logger 'stopping wifibox ...' && /usr/local/sbin/wifibox stop && /etc/rc.suspend acpi $notify";
};

notify 11 {
        match "system"          "ACPI";
        match "subsystem"       "Resume";
        action "/etc/rc.resume acpi $notify && logger 'starting wifibox ...' && /usr/local/sbin/wifibox start && dhclient wifibox0";
};

attempt 4: try the same as above, but reverted back to 0.10.0 (SUCCESS).

Since the above workaround works on 0.10.0 but not on 0.11.0, and 0.10.0 also works on this latest stable/13, there is no regression at the FreeBSD level; the regression appears to be in wifibox-core. I believe the ordering of the changes implemented in https://github.com/pgj/freebsd-wifibox/commit/b6824c53bb5faf1bba6db97c5ab9a0b5710b45f9 may have something to do with it. However, I haven't had time to bisect that since I have a flight to catch in a few hours. I can help continue debugging this once I'm back from my trip. With that said, I feel there may still be a path forward with using start/stop guest, but we'll probably need to adjust the suspend devd hook so that the guest is stopped before the suspend notification occurs, and have a resume hook that does the ACPI resume notify first and then starts the guest afterwards (essentially, the suspend/resume stop/start operations form a stack).
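
To make the "stack" idea concrete, here is a minimal dry-run sketch of the ordering I have in mind. The function names and the `RUN` prefix are hypothetical helpers for illustration only; with the default `RUN=echo` the commands are printed rather than executed, and the argument stands in for devd's `$notify` value.

```shell
#!/bin/sh
# Dry-run sketch of the stacked ordering: whatever is done last before
# suspend is undone first after resume.
# RUN is a hypothetical prefix: "echo" only prints the commands; set it to
# the empty string to actually execute them (as root).
RUN=${RUN-echo}

# $1: the ACPI notify value that devd passes along as $notify
on_suspend() {
    $RUN /usr/local/sbin/wifibox stop guest &&   # release the card first
    $RUN /etc/rc.suspend acpi "$1"               # then let the system suspend
}

on_resume() {
    $RUN /etc/rc.resume acpi "$1" &&             # system resume comes first
    $RUN /usr/local/sbin/wifibox start guest     # the guest is started last
}
```

The two functions mirror each other, so the guest never holds the passed-through card while the host itself is suspending or resuming. Whether that alone is sufficient is exactly what still needs testing.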

pgj commented 1 year ago

Hi @fearedbliss, thanks for the detailed report. Note that you can always get the latest version of wifibox-core from the pgj/freebsd-wifibox-port GitHub repository, there is no need for the manual install.

Some of the things have changed since the referenced commit -- per #39, the resume action is handled automatically by rc(8) and the devd.conf contains only suspend, but all it does is to call wifibox suspend (because it is not covered by rc(8)). Nevertheless, if I understood this well, devd.conf needs to be changed to call wifibox suspend first and then go on with /etc/rc.suspend?

fearedbliss commented 1 year ago

Hey @pgj,

I installed the one that's in the normal ports tree. Are there instructions for how to easily install a port outside of the ports tree?

Re: the ordering of the notify suspend/resume, yup i took a look at the other commits. The order matters but given the changes done, my workaround no longer works. So there is a regression.

EDIT: I did see the instructions for installing via make, I use poudriere though so that's part of what I was referencing before ;).

fearedbliss commented 1 year ago

I ran a few more experiments making just the wifibox suspend go before the suspend notify. If I do this, the system is able to survive a few suspend/resumes but it is non deterministic. Usually within 2-5 resumes it will not be able to bring up the interface anymore. Same applies if I do the full stop and start of wifibox. I started looking a little bit at the latest code to see if I can narrow it down. At the moment I'm looking at the differences between the G and GN flags and also what was happening in the 0.10.0 release.

fearedbliss commented 1 year ago

@pgj Alright! After spending a few hours bisecting this, I was able to find the commit that broke it. I suspected a problem in either the wifibox-alpine or the wifibox-core port, somewhere between core 0.10.0 and 0.11.0. Testing 0.10.0 with both the latest alpine and the alpine from back then yielded a working system. I actually disabled all the rc.d and devd stuff and did the following:

1. Turn on computer
2. wifibox start && dhclient wifibox0 && ping google.com
3. wifibox stop && zzz

After resume

4. wifibox start && dhclient wifibox0 && ping google.com

If the system can ping, then we are good. If it can't ping, run wifibox console and then ifconfig inside the guest: wlan0 will be missing.

After reverting the VMM_KO commit, it worked again. I didn't even need the suspend/resume notify or any devd hooks, and it seems to have worked. I'll need to test more, but yeah.
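
The good/bad check after each resume can be written down as a small helper. This is only a sketch: the function name and the `PING_CMD` override are hypothetical, added so the logic can be tried without a wifibox setup, and the default is a single ping to the same host as in the steps above.

```shell
#!/bin/sh
# Classify the state after "wifibox start && dhclient wifibox0" following a
# resume. Returns 0 for "good" and 1 for "bad".
# PING_CMD is a hypothetical override for the connectivity probe.
check_after_resume() {
    if ${PING_CMD:-ping -c 1 google.com} >/dev/null 2>&1; then
        echo "good"
    else
        echo "bad: no connectivity, check 'wifibox console' for a missing wlan0"
        return 1
    fi
}
```

The exit status maps directly onto a good or bad verdict for the commit under test, which is how I used the manual procedure while bisecting.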

I first did a bisect of the freebsd-wifibox-port repo between:

good:
wifibox-core: 0.10.0 (35bf8d45)
wifibox-alpine: 2023-01-05 (d4afb71)

bad:
wifibox-core: 0.11.0 (d4afb71)
wifibox-alpine: 2023-01-05 (d4afb71)

I then narrowed it down to commit 86b1eaa (0.11.0), which pointed to freebsd-wifibox core image f62fe5f8c. After that, I bisected freebsd-wifibox between:

good: 5d6feecd
bad: f72fe5f8c

and found the first commit to break it at f7ac871e5e965.
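
For anyone who wants to repeat the bisect, the sequence of git commands can be sketched like this. The helper names and the `RUN` dry-run prefix are hypothetical; with the default `RUN=echo` nothing is executed, the commands are only printed. The good and bad commits are passed in as arguments, and the verdict for each step still comes from the manual suspend/resume test.

```shell
#!/bin/sh
# RUN is a hypothetical dry-run prefix: "echo" prints the git commands,
# an empty string runs them inside a clone of the repository.
RUN=${RUN-echo}

# $1: last known good commit, $2: known bad commit
bisect_begin() {
    $RUN git bisect start &&
    $RUN git bisect bad "$2" &&
    $RUN git bisect good "$1"
}

# Record the verdict for the commit git has currently checked out.
# $1: "good" or "bad"
bisect_mark() {
    case $1 in
        good|bad) $RUN git bisect "$1" ;;
        *) echo "usage: bisect_mark good|bad" >&2; return 2 ;;
    esac
}
```

After `bisect_begin <good> <bad>`, each round is: install the checked-out version, run the suspend/resume test, then call `bisect_mark good` or `bisect_mark bad`. When git reports the first bad commit, `git bisect reset` returns to the original checkout.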

fearedbliss commented 1 year ago

I was able to reproduce the missing wlan0 situation both without the devd hook and with the current devd hook. Making the service wifibox suspend occur before the ACPI suspend notify (same as my original workaround) allowed it to work again. The resume ACPI event isn't needed anymore. Stopping/starting the guest is enough, as listed with the guest suspend recovery method in the rc.d:

suspend_cmd="${command} stop guest"
resume_cmd="${command} start guest"

I updated my previous PR to include both the reversion of the VMM_KO default path and the swapping of the devd service suspend and ACPI suspend notify. Hopefully that is all for now, let me know what you think @pgj.

pgj commented 1 year ago

Just out of curiosity: Do you use sysutils/bhyve+?

pgj commented 1 year ago

The resume ACPI event is not needed because it is handled automatically by the base system. Unfortunately, the same does not happen for suspend hence the need for the extra devd.conf. See #39.

fearedbliss commented 1 year ago

There's no resume event in my commit, and I also mentioned this in my update:

"The resume ACPI event isn't needed anymore. Stopping/starting the guest is enough as listed with the guest suspend recovery method in the rc.d:"

I don't use bhyve+.

pgj commented 1 year ago

The invocation of resume is done automatically for every configured rc service on the resume ACPI event, given that they implement the resume command. That is, if you have wifibox enabled in rc.conf it will be considered by /etc/rc.resume. But that is why the resume can be configured to translate to wifibox start.

fearedbliss commented 1 year ago

Thanks for the explanation @pgj, I was actually wondering how the resume hooked back to the resume_cmd.

I'm wondering if this VMM issue is ultimately all related to the same bhyve bug we detected earlier, or if it's a new bug related to either FreeBSD's VMM module or to the rc system. Unfortunately, we have gotten 0 responses since I first opened it in August 2022.

pgj commented 1 year ago

Could you please try the following:

q-pa commented 1 year ago

Hi, I just bought an X1 Carbon Gen 7 and managed to configure wifibox. The only thing left was the suspend/resume issue. The only setting that works for me is the following one, based on the preceding comment (https://github.com/pgj/freebsd-wifibox/issues/31#issuecomment-1386305573):

(/usr/local/etc/devd/wifibox.conf)

notify 11 {
        match "system" "ACPI";
        match "subsystem" "Suspend";
        action "logger 'Stopping wifibox before suspend' && /usr/local/sbin/wifibox stop guest && /etc/rc.suspend acpi $notify";
};

notify 11 {
        match "system" "ACPI";
        match "subsystem" "Resume";
        action "/etc/rc.resume acpi $notify && logger 'Starting wifibox after resume and getting IP via DHCP' && /sbin/kldunload vmm && /sbin/kldload vmm && /usr/local/sbin/wifibox start guest && /sbin/dhclient wifibox0";
};

krolingo commented 1 year ago

Hi, I just bought an X1 Carbon Gen 7 and managed to configure wifibox. The only thing left was the suspend/resume issue. The only setting that works for me is the following one, based on the preceding comment (https://github.com//issues/31#issuecomment-1386305573):

(/usr/local/etc/devd/wifibox.conf)

notify 11 {
      match "system" "ACPI";
      match "subsystem" "Suspend";
      action "logger 'Stopping wifibox before suspend' && /usr/local/sbin/wifibox stop guest && /etc/rc.suspend acpi $notify";
};

notify 11 {
      match "system" "ACPI";
      match "subsystem" "Resume";
      action "/etc/rc.resume acpi $notify && logger 'Starting wifibox after resume and getting IP via DHCP' && /sbin/kldunload vmm && /sbin/kldload vmm && /usr/local/sbin/wifibox start guest && /sbin/dhclient wifibox0";
};

Thanks to your devd edits, my GhostBSD ThinkPad now reconnects after waking from sleep.

alex0x08 commented 1 year ago

Hi, I can confirm the same issue. I finally got it fixed by adding sleeps between the steps:

cat /opt/own/bin/pre-suspend

#!/bin/sh
wifibox stop
sleep 3
kldunload vmm
sleep 3

cat /opt/own/bin/post-resume

#!/bin/sh
kldload vmm
sleep 5
wifibox start
sleep 3
/sbin/dhclient wifibox0

These scripts are called from:

notify 11 {
        match "system"          "ACPI";
        match "subsystem"       "Suspend";
        action "logger 'Stopping wifibox before suspend' && /opt/own/bin/pre-suspend && /etc/rc.suspend acpi $notify";
};

notify 11 {
        match "system"          "ACPI";
        match "subsystem"       "Resume";
        action "/etc/rc.resume acpi $notify && logger 'Starting wifibox after resume and getting IP via DHCP' && /opt/own/bin/post-resume";
};

So now it works on my Lenovo laptop, with FreeBSD 13.2-RELEASE-p2 and wifibox:

wifibox version 0.11.0
Disk image checksum: 5f5057990c704951c6e0aec071a497a64bbf4c672783ca807059f75e8b068af1

Hope this helps.
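
The fixed `sleep` values in the scripts above are delays that happened to work on one machine. A possible variation is to poll until each step has taken effect. The sketch below is untested on real hardware; the functions `wait_for` and `post_resume` and the retry count of 10 are assumptions. It relies on `kldstat -q -m vmm` to check that the vmm module is registered and on `ifconfig wifibox0` to check that the interface exists.

```shell
#!/bin/sh
# Sketch only: retry a check until it succeeds instead of sleeping
# for a fixed number of seconds.

# wait_for TRIES CMD [ARGS...]
# Runs CMD up to TRIES times, one second apart. Returns 0 on the first
# success and 1 if every attempt failed.
wait_for() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        "$@" >/dev/null 2>&1 && return 0
        tries=$((tries - 1))
        # Only pause if another attempt is coming.
        [ "$tries" -gt 0 ] && sleep 1
    done
    return 1
}

# Hypothetical replacement for the body of post-resume.
post_resume() {
    kldload vmm
    # Wait until the vmm module is registered with the kernel.
    wait_for 10 kldstat -q -m vmm || return 1
    wifibox start
    # Wait until the wifibox0 interface exists before asking for a lease.
    wait_for 10 ifconfig wifibox0 || return 1
    /sbin/dhclient wifibox0
}
```

The script would end with a call to `post_resume`. The pre-suspend side could be handled the same way, although there the check would have to succeed once the module is gone.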

pgj commented 1 year ago

For your information, a similar issue has been reported in #59, which I have just closed. The outcome of that ticket was the addition of the SUSPEND_VMM recovery method, which implements exactly the same logic as you described in the comments above.

Please test it -- you can do that by installing the latest net/wifibox-core snapshot, version 0.12.0 from the pgj/freebsd-wifibox-port repository.

alex0x08 commented 1 year ago

Hi, I just tested the dev build as you described, and the issue looks fixed now. At least I was able to make multiple suspend/resume cycles without kernel panics. Thanks.

pgj commented 1 year ago

All right, thanks for the confirmation! That said, I am closing this ticket now. But please create another one if it still does not work for some reason.