Closed sk-spr closed 1 week ago
The difference between the wifibox-alpine
and wifibox-alpine-mediatek
guests is not that big: the latter image is slimmer because it contains the firmware files for the Mediatek cards only. That is basically a flavor of the former.
There must be some problem with starting up the mt7921e
driver because the wlan0
device could not be created on the guest. A reason for the guest being unresponsive when attached by wifibox console
might be that it freezes on boot. However, according to the logs, the guest can somehow boot up. Another possible reason might be that the nmdm
driver does not properly and the communication itself becomes stuck.
It might be curious to try the following. In one of the terminals, start the wifibox console
command and leave it running. Then, in a different terminal, use the following command to restart the guest:
# wifibox restart guest
You should see the boot messages of Linux kernel on the terminal where the console is running.
Also, cannot you see any odd message in the logs of the host?
The difference between the
wifibox-alpine
andwifibox-alpine-mediatek
guests is not that big: the latter image is slimmer because it contains the firmware files for the Mediatek cards only. That is basically a flavor of the former.
Thank you, this was not entirely clear to me but now I know :)
It might be curious to try the following. In one of the terminals, start the
wifibox console
command and leave it running. Then, in a different terminal, use the following command to restart the guest:# wifibox restart guest
You should see the boot messages of Linux kernel on the terminal where the console is running.
Indeed I can, and it even lets me log in apparently. Relevant information from the boot messages might include the following:
* Could not find a wireless interface
* /var/run/wpa_supplicant: creating directory
* Starting WPA Supplicant ... [ ok ]
* Starting networking ... * lo ... [ ok ]
* eth0 ... [ ok ]
* wlan0 ...ip: SIOCGIFFLAGS: No such device
dhcpcd-10.0.6 starting
DUID 00:01:00:01:2d:d2:8e:06:00:a0:98:8a:05:71
wlan0: interface not found
dhcpcd exited
Running lspci -k
on the guest appears to show that the NIC is passed through and picked up by mt7921e:
wifibox:~# lspci -k
00:1f.0 Class 0601: 8086:7000
00:04.2 Class 0100: 1af4:1009 virtio-pci
00:04.0 Class 0100: 1af4:1001 virtio-pci
00:00.0 Class 0600: 1275:1275
00:04.3 Class 0100: 1af4:1009 virtio-pci
00:06.0 Class 0280: 14c3:7961 mt7921e
00:04.1 Class 0100: 1af4:1009 virtio-pci
00:05.0 Class 0200: 8086:100f e1000
however, the interfacewlan0
is not up as already indicated by the boot log, confirmed by ifconfig:
wifibox:~# ifconfig wlan0
wlan0 Link encap:Ethernet HWaddr 90:E8:68:43:F1:69
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
Also, cannot you see any odd message in the logs of the host? Yes, this
dmesg
line seems odd, since the output frompciconf -lv
indicates the card is at pci0:2:0:0, though I might be off here:pci2: <network> at device 0.0 (no driver attached)
This mirrors another odd warning when restarting wifibox:
Restarting wifibox....WARNING: PPT device pci2:0:0 could not be destroyed. .....OK
Here again, shouldn't the device be pci0:2:0:0? The manpages indicate that my passthru notation (see above) is correct for this device...
Thanks for the inputs. If you can see the card in the lspci -k
output with the driver attached on the guest, it means that you did everything right on your side.
The no driver attached
message in the host's dmesg
is completely normal. However, the could not be destroyed
message in the Wifibox log is a bit worrisome. It is worth to know that the relationship of bhyve
with Mediatek cards can be flaky, see https://github.com/pgj/freebsd-wifibox/issues/73 for the details.
I am a bit confused about the ifconfig
output on the guest. So you say that there is the wlan0
device but it is not in the "up" state? You can try to initiate the reconfiguration of the network interface on wifibox console
with the following command:
wifibox# service networking restart
It would be interesting to see what wpa_supplicant
actually complains about. You can get more logs from it by stopping the related the service on the guest:
wifibox# service wpa_supplicant stop
and then start WPA Supplicant in the foreground (to watch the logs):
wifibox# wpa_supplicant -iwlan0 -dd -c/etc/wpa_supplicant/wpa_supplicant.conf
This might also be caused by some regression in the Linux kernel. If you switch to the net/wifibox-alpine
port, you can optionally select Linux 6.8 (as "edge") in the options for building the guest image and test with that if the driver (and the related networking stack) of that version works better for you.
Besides all these, you can experiment with pre-configuring the ppt
(PCI pass-through) driver for the device and boot the host that way. This requires the loader.conf
to have the following line:
pptdevs="2/0/0"
I have some interesting news based on your comment: When manually restarting networking inside the guest (after getting into the console by means of wifibox console
and wifibox restart guest
) using service networking restart
after restarting wpa_supplicant (whose logs show nothing abnormal), wlan0 no longer times out waiting for a carrier (see below)!
Originally, this was the (relevant section of the) log from service networking restart
:
* Starting networking ...
* lo ...
[ ok ]
* eth0 ...
[ ok ]
* wlan0 ...
dhcpcd-10.0.6 starting
wlan0: waiting for carrier
timed out
dhcpcd exited
ifup: failed to change interface wlan0 state to 'up'
however, after running service wpa_supplicant restart
, and then service networking restart
:
wifibox:~# service wpa_supplicant restart
* Starting WPA Supplicant ...
[ ok ]
wifibox:~# service networking restart
* Starting networking ...
* lo ...
[ ok ]
* eth0 ...
[ ok ]
* wlan0 ...
dhcpcd-10.0.6 starting
DUID 00:01:00:01:2d:d3:a4:e4:90:e8:68:43:f1:69
wlan0: connected to Access Point: TP-Link_22F4
wlan0: IAID 68:43:f1:69
wlan0: adding address fe80::92e8:68ff:fe43:f169
wlan0: rebinding lease of 10.0.0.16
wlan0: soliciting an IPv6 router
wlan0: Router Advertisement from fe80::1
wlan0: requesting DHCPv6 information
wlan0: REPLY6 received from fe80::1
wlan0: refresh in 86400 seconds
wlan0: probing address 10.0.0.16/24
wlan0: leased 10.0.0.16 for 86400 seconds
wlan0: adding route to 10.0.0.0/24
wlan0: adding default route via 10.0.0.138
A quick check on the host confirms that the network is fully online:
freebsdbook% ping google.com
PING google.com (142.250.180.206): 56 data bytes
64 bytes from 142.250.180.206: icmp_seq=0 ttl=114 time=43.840 ms
64 bytes from 142.250.180.206: icmp_seq=1 ttl=114 time=54.271 ms
^C
--- google.com ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 43.840/49.055/54.271/5.215 ms
freebsdbox% ifconfig
[...]
wifibox0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
options=0
ether 58:9c:fc:10:eb:66
inet 10.0.0.2 netmask 0xffffff00 broadcast 10.0.0.255
id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
member: tap0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
ifmaxaddr 0 port 3 priority 128 path cost 2000000
groups: bridge
nd6 options=9<PERFORMNUD,IFDISABLED>
tap0: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
options=80000<LINKSTATE>
ether 58:9c:fc:10:13:38
groups: tap
media: Ethernet 1000baseT <full-duplex>
status: active
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
Opened by PID 4127
Very interesting! I would be curious if this behaviour affects other configurations as well; if it does, perhaps a change in the guest's service init order might work, though I'd have to try this (and am not very familiar with openrc configuration, unfortunately...)
There is a slight chance the issue shows up for others as well. However, I am not sure if the problem is with the init order of the guest. The wpa_supplicant
service declares that it has to be started before networking
(i.e. provider of net
).
What I can currently imagine is that during the boot process, WPA Supplicant fails to find the wlan0
device, it cannot start up, and therefore the networking
service fails to work as well. But if wpa_supplicant
is restarted at some later point, everything starts to work. Maybe the mt7921e
driver takes a long time to initialize? What are the timestamps and messages for that in the dmesg
output? Could you please add the output of the following command:
wifibox# dmesg | fgrep mt7921e
By the way, I noticed in the logs that your wireless network uses the 10.0.0.0/24
, which is the same as what Wifibox uses by default to communicate with the host. This all appears to be working, but maybe it is worth to tell Wifibox to use a different subnet (e.g. 10.0.1.0/24
to avoid any potential problems caused by this overlap.
Greetings, I can verify the same behavior here with the same Mediatek card, wifibox 0.13.0.
Enabling the console and launching manually (with some delay, I guess) wpa_supplicant, then dhcpcd (interestingly, neither udhcpc nor udhcpc6 work, though BusyBox dhcpcd seems to invoke udhcpc), works around the behavior. Maybe a delayed retry (as in https://github.com/pgj/freebsd-wifibox-port/commit/954eff34efe6dd970470612602369dbdedbec096) would address the problem?
Alternatively, is there a mechanism through which one can specify a set of commands to be launched on the guest start (and possibly even on suspend/resume)? Since the guest image is read-only, I can’t modify the guest’s rc.conf or .bashrc or anything. The two ways I could see this implemented would be either as a squashfs overlay (think Puppy Linux or any LiveUSB with additional persistent storage), or more simply a script passed through a bhyve arg or variable (like with some Docker images), much like the passthru option setting. But I may be reinventing the wheel here...
Hi @vvillenave, thanks for the confirmation. The delayed retry (of WPA Supplicant) is already part of the development version of the port (as you pointed out above) so you or @sk-spr can try that.
There is no explicit support for specifying commands on start/stop/suspend/resume for the guest. First of all, the guest does not deal with suspend/resume, that is always translated to start/stop (host suspend: guest stop or no-op, host resume: guest start or restart).
Second, even though the guest image is read-only, you can modify how it is constructed in the net/wifibox-alpine
port. For example, the pre-build
target in the port's Makefile
can be used to add extra files for the guest's root file system (which is sourced from _GUESTDIR
during the build process). But, since the whole construction is based on the use of Alpine Linux packages for modularity, it is often better to create your own package with the planned modifications and add the package to the list of packages to use (_PACKAGES
variable). Even further, the boot/shutdown process could be influenced by creating OpenRC scripts and hooking them up (e.g. _BOOT_SERVICES
variable). An example of that is the Wifibox-custom uds_passthru
package.
But in cases of problems like this one, it is better to find a general solution. That is, have it reported here and create a solution together that works for everybody. For example, the delayed retry I implemented for WPA Supplicant was designed in a way to possibly cover other scenarios, and with some luck, it can even provide a solution for this issue as well.
@sk-spr @vvillenave please try the latest development version of Wifibox, mostly notably the 20240821
(or later) version of the net/wifibox-alpine
port from here:
https://github.com/pgj/freebsd-wifibox-port
I am inclined to assume that the exponential back-off for WPA supplicant (https://github.com/pgj/freebsd-wifibox-alpine/pull/25) solves the issue though. Please check it and let me know in the next two weeks. Then I am going to close this ticket as resolved unless I hear back otherwise.
@sk-spr @vvillenave please try the latest development version of Wifibox, mostly notably the
20240821
(or later) version of thenet/wifibox-alpine
port from here:https://github.com/pgj/freebsd-wifibox-port
I am inclined to assume that the exponential back-off for WPA supplicant (https://github.com/pgj/freebsd-wifibox-alpine/pull/25) solves the issue though. Please check it and let me know in the next two weeks. Then I am going to close this ticket as resolved unless I hear back otherwise.
I will try and find the install, though I suspect it is on an external SSD that I may have wiped. I will let you know if it works on the case I still have the install around.
@pgj After reinstalling FreeBSD because I had indeed wiped the drive with the previous install, and setting up wifibox again, it appears to be fully working (!)
@vvillenave if it's working for you as well, I'd say this issue is ready to be closed 🤩
All right folks, based on the replies I have got so far, I assume that this issue has been fixed. Remember though that you can always come back and create a new ticket if the problem still persists in some form.
@pgj I can confirm that the updated Alpine image works out-of-the-box. Thanks very much!
Description
A clear and concise description of what the problem is. What is expected from the application?
wifibox console
produces no output other than the usualthere is no login prompt. In addition, dmesg shows repeated exits of wpa_supplicant, this may or may not be related.
Host operating system
Output of
uname -a
.FreeBSD freebsdbook 14.0-RELEASE FreeBSD 14.0-RELEASE #0 releng/14.0-n265380-f9716eee8ab4: Fri Nov 10 05:57:23 UTC 2023 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
Wireless NIC
Relevant output of
pciconf -lv
.Wifibox version
Output of
wifibox version
. Note that only the latest version is supported.Disk image type and version
The kind of VM image in use, e.g. Wifibox/Alpine, and its version. wifibox-alpine-20240506
Changes to the default configuration files
Include relevant sections from all the configuration files that contain local changes and may help to reproduce the problem.
wpa_supplicant.conf:
Logs
Please copy and paste any relevant log output from the following sources: