Open porentak opened 3 years ago
Might help to post patch of the debugs or link to your patch.
You're right, it does sound like an interface bring-up order issue. From reading your comment, booting the system can cause one of two states -- omac_idx: 0x00 for band_idx: 0x1 (good), and omac_idx: 0x11 for band_idx: 0x1 (bad). When WiFi is reset later, this causes the malfunction.
Commit cd795267 suggests the mt7615 chip shares a global pool of 32 omacs between the two wireless PHYs. This commit increased the total number of VIFs but may have changed the interface bring-up order; I'm not sure precisely how.
Try building OpenWrt with cd79526 reverted -- I think it should be git format-patch ... target/linux/ramips/patches-5.4/ and then rebuild OpenWrt for your board with kmod-mt76 installed directly in the rootfs or initramfs (assuming you have U-Boot serial access).
I don't see an obvious way to git bisect one package from the main OpenWrt repo, since commits are squashed under package/kernel/mt76. I would still try git bisecting the whole OpenWrt tree. You should end up at one squash of commits from under git log -p package/kernel/mt76, at which point you might be able to quickly script a replay of those commits (revert the commit, then format-patch each listed squashed mt76 commit and drop it into patches-5.4/). Then you could bisect that.
@Hurricos I reverted commit cd79526 and it does not help. The results are the same.
To pick up all the bug fixes, I have migrated to the latest OpenWrt commit (20d847d1338f716fc9f143f633b6f79ba6017b5c), which uses mt76 driver 4a90fdf61.
As @Hurricos suggested, I'm adding my debug patch:
--- a/mt7615/mcu.c
+++ b/mt7615/mcu.c
@@ -725,6 +725,9 @@ mt7615_mcu_add_beacon_offload(struct mt7615_dev *dev,
 		info->hw_queue |= MT_TX_HW_QUEUE_EXT_PHY;
 	}
 
+	printk("req: omac_idx: 0x%02x, enable: 0x%02x, wlan_idx: 0x%02x, band_idx: 0x%02x\n\tvif->addr: %pM\n",
+	       req.omac_idx, req.enable, req.wlan_idx, req.band_idx, vif->addr);
+
 	mt7615_mac_write_txwi(dev, (__le32 *)(req.pkt), skb, wcid, NULL,
 			      0, NULL, true);
 	memcpy(req.pkt + MT_TXD_SIZE, skb->data, skb->len);
I found an even easier procedure to reproduce this issue. At boot time, both APs are enabled. There are two scenarios, depending on which interface is enabled first.
Scenario 1, phy0 comes up first:
hostapd: Configuration file: /var/run/hostapd-phy0.conf (phy wlan0) --> new PHY
[ 60.095868] req: omac_idx: 0x00, enable: 0x01, wlan_idx: 0x00, band_idx: 0x00
[ 60.095868] vif->addr: 00:11:22:01:05:bc
hostapd: Configuration file: /var/run/hostapd-phy1.conf (phy wlan1) --> new PHY
[ 62.146363] req: omac_idx: 0x11, enable: 0x01, wlan_idx: 0x00, band_idx: 0x01
[ 62.146363] vif->addr: 00:11:22:01:05:bd
Execute: wifi down
At this point, if I disable radio1 in uci (option disabled '1') and restart wifi (wifi), I get:
req: omac_idx: 0x00, enable: 0x01, wlan_idx: 0x00, band_idx: 0x00
vif->addr: 00:11:22:01:05:bc
wifi on radio0 is working.
If instead of disabling radio1, I disable radio0 and restart wifi I get:
req: omac_idx: 0x00, enable: 0x01, wlan_idx: 0x00, band_idx: 0x01
vif->addr: 00:11:22:01:05:bd
and wifi on radio1 is not working.
Scenario 2, phy1 comes up first:
Configuration file: /var/run/hostapd-phy1.conf (phy wlan1) --> new PHY
[ 23.500227] req: omac_idx: 0x00, enable: 0x01, wlan_idx: 0x00, band_idx: 0x01
[ 23.500227] vif->addr: 00:11:22:01:05:bd
Configuration file: /var/run/hostapd-phy0.conf (phy wlan0) --> new PHY
[ 24.769704] req: omac_idx: 0x11, enable: 0x01, wlan_idx: 0x00, band_idx: 0x00
[ 24.769704] vif->addr: 00:11:22:01:05:bc
Disabling radio1
req: omac_idx: 0x00, enable: 0x01, wlan_idx: 0x00, band_idx: 0x00
vif->addr: 00:11:22:01:05:bc
wifi on radio0 is not working.
Disabling radio0
req: omac_idx: 0x00, enable: 0x01, wlan_idx: 0x00, band_idx: 0x01
vif->addr: 00:11:22:01:05:bd
wifi on radio1 is working.
From these debugs it looks like the issue occurs if omac_idx for a band has changed since startup/first init. Whether this is the root cause or just a consequence, I don't know.
Thanks for any additional tips/directions.
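The hypothesis above (the problem shows up when omac_idx for a band differs from the value it had at first init) can be checked mechanically against captured kernel logs. The following is a minimal Python sketch, not part of mt76 or OpenWrt. The function name omac_changes and the sample log are illustrative, and it assumes the "req:" line format printed by the debug patch in this thread.

```python
import re

# Matches the "req: omac_idx: ..., band_idx: ..." lines printed by the debug patch.
# Assumption: both values are printed as hex with a 0x prefix, as in the logs above.
REQ_RE = re.compile(r"req: omac_idx: (0x[0-9a-fA-F]+),.*band_idx: (0x[0-9a-fA-F]+)")


def omac_changes(log_lines):
    """Return (band_idx, first_omac_idx, new_omac_idx) for every line whose
    omac_idx differs from the first value seen for the same band_idx."""
    first_seen = {}
    changes = []
    for line in log_lines:
        match = REQ_RE.search(line)
        if not match:
            continue  # not a debug line from the patch
        omac = int(match.group(1), 16)
        band = int(match.group(2), 16)
        if band not in first_seen:
            first_seen[band] = omac
        elif first_seen[band] != omac:
            changes.append((band, first_seen[band], omac))
    return changes


if __name__ == "__main__":
    # Boot with phy0 first, then "wifi down", disable radio0, "wifi".
    sample = [
        "[ 60.095868] req: omac_idx: 0x00, enable: 0x01, wlan_idx: 0x00, band_idx: 0x00",
        "[ 62.146363] req: omac_idx: 0x11, enable: 0x01, wlan_idx: 0x00, band_idx: 0x01",
        "req: omac_idx: 0x00, enable: 0x01, wlan_idx: 0x00, band_idx: 0x01",
    ]
    for band, old, new in omac_changes(sample):
        print(f"band_idx 0x{band:02x}: omac_idx 0x{old:02x} at first init, 0x{new:02x} now")
```

Fed the boot log followed by the log after a restart, it reports each band whose omac_idx moved. In the sample that is band_idx 0x01 going from 0x11 to 0x00, which is the non-working case described above.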
My first guess is we have to keep a main radio alive (the first wlan in your case). I can't remember the design details. (so not sure yet)
@ryderlee1110 If it helps...
I updated to the latest mt76 driver (abdd471e9f2d5c2287c095df58f32432dc0ceb00, Jan 5, 2021).
I booted the router with radio0 and one wifi-iface on it enabled, while radio1 and its wifi-iface were disabled. radio0 and the AP on it work fine.
wifi down
Update the configuration to disable radio0 and enable radio1.
wifi
The AP on radio1 is not working.
Trying to be helpful, though probably going in the wrong direction, I tried to fix omac_idx per band:
I meant that the firmware seems to have a strict order. Or you can try the in-house driver (if you can) to double-check it.
I don't think this is true.
If I enable only radio1 and reboot the router, it works just fine.
What I'm saying is that it is the first interface you enable, regardless of radio, so radio1 should be the first interface after reboot, right? I just suspect the firmware uses the first wlan the driver sets up as the main radio.
I tried the in-house driver, and it works just fine. I tried multiple orders of enabling/disabling interfaces/radios.
While doing that, I think I found the difference between the two drivers. In the in-house driver, after both interfaces are disabled and a new interface is enabled, the MT7615 chip is reinitialized (firmware reload, ...).
@ryderlee1110 did you manage to find some time to dig into this issue?
There's a strict interface order for mt7615. I don't think this is an issue.
Hi, I recently tested dbdc on mt7915d. Can you check whether mt7615d works with "wifi reload", and whether ieee80211_start_ap() -> mt7915_bss_info_changed() are called?
@ryderlee1110 If this is a general question about mt7615d, then yes, it works. mt7615_bss_info_changed is called after "wifi reload" if any parameter in /etc/config/wireless has changed.
If the question targets this issue, then no, the result is the same as with the commands "wifi down; wifi".
Tested based on commits: mt76: 8696919d9aae9b673f916bca41c5e1671eec5b0e (2021-01-27) openwrt: 740af59b9c7ee879b6936dd03bf37d37a54dda47 (2021-02-02)
Steps I used to test "wifi reload" while reproducing this issue:
config wifi-iface 'ap_radio1' ... option disabled '1'
I found one way to overcome this limitation. In the wireless configuration, add this under both wifi-device sections: option serialize '1'
This instructs netifd to configure the wireless device interfaces one by one, so the interfaces are always set up in the same order.
With this trick, the wireless configuration (e.g. SSID, keys, ...) can be changed reliably at runtime.
Great. Do you think we can close this ticket?
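To confirm that the workaround is applied to every radio, the config can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the helper name devices_missing_serialize is hypothetical, and the parsing is a simplification that splits lines on whitespace rather than implementing the full UCI syntax.

```python
def devices_missing_serialize(config_text):
    """Return the names of wifi-device sections without option serialize '1'.

    Simplified parsing (an assumption, not a full UCI parser): every line is
    split on whitespace and surrounding quotes are stripped from values.
    """
    missing = []
    current = None          # name of the wifi-device section being read
    has_serialize = False

    for raw in config_text.splitlines():
        parts = raw.split()
        if not parts:
            continue
        if parts[0] == "config":
            # A new section starts: close the previous wifi-device section.
            if current is not None and not has_serialize:
                missing.append(current)
            if len(parts) >= 2 and parts[1] == "wifi-device":
                current = parts[2].strip("'\"") if len(parts) >= 3 else "(unnamed)"
                has_serialize = False
            else:
                current = None
        elif (current is not None and len(parts) >= 3
              and parts[0] == "option" and parts[1] == "serialize"):
            has_serialize = parts[2].strip("'\"") == "1"

    # Close the last section in the file.
    if current is not None and not has_serialize:
        missing.append(current)
    return missing


if __name__ == "__main__":
    # On an OpenWrt system the wireless config lives in /etc/config/wireless.
    with open("/etc/config/wireless") as fh:
        print(devices_missing_serialize(fh.read()))
```

An empty list means every wifi-device section carries the option.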
Wireless scan is broken. The router only finds 5G signals in the 2.4G scan interface, and there are no results in the 5G scan interface. Did you run into this problem?
I faced the same problem.
After some debugging, I found that the reason is that iwinfo returns phy1 for radio0, but does not find radio1 at all.
Having studied the iwconfig code a little, I realized that if you specify phy instead of path, it should work.
For example:
config wifi-device 'radio0'
	option type 'mac80211'
	option phy 'phy0'
	option htmode 'HT40'
	option serialize '1'
	option country 'US'
	option cell_density '0'
	option hwmode '11g'
	option channel '1'

config wifi-device 'radio1'
	option type 'mac80211'
	option phy 'phy1'
	option serialize '1'
	option country 'US'
	option cell_density '0'
	option hwmode '11a'
	option htmode 'VHT80'
	option channel '36'
	option txpower '20'
Scanning and editing all settings in LuCI works fine.
// I'm using an OpenWrt snapshot from master
Here to confirm that @Azq2's solution works for me (DIR-853-A3/MT7615DN with DBDC). Having the 3 options (phy, serialize and the country code) fixes the issue of freezing after reboot with both cards enabled: https://github.com/openwrt/mt76/issues/448
https://github.com/MeIsReallyBa/openwrt/commit/a0ffe441f69b056ecf304d7b3b9c0b5311c03ba1
I edited mac80211.sh and it also seems to work correctly.
EDIT: OK never mind it's working for me now
Hello,
I'm investigating an issue where clients cannot connect to the APs after the wifi interfaces are restarted.
CPU board: UniElec 7621-06
WiFi: BPI-7615 (running in DBDC mode)
OpenWRT: cfbda6627956af0cab380d03fd9275574e67921e (1.12.2020)
MT76 driver: 066cc441eb8fcec7a3aeb6a320f5f9e6c21790f1 (21.11.2020)
The configuration is very simple: 1 AP on each radio, unique BSSID, unique SSID, no encryption, fixed channels (I can attach the configuration if it helps).
After reboot both APs (2.4GHz and 5GHz) work just fine: beacons are visible in the air (using an external sniffer), clients can connect to both APs, ....
Now the fun begins. After running the script /sbin/wifi, strange things start to happen.
In some rare cases (1 of 5) everything works just fine. But in the other cases WiFi clients cannot connect to any AP. In the logs I can see this:
Wed Dec 2 13:07:08 2020 daemon.info hostapd: wlan1: STA 60:8b:0e:08:a9:99 IEEE 802.11: authenticated
Wed Dec 2 13:07:08 2020 daemon.info hostapd: wlan1: STA 60:8b:0e:08:a9:99 IEEE 802.11: associated (aid 1)
Wed Dec 2 13:07:08 2020 daemon.notice hostapd: wlan1: AP-STA-CONNECTED 60:8b:0e:08:a9:99
Wed Dec 2 13:07:08 2020 daemon.info hostapd: wlan1: STA 60:8b:0e:08:a9:99 RADIUS: starting accounting session 7BB47469AFF75D02
Wed Dec 2 13:07:10 2020 daemon.notice hostapd: wlan1: AP-STA-DISCONNECTED 60:8b:0e:08:a9:99
Wed Dec 2 13:07:10 2020 daemon.info hostapd: wlan1: STA 60:8b:0e:08:a9:99 IEEE 802.11: disassociated
Wed Dec 2 13:07:11 2020 daemon.info hostapd: wlan1: STA 60:8b:0e:08:a9:99 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Enabling hostapd debug logs (-d):
Wed Dec 2 13:14:39 2020 daemon.debug hostapd: wlan1: Event RX_MGMT (18) received
Wed Dec 2 13:14:39 2020 daemon.debug hostapd: mgmt::disassoc
Wed Dec 2 13:14:39 2020 daemon.debug hostapd: disassocation: STA=60:8b:0e:08:a9:99 reason_code=8
Wed Dec 2 13:14:39 2020 daemon.notice hostapd: wlan1: AP-STA-DISCONNECTED 60:8b:0e:08:a9:99
It is the client that sends the disconnect frame to the AP. But why?
Investigating further, it turned out that when the issue arises there are no beacon frames in the air from my APs (expected at a 100ms interval). Both APs do still respond to Probe Request frames.
As I understand it, the MT7615 sends out beacon frames without CPU intervention: the CPU has to prepare the beacon frame content and hand it to the MCU for periodic transmission. If this is correct, my additional debugs could help to solve this. I have added debugs to the mt7615_mcu_add_beacon_offload function. After reboot I see these values (every 6 seconds):
req: omac_idx: 0x00, enable: 0x1, wlan_idx: 0x0, band_idx: 0x1
req: omac_idx: 0x11, enable: 0x1, wlan_idx: 0x0, band_idx: 0x0
If the system is working fine, I get the same output. If not, I get this:
req: omac_idx: 0x00, enable: 0x1, wlan_idx: 0x0, band_idx: 0x0
req: omac_idx: 0x11, enable: 0x1, wlan_idx: 0x0, band_idx: 0x1
Can someone help me understand what happens in the non-working case? Is there an interface bring-up order issue?
Other things I have tried already:
I'm not even sure it is an mt76 issue.
Any suggestions, ideas, directions, or debugging ideas are welcome.