kaloz / mwlwifi

mac80211 driver for the Marvell 88W8864 802.11ac chip

Wireless clients stop receiving DHCP responses after some time #243

Status: Closed (Alexander-r closed this issue 6 years ago)

Alexander-r commented 6 years ago

Starting from commit af210f0 (https://github.com/kaloz/mwlwifi/commit/af210f0ef79d5c416dd455747a16d76bec5912ba) I have had problems with broadcast traffic. After some time, a newly connecting wireless client no longer receives an IP address. In the router logs I see the DHCPDISCOVER request and the attempt to send a DHCPOFFER reply, but the client does not seem to receive it and just sends more requests. Clients that are already connected continue to work fine, as do wired clients.

There are no errors in the system or kernel log that might help identify why this starts happening. The problem occurs on both the 2.4GHz and 5GHz networks and starts at random intervals after a reboot: the router may work fine for a day, or it may stop sending broadcast traffic after an hour of uptime. I can reproduce it on two Android devices, a notebook with an Intel Wireless 7265, and another router acting as a client, based on Atheros QCA9880. So it’s not just a single device that is affected.

I’ve tried the latest version of the driver after that, but it has the same problem. I even tried compiling the latest LEDE snapshot with this driver, but that did not help. The strangest thing is that when I flashed a clean LEDE 17.01.4 and tried it without the updated driver, I also had this problem. I also tried using odhcpd instead of dnsmasq. I used the same configuration in all the tests, and I had not changed any settings before noticing the problem.

My configuration files:

wireless

config wifi-device 'radio0'
    option type 'mac80211'
    option channel '36'
    option hwmode '11a'
    option path 'soc/soc:pcie-controller/pci0000:00/0000:00:01.0/0000:01:00.0'
    option htmode 'VHT80'
    option txpower '23'
    option country 'US'

config wifi-iface 'default_radio0'
    option device 'radio0'
    option network 'lan'
    option mode 'ap'
    option encryption 'psk2'
    option key '1234567890'
    option ssid 'WiFi_5'

config wifi-device 'radio1'
    option type 'mac80211'
    option channel '11'
    option hwmode '11g'
    option path 'soc/soc:pcie-controller/pci0000:00/0000:00:02.0/0000:02:00.0'
    option htmode 'HT40'
    option txpower '30'
    option country 'US'

config wifi-iface 'default_radio1'
    option device 'radio1'
    option network 'lan'
    option mode 'ap'
    option encryption 'psk2'
    option key '1234567890'
    option ssid 'WiFi_2'

dhcp

config dnsmasq
    option domainneeded '1'
    option boguspriv '1'
    option filterwin2k '0'
    option localise_queries '1'
    option rebind_protection '1'
    option rebind_localhost '1'
    option local '/lan/'
    option domain 'lan'
    option expandhosts '1'
    option nonegcache '0'
    option authoritative '1'
    option readethers '1'
    option leasefile '/tmp/dhcp.leases'
    option resolvfile '/tmp/resolv.conf.auto'
    option localservice '1'

config dhcp 'lan'
    option interface 'lan'
    option start '100'
    option limit '150'
    option leasetime '12h'
    option dhcpv6 'server'
    option ra 'server'
    option ra_management '1'

config dhcp 'wan'
    option interface 'wan'
    option ignore '1'

config odhcpd 'odhcpd'
    option maindhcp '0'
    option leasefile '/tmp/hosts/odhcpd'
    option leasetrigger '/usr/sbin/odhcpd-update'

network

config interface 'loopback'
    option ifname 'lo'
    option proto 'static'
    option ipaddr '127.0.0.1'
    option netmask '255.0.0.0'

config globals 'globals'
    option ula_prefix 'fd9c:6fdc:99cb::/48'

config interface 'lan'
    option type 'bridge'
    option ifname 'eth0'
    option proto 'static'
    option netmask '255.255.255.0'
    option ip6assign '60'
    option ipaddr '192.168.0.1'

config interface 'wan'
    option ifname 'eth1'
    option proto 'dhcp'

config interface 'wan6'
    option ifname 'eth1'
    option proto 'dhcpv6'

config switch
    option name 'switch0'
    option reset '1'
    option enable_vlan '1'

config switch_vlan
    option device 'switch0'
    option vlan '1'
    option ports '0 1 2 3 5'

config switch_vlan
    option device 'switch0'
    option vlan '2'
    option ports '4 6'

ratsputin commented 6 years ago

This is very similar to the issue I'm seeing as well. I mentioned this a few days ago and generally don't see it until 3-4 days after reboot.

Your point about broadcasts is a good one. In my case, I'm using a DHCP server on my LAN rather than the built-in DHCP server in LEDE and see the same behavior. Resetting the wifi network, or doing a Scan seems to fix this for 24-48 hours.

I'm currently testing my phone with a static address to see if it stays "up" when other devices that depend on DHCP disconnect. In my case, my 2.4GHz network seems to have more issues than the 5GHz network.

ad019 commented 6 years ago

This could also be the reason that Bonjour stops working on a Mac network. I have had this problem after every 10-12 hours of uptime. If you reset the radios, it works again for a few hours.

yuhhaurlin commented 6 years ago

"The strangest thing is that when I flashed clean lede 17.01.4 and tried it without updated driver I also had this problem" => What does this mean?

yuhhaurlin commented 6 years ago

By the way, when the problem happens, can you try the following:

  1. Assign a static IP to a client and check whether the client can ping the router.
  2. If ping works, use arping on the router to ping the client.
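The router-side part of these checks could be scripted roughly as follows. This is a minimal sketch, assuming BusyBox-style `ping` and `arping` are installed on the router; the helper name `check_client`, the example address, and the bridge name `br-lan` are assumptions for illustration, not something taken from this thread. Step 1 (ping from the client to the router) still has to be run on the client itself.

```shell
#!/bin/sh
# Hypothetical helper, run on the router: probe a wireless client that has
# been given a static IP.
# Usage: check_client <client-ip> [bridge-interface]
check_client() {
    ip="$1"
    iface="${2:-br-lan}"   # assumption: the LAN bridge is named br-lan

    # Unicast ICMP echo from the router to the client
    if ping -c 3 -W 2 "$ip" >/dev/null 2>&1; then
        echo "ping $ip: ok"
    else
        echo "ping $ip: FAILED"
    fi

    # ARP requests from the router to the client on the given interface
    if arping -c 3 -I "$iface" "$ip" >/dev/null 2>&1; then
        echo "arping $ip via $iface: ok"
    else
        echo "arping $ip via $iface: FAILED"
    fi
}
```

For example, `check_client 192.168.0.50` probes a client at a placeholder address inside the 192.168.0.0/24 LAN from the configuration above. Because ARP requests are normally sent as broadcast frames, a working ping combined with a failing arping would point at the broadcast path.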

Alexander-r commented 6 years ago

Before I updated the driver, the router worked fine on stock LEDE 17.01.4 for about a week with the exact same configuration. Now I can reproduce the problem both with the updated driver and with the one included in 17.01.4 by default. So older versions might be affected too.

Will try setting static IP and check ping next time this happens.

yuhhaurlin commented 6 years ago

What is the version for "old versions"?

Alexander-r commented 6 years ago

What is the version for "old versions"?

36bc32767ed89e07c5c83036861d2fa4eb1f8629

yuhhaurlin commented 6 years ago

This version is pretty old, and no one has reported this kind of issue with it. In any case, please try what I suggested when the problem happens.

ratsputin commented 6 years ago

Okay, I was just able to reproduce this on 2.4GHz where I'm seeing this pretty consistently.

I previously configured and saved a 2.4GHz connection on my phone with a static IP address, successfully connected and stored the settings.

I have a Honeywell wifi thermostat about 15' from my router. I have hostapd in debug mode and I started seeing it streaming connection requests/disconnect from its MAC address. I checked the thermostat and it was in a "registering" state (attempting to connect to the wifi).

To see if the problem was the one we're discussing, I just attempted to switch from 5GHz on my phone to 2.4GHz and I received an "Incorrect Password" error.

Watching the hostapd debug messages, I did see my phone disconnect from wlan0; however, I did not see a connection attempt on wlan1.

Switching back to 5GHz (also static IP) initially said "no internet connection" (which is what it would normally say before I gave it a static IP), but then connected successfully.

I then performed a scan function from LuCI on radio 1, saw the wifi thermostat reconnect and checked it to verify it had connected successfully. It had.

I am still unable to join the 2.4GHz network with my phone, even with the correct password. It does, however, join the 5GHz network without issue. Interestingly, I never even see an authentication attempt come across the logs when I attempt to connect on the 2.4GHz network.

ratsputin commented 6 years ago

Apologies:

root@orion:/usr/var/log# cat /sys/kernel/debug/ieee80211/phy1/mwlwifi/info

driver name: mwlwifi
chip type: 88W8964
hw version: 7
driver version: 10.3.4.0-20171129
firmware version: 0x09030201
power table loaded from dts: no
firmware region code: 0x0
mac address: 60:38:e0:xx:xx:xx
2g: enable
5g: disable
antenna: 4 4
irq number: 46
ap macid support: 0000ffff
sta macid support: 00010000
macid used: 00000001
radio: enable
iobase0: e1300000
iobase1: e1580000
tx limit: 1024
rx limit: 16384

root@orion:/usr/var/log#

ratsputin commented 6 years ago

A follow-up. I just attempted to connect to the 2.4GHz network on my phone again and was successful. iPhone 8.

AjkayAlan commented 6 years ago

"The strangest thing is that when I flashed clean lede 17.01.4 and tried it without updated driver I also had this problem" => What does this mean?

People who had a newer version of stock firmware (which had newer 88W8964 firmware) ran into issues where they would be connected to the router but have no internet connection. Even when they rolled back to the stock firmware from May, this issue persisted.

There is a thread on the Linksys forums about this at http://community.linksys.com/t5/Wireless-Routers/WRT32x-and-WRT3200ACM-WiFi-Issues/td-p/1246764.

I don't know that this DHCP issue is related to the one from the forums, but it seems possible that reverting to a prior version may persist issues that were introduced from newer versions based on what the community is seeing.

yuhhaurlin commented 6 years ago

Please focus on mwlwifi here. If you encounter a problem, please create an issue to track it and specify the following information:

  1. Device.
  2. Version of mwlwifi.
  3. Your setup (With what kind of clients).
  4. Steps to reproduce the problem.
  5. Useful log messages.

MrReSc commented 6 years ago

I don't know if my problem has something to do with this. I have two LibreELEC media players with static IPs. The players can stream content from my NAS, but when I try to access the service web list, connect with the remote app (Kore), or ping the IP, it doesn't work.

root@LEDE:~# cat /sys/kernel/debug/ieee80211/phy1/mwlwifi/info

driver name: mwlwifi
chip type: 88W8964
hw version: 7
driver version: 10.3.4.0-20171129
firmware version: 0x09030201
power table loaded from dts: no
firmware region code: 0x0
mac address: 60:38:e0:bd:8d:c9
2g: enable
5g: disable
antenna: 4 4
irq number: 106
ap macid support: 0000ffff
sta macid support: 00010000
macid used: 00000001
radio: enable
iobase0: e1200000
iobase1: e1480000
tx limit: 1024
rx limit: 16384

yuhhaurlin commented 6 years ago

If you can ping from client to router and arping from router to client, I think wifi is all right.

MrReSc commented 6 years ago

I was only trying to ping from client to client, and that didn't work. After a reboot it was OK again. I'll test this as soon as the problem recurs.

yuhhaurlin commented 6 years ago

If anyone wants to test a different firmware version for the 88W8964, you can replace the file 88W8964.bin in the directory /lib/firmware/mwlwifi. Only versions after 9.3.2.1 support WDS client, and only versions after 9.3.2.2 support WDS AP.
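The `firmware version` field in the debugfs `info` output quoted in this thread is a hex word (for example `0x09030201`), while the versions here are written in dotted form (9.3.2.1). Assuming each byte of the hex word is one component of the dotted version, which is an inference from the values in this thread and not a documented format, a small helper can convert between the two. The function names `fw_dotted` and `fw_from_info` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert the hex firmware word to dotted notation.
# Assumption: one byte per version component, most significant byte first.
fw_dotted() {
    v=$(( $1 ))
    echo "$(( (v >> 24) & 255 )).$(( (v >> 16) & 255 )).$(( (v >> 8) & 255 )).$(( v & 255 ))"
}

# Extract the hex firmware word from a saved or live mwlwifi "info" file
# (one "key: value" pair per line).
fw_from_info() {
    sed -n 's/^firmware version: *//p' "$1"
}
```

On the router this could be used as `fw_dotted "$(fw_from_info /sys/kernel/debug/ieee80211/phy1/mwlwifi/info)"`; the phy index depends on the radio being inspected.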

yuhhaurlin commented 6 years ago

You should not set ap isolate.

ad019 commented 6 years ago

I am unable to ping my printer from my laptop. The printer shows as connected with a 1 Mbps Tx and Rx rate, but it is not available to the computers. After a reboot or a reset of the 2.4 GHz radio, everything is back to normal. The printer only connects on 2.4 GHz, so I cannot comment on 5 GHz. But it does run Bonjour.

yuhhaurlin commented 6 years ago

Don't set ap isolate.

yuhhaurlin commented 6 years ago

If clients can reach the router but cannot communicate with each other, please make sure AP isolate (client isolation) is not enabled.
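One way to check this is to look at the hostapd configuration that is actually in use, where client isolation is the `ap_isolate` option. The sketch below assumes you know where your firmware writes the generated hostapd config; the helper names `ap_isolate_enabled` and `report_isolation` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if a hostapd config file
# enables client isolation. A missing ap_isolate line means it is disabled.
ap_isolate_enabled() {
    grep -q '^[[:space:]]*ap_isolate=1' "$1"
}

# Print the isolation state for each hostapd config passed as an argument.
report_isolation() {
    for conf in "$@"; do
        [ -f "$conf" ] || continue
        if ap_isolate_enabled "$conf"; then
            echo "$conf: ap_isolate enabled"
        else
            echo "$conf: ap_isolate not set"
        fi
    done
}
```

On LEDE the generated files are usually found under /var/run (for example `report_isolation /var/run/hostapd-phy*.conf`), but that location is an assumption and other firmwares place them elsewhere.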

ad019 commented 6 years ago

It's not set to isolate. Also, if it were, a reboot would not set things right.

yuhhaurlin commented 6 years ago

Please post your hostapd configuration file.

ad019 commented 6 years ago

root@router:/tmp# cat ath1_hostap.conf

driver=nl80211
ctrl_interface=/var/run/hostapd
wmm_ac_bk_cwmin=4
wmm_ac_bk_cwmax=10
wmm_ac_bk_aifs=7
wmm_ac_bk_txop_limit=0
wmm_ac_bk_acm=0
wmm_ac_be_aifs=3
wmm_ac_be_cwmin=4
wmm_ac_be_cwmax=10
wmm_ac_be_acm=0
wmm_ac_vi_aifs=2
wmm_ac_vi_cwmin=3
wmm_ac_vi_cwmax=4
wmm_ac_vi_txop_limit=94
wmm_ac_vi_acm=0
wmm_ac_vo_aifs=2
wmm_ac_vo_cwmin=2
wmm_ac_vo_cwmax=3
wmm_ac_vo_txop_limit=47
wmm_ac_vo_acm=0
tx_queue_data3_aifs=7
tx_queue_data3_cwmin=15
tx_queue_data3_cwmax=1023
tx_queue_data3_burst=0
tx_queue_data2_aifs=3
tx_queue_data2_cwmin=15
tx_queue_data2_cwmax=63
tx_queue_data1_aifs=1
tx_queue_data1_cwmin=7
tx_queue_data1_cwmax=15
tx_queue_data1_burst=3.0
tx_queue_data0_aifs=1
tx_queue_data0_cwmin=3
tx_queue_data0_cwmax=7
tx_queue_data0_burst=1.5
country_code=US
tx_queue_data2_burst=2.0
wmm_ac_be_txop_limit=64
ieee80211n=1
dynamic_ht40=1
ht_capab=[HT40-][LDPC][SHORT-GI-20][SHORT-GI-40][DSSS_CCK-40][MAX-AMSDU-7935]
hw_mode=g
channel=9
frequency=2452
beacon_int=100

interface=ath1
disassoc_low_ack=1
wmm_enabled=1
bssid=60:38:E0:xx:xx:xx
ignore_broadcast_ssid=0
max_num_sta=256
dtim_period=2
ssid=WiFi24
bridge=br0
logger_syslog=-1
logger_stdout=-1
logger_stdout_level=2
eapol_version=1
eapol_key_index_workaround=0
wpa=2
wpa_passphrase=***
wpa_key_mgmt=WPA-PSK
wpa_pairwise=CCMP
wpa_group_rekey=3600

yuhhaurlin commented 6 years ago

Yes, AP isolate is not set.

yuhhaurlin commented 6 years ago

When the problem happens, can the router ping this printer?

ad019 commented 6 years ago

No it cannot.

ad019 commented 6 years ago

[Screenshot, 2017-12-16]

yuhhaurlin commented 6 years ago

How about your laptop? Can the router ping it? Also, what is the version of the mwlwifi driver?

ad019 commented 6 years ago

The other strange thing is that the printer, which is normally connected at 72 Mbps when it's available to the network, drops to 1 Mbps Tx/ Rx.

The laptop also cannot ping the printer.

root@router:~# cat /sys/kernel/debug/ieee80211/phy0/mwlwifi/info

driver name: mwlwifi
chip type: 88W8964
hw version: 7
driver version: 10.3.4.0-20171129
firmware version: 0x09030202
power table loaded from dts: no
firmware region code: 0x10
mac address: 60:38:e0:xx:xx:xx
2g: disable
5g: enable
antenna: 4 4
irq number: 46
ap macid support: 0000ffff
sta macid support: 00010000
macid used: 00000003
radio: enable
iobase0: e0c00000
iobase1: e0e80000
tx limit: 1024
rx limit: 16384

yuhhaurlin commented 6 years ago

I meant: can the router ping the laptop or not?

yuhhaurlin commented 6 years ago

Also, does the wireless printer work with the stock firmware or with another router without this problem?

ad019 commented 6 years ago

Yes, the router can ping the laptop.

[Screenshot, 2017-12-16]

yuhhaurlin commented 6 years ago

If the rate drops to the lowest one, there are probably a lot of data transmission errors.

ad019 commented 6 years ago

I am not sure if the stock firmware will work fine or not. I have not used it for almost a year now.

ad019 commented 6 years ago

Rate drops only for the printer. It is fine for other devices.

[Screenshot, 2017-12-16]

yuhhaurlin commented 6 years ago

It looks like this problem only happens with this specific wireless printer. Which wireless printer is it?

ad019 commented 6 years ago

HP Deskjet 4675.

yuhhaurlin commented 6 years ago

I will check with our QA to see if we can find one to check it. Does anyone use this wireless printer and also encounter this problem?

yuhhaurlin commented 6 years ago

How long does it take for the problem to occur? If possible, please check whether it also happens with the stock firmware.

ad019 commented 6 years ago

Anywhere from a few hours to at most a day.

yuhhaurlin commented 6 years ago

Apart from the stock firmware, do you have another router you could use to check whether this problem also happens with it?

yuhhaurlin commented 6 years ago

Also, does the problem happen with firmware 9.3.0.8 as well?

ad019 commented 6 years ago

I have an old WRT54G I can test it with.

Another thing is that restarting the printer does not help. Only a router restart or radio restart will set it right again.

ad019 commented 6 years ago

I don't think the problem happened with 9.3.0.8. I have been using it for almost a year without any problem.

yuhhaurlin commented 6 years ago

  1. Can you replace 9.3.2.2 with 9.3.0.8 and check if the problem is gone?
  2. Please also test with the updated stock firmware.

yuhhaurlin commented 6 years ago

Thanks for your help.

ad019 commented 6 years ago

  1. You mean use the same driver but older firmware? I don't think that's possible at my end with dd-wrt. But I can try an older release, which would also have the older driver.
  2. Linksys rolled back the latest firmware. But I will try with it.

yuhhaurlin commented 6 years ago

If the wireless printer works with firmware 9.3.0.8, there is no need for you to test another router.

yuhhaurlin commented 6 years ago

Just replace 88W8964.bin under directory /lib/firmware/mwlwifi.
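A cautious way to do the swap is to keep a copy of the blob that is currently installed, so it can be restored afterwards. The sketch below assumes the replacement firmware file has already been copied to the router; the helper name `swap_firmware` and the `.orig` backup suffix are hypothetical choices for illustration.

```shell
#!/bin/sh
# Hypothetical helper: back up the current firmware blob, then install a
# replacement in its place.
# Usage: swap_firmware <new-blob> [firmware-dir]
swap_firmware() {
    new="$1"
    dir="${2:-/lib/firmware/mwlwifi}"
    target="$dir/88W8964.bin"

    [ -f "$new" ] || { echo "missing replacement: $new" >&2; return 1; }
    [ -f "$target" ] || { echo "missing current firmware: $target" >&2; return 1; }

    # Keep only the first backup so repeated swaps never overwrite the
    # original blob.
    [ -f "$target.orig" ] || cp "$target" "$target.orig" || return 1
    cp "$new" "$target"
}
```

The new blob is presumably only picked up when the driver loads, so rebooting the router afterwards is the safe choice. Copying `88W8964.bin.orig` back over `88W8964.bin` restores the previous firmware.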