kaloz / mwlwifi

mac80211 driver for the Marvell 88W8864 802.11ac chip

Wireless clients stop receiving DHCP responses after some time #243

Closed Alexander-r closed 6 years ago

Alexander-r commented 6 years ago

Starting from commit af210f0 I have had problems with broadcast traffic. After some time, a newly connecting wireless client no longer receives an IP address. In the router logs I see the DHCPDISCOVER request and the attempt to send a DHCPOFFER reply, but the client does not seem to receive it; it just sends more requests. Clients already connected to the network continue to work fine, and so do wired clients.

There are no errors in the system or kernel log that might help identify why this starts happening. The problem occurs on both the 2.4GHz and 5GHz networks, and it starts at random intervals after a reboot: the router may work fine for a day, or it may stop sending broadcast traffic after an hour of uptime. I can reproduce the problem on two Android devices, a notebook with an Intel Wireless 7265, and another router acting as a client based on an Atheros QCA9880, so it's not just a single device that is affected.

I’ve tried the latest version of the driver since then, but it also has this problem. I even tried compiling the latest LEDE snapshot with this driver, but that did not help. The strangest thing is that when I flashed a clean LEDE 17.01.4 and tried it without the updated driver, I also had this problem. I also tried using odhcpd instead of dnsmasq. In all the tests I used the same configuration, and I had not changed any settings before noticing the problem.

My configuration files:

wireless

config wifi-device 'radio0'
    option type 'mac80211'
    option channel '36'
    option hwmode '11a'
    option path 'soc/soc:pcie-controller/pci0000:00/0000:00:01.0/0000:01:00.0'
    option htmode 'VHT80'
    option txpower '23'
    option country 'US'

config wifi-iface 'default_radio0'
    option device 'radio0'
    option network 'lan'
    option mode 'ap'
    option encryption 'psk2'
    option key '1234567890'
    option ssid 'WiFi_5'

config wifi-device 'radio1'
    option type 'mac80211'
    option channel '11'
    option hwmode '11g'
    option path 'soc/soc:pcie-controller/pci0000:00/0000:00:02.0/0000:02:00.0'
    option htmode 'HT40'
    option txpower '30'
    option country 'US'

config wifi-iface 'default_radio1'
    option device 'radio1'
    option network 'lan'
    option mode 'ap'
    option encryption 'psk2'
    option key '1234567890'
    option ssid 'WiFi_2'

dhcp

config dnsmasq
    option domainneeded '1'
    option boguspriv '1'
    option filterwin2k '0'
    option localise_queries '1'
    option rebind_protection '1'
    option rebind_localhost '1'
    option local '/lan/'
    option domain 'lan'
    option expandhosts '1'
    option nonegcache '0'
    option authoritative '1'
    option readethers '1'
    option leasefile '/tmp/dhcp.leases'
    option resolvfile '/tmp/resolv.conf.auto'
    option localservice '1'

config dhcp 'lan'
    option interface 'lan'
    option start '100'
    option limit '150'
    option leasetime '12h'
    option dhcpv6 'server'
    option ra 'server'
    option ra_management '1'

config dhcp 'wan'
    option interface 'wan'
    option ignore '1'

config odhcpd 'odhcpd'
    option maindhcp '0'
    option leasefile '/tmp/hosts/odhcpd'
    option leasetrigger '/usr/sbin/odhcpd-update'

network

config interface 'loopback'
    option ifname 'lo'
    option proto 'static'
    option ipaddr '127.0.0.1'
    option netmask '255.0.0.0'

config globals 'globals'
    option ula_prefix 'fd9c:6fdc:99cb::/48'

config interface 'lan'
    option type 'bridge'
    option ifname 'eth0'
    option proto 'static'
    option netmask '255.255.255.0'
    option ip6assign '60'
    option ipaddr '192.168.0.1'

config interface 'wan'
    option ifname 'eth1'
    option proto 'dhcp'

config interface 'wan6'
    option ifname 'eth1'
    option proto 'dhcpv6'

config switch
    option name 'switch0'
    option reset '1'
    option enable_vlan '1'

config switch_vlan
    option device 'switch0'
    option vlan '1'
    option ports '0 1 2 3 5'

config switch_vlan
    option device 'switch0'
    option vlan '2'
    option ports '4 6'

daloki83 commented 6 years ago

9.3.0.8 with 10.3.4.0-20171214 seems to work fine for me. Let's see what the day brings. EDIT: Same problem as before.

ad019 commented 6 years ago

Were you able to replace it on dd-wrt or was it on LEDE?

daloki83 commented 6 years ago

On LEDE. Edit: now trying 9.3.2.1.

MrReSc commented 6 years ago

As I mentioned earlier, similar problem with two LibreElec computers (Intel NUC). I don't think the problem is printer dependent.

Ping Client --> Router: OK
Ping Router --> Client: OK
Ping Client --> Client: Not OK

Until recently I used davidc502 builds. Now I am on stable 17.01.4 with the newest mwlwifi driver. I didn't have the problem with davidc502 builds.

yuhhaurlin commented 6 years ago

Please test 9.3.0.8 and 9.3.2.2 with latest driver. If only 9.3.2.2 has this kind of problems, I will create an issue to collect them.

MrReSc commented 6 years ago

Where can I find these firmware files to download?

yuhhaurlin commented 6 years ago
  1. git clone this GitHub site.
  2. git checkout e119077b
  3. Get file 88W8964.bin under directory bin/firmware.
  4. Replace the one on device under directory /lib/firmware/mwlwifi.

MrReSc commented 6 years ago

9.3.0.8 with 10.3.4.0-20171214 seems to fix the issues on my side as well.

MrReSc commented 6 years ago

It wasn't the firmware change but the restart after the change that solved the problem. After about 15 minutes the problem was back.

yuhhaurlin commented 6 years ago

It looks like your problem is not the same as the others'. You mean the router can ping both clients, but the clients can't ping each other? That does not look like a WiFi problem.

MrReSc commented 6 years ago

Yes, that's correct. Do you have any idea where the problem is located?

yuhhaurlin commented 6 years ago

The only thing I can think of is ap_isolate. Can you check whether ap_isolate is set to 1 in your hostapd configuration file?

MrReSc commented 6 years ago

I already checked it. It's disabled.

ratsputin commented 6 years ago

So, I did a fresh build from trunk last night and rebuilt. Uptime is 7h 50m.

OpenWrt SNAPSHOT r5521-9f8d28285d / LuCI Master (git-17.342.53118-6d086bf)

I'm seeing the same 2.4GHz behavior already. What's interesting is I do still have devices connected and (apparently) working.

root@orion:/usr/var/log# cat /sys/kernel/debug/ieee80211/phy1/mwlwifi/info

driver name: mwlwifi
chip type: 88W8964
hw version: 7
driver version: 10.3.4.0-20171214
firmware version: 0x09030202
power table loaded from dts: no
firmware region code: 0x10
mac address: 60:38:e0:xx:xx:xx
2g: enable
5g: disable
antenna: 4 4
irq number: 46
ap macid support: 0000ffff
sta macid support: 00010000
macid used: 00000001
radio: enable
iobase0: e1300000
iobase1: e1580000
tx limit: 1024
rx limit: 16384

root@orion:/usr/var/log#
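
The `firmware version` field in this debugfs output is a packed hex value, while the thread refers to firmware by dotted names such as 9.3.2.2 and 9.3.0.8. A minimal sketch for converting between the two, assuming each byte of the 32-bit value is one component of the dotted version (an interpretation that matches the values quoted in this thread, not something taken from driver documentation). The function name is made up for illustration.

```python
def decode_fw_version(raw: str) -> str:
    """Split a 32-bit hex string into four dotted byte components.

    Assumption: one byte per version component, most significant first.
    """
    value = int(raw, 16)  # int() accepts the "0x" prefix when base is 16
    parts = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(part) for part in parts)


if __name__ == "__main__":
    # Value copied from the info dump above.
    print(decode_fw_version("0x09030202"))
```

Under that assumption, the dump above corresponds to firmware 9.3.2.2.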
yuhhaurlin commented 6 years ago

@MrReSc If ap_isolate is not set, I have no other ideas. However, it does not look like a WiFi problem.

MrReSc commented 6 years ago

@yuhhaurlin it only happens on wifi-connected devices.

ratsputin commented 6 years ago

I did a bit more testing with the wifi network in this situation.

Here's a rundown of the currently-connected devices on wlan1:

root@orion:/usr/var/log# iw dev wlan1 station dump
Station 80:a5:89:xx:xx:xx (on wlan1)
        inactive time:  8150 ms
        rx bytes:       316350
        rx packets:     3550
        tx bytes:       257146
        tx packets:     3513
        tx retries:     0
        tx failed:      0
        rx drop misc:   1
        signal:         -71 dBm
        signal avg:     -70 dBm
        tx bitrate:     39.0 MBit/s MCS 4
        rx bitrate:     19.5 MBit/s MCS 2
        authorized:     yes
        authenticated:  yes
        associated:     yes
        preamble:       short
        WMM/WME:        yes
        MFP:            no
        TDLS peer:      no
        DTIM period:    2
        beacon interval:100
        short preamble: yes
        short slot time:yes
        connected time: 28757 seconds
Station 00:04:20:xx:xx:xx (on wlan1)
        inactive time:  20 ms
        rx bytes:       2057855
        rx packets:     8761
        tx bytes:       576122
        tx packets:     4803
        tx retries:     0
        tx failed:      0
        rx drop misc:   0
        signal:         -64 dBm
        signal avg:     -64 dBm
        tx bitrate:     72.2 MBit/s MCS 7 short GI
        rx bitrate:     72.2 MBit/s MCS 7 short GI
        authorized:     yes
        authenticated:  yes
        associated:     yes
        preamble:       short
        WMM/WME:        yes
        MFP:            no
        TDLS peer:      no
        DTIM period:    2
        beacon interval:100
        short preamble: yes
        short slot time:yes
        connected time: 28756 seconds
Station cc:44:63:xx:xx:xx (on wlan1)
        inactive time:  88930 ms
        rx bytes:       346252
        rx packets:     1689
        tx bytes:       449004
        tx packets:     1265
        tx retries:     0
        tx failed:      0
        rx drop misc:   0
        signal:         -73 dBm
        signal avg:     -73 dBm
        tx bitrate:     2.0 MBit/s
        rx bitrate:     54.0 MBit/s
        authorized:     yes
        authenticated:  yes
        associated:     yes
        preamble:       short
        WMM/WME:        yes
        MFP:            no
        TDLS peer:      no
        DTIM period:    2
        beacon interval:100
        short preamble: yes
        short slot time:yes
        connected time: 28691 seconds
Station 0c:2a:69:xx:xx:xx (on wlan1)
        inactive time:  20960 ms
        rx bytes:       224986
        rx packets:     1553
        tx bytes:       119574
        tx packets:     1605
        tx retries:     0
        tx failed:      0
        rx drop misc:   1
        signal:         -78 dBm
        signal avg:     -78 dBm
        tx bitrate:     19.5 MBit/s MCS 2
        rx bitrate:     13.0 MBit/s MCS 1
        authorized:     yes
        authenticated:  yes
        associated:     yes
        preamble:       short
        WMM/WME:        yes
        MFP:            no
        TDLS peer:      no
        DTIM period:    2
        beacon interval:100
        short preamble: yes
        short slot time:yes
        connected time: 28674 seconds
Station 00:d0:2d:xx:xx:xx (on wlan1)
        inactive time:  8200 ms
        rx bytes:       17543
        rx packets:     195
        tx bytes:       17111
        tx packets:     157
        tx retries:     0
        tx failed:      0
        rx drop misc:   0
        signal:         -55 dBm
        signal avg:     -54 dBm
        tx bitrate:     65.0 MBit/s MCS 7
        rx bitrate:     65.0 MBit/s MCS 7
        authorized:     yes
        authenticated:  yes
        associated:     yes
        preamble:       short
        WMM/WME:        yes
        MFP:            no
        TDLS peer:      no
        DTIM period:    2
        beacon interval:100
        short preamble: yes
        short slot time:yes
        connected time: 524 seconds
root@orion:/usr/var/log#

Of the devices connected:

00:04:20 is a Logitech Harmony Hub - Pingable
0c:2a:69 is a Rachio sprinkler controller - Pingable
80:a5:89 is an iRobot Roomba - Pingable

What's interesting is that prior to doing the above command, other devices were connected and not pingable, including an Apple iPad Pro. Its MAC address starts with cc:44:63.
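
Reading a long station dump by eye is error-prone, so here is a minimal Python sketch that parses the text printed by `iw dev wlan1 station dump` and lists stations whose `inactive time` exceeds a threshold. The function names and the 30-second threshold are arbitrary choices for illustration, and a long inactive time alone does not prove a fault (a sleeping device looks the same).

```python
import re

STATION = re.compile(r"Station (\S+) \(on (\S+)\)")


def parse_station_dump(text: str) -> dict:
    """Return {mac: {field: value}} from station dump text."""
    stations = {}
    current = None
    for line in text.splitlines():
        header = STATION.match(line.strip())
        if header:
            current = stations.setdefault(header.group(1), {})
            continue
        # Field lines are indented; skip anything else (e.g. shell prompts).
        if current is not None and line[:1].isspace() and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return stations


def idle_stations(stations: dict, threshold_ms: int = 30000) -> list:
    """List MACs whose 'inactive time' is at or above threshold_ms."""
    idle = []
    for mac, fields in stations.items():
        inactive_ms = int(fields.get("inactive time", "0 ms").split()[0])
        if inactive_ms >= threshold_ms:
            idle.append(mac)
    return idle


# Two stations trimmed from the dump above.
SAMPLE = """\
Station 80:a5:89:xx:xx:xx (on wlan1)
        inactive time:  8150 ms
        tx failed:      0
Station cc:44:63:xx:xx:xx (on wlan1)
        inactive time:  88930 ms
        tx failed:      0
"""

if __name__ == "__main__":
    print(idle_stations(parse_station_dump(SAMPLE)))
```

On the trimmed sample this flags only the cc:44:63 station, the one that was not pingable.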

yuhhaurlin commented 6 years ago

@MrReSc If the router can work with both of them, WiFi has no problem. I wonder if your clients support TDLS? Either way, it is not a problem with the router's WiFi.

ratsputin commented 6 years ago

Okay, I did a scan and the iPad showed up. It is not currently pingable; however, I cannot yet connect to the 2.4GHz network with my iPhone.

Station cc:44:63:xx:xx:xx (on wlan1)
        inactive time:  45740 ms
        rx bytes:       346562
        rx packets:     1694
        tx bytes:       449314
        tx packets:     1270
        tx retries:     0
        tx failed:      0
        rx drop misc:   0
        signal:         -74 dBm
        signal avg:     -73 dBm
        tx bitrate:     6.5 MBit/s MCS 0
        rx bitrate:     19.5 MBit/s MCS 2
        authorized:     yes
        authenticated:  yes
        associated:     yes
        preamble:       short
        WMM/WME:        yes
        MFP:            no
        TDLS peer:      no
        DTIM period:    2
        beacon interval:100
        short preamble: yes
        short slot time:yes
        connected time: 29098 seconds

ratsputin commented 6 years ago

Well now, this is interesting. I was finally able to get my iPhone 8 to connect to the 2.4GHz network a couple of minutes after a scan. Here's the info:

Station 40:9c:28:xx:xx:xx (on wlan1)
        inactive time:  0 ms
        rx bytes:       28131
        rx packets:     156
        tx bytes:       75928
        tx packets:     124
        tx retries:     0
        tx failed:      0
        rx drop misc:   0
        signal:         -65 dBm
        signal avg:     -59 dBm
        tx bitrate:     117.0 MBit/s MCS 14
        rx bitrate:     130.0 MBit/s MCS 15
        authorized:     yes
        authenticated:  yes
        associated:     yes
        preamble:       short
        WMM/WME:        yes
        MFP:            no
        TDLS peer:      no
        DTIM period:    2
        beacon interval:100
        short preamble: yes
        short slot time:yes
        connected time: 29 seconds

Pinging it from a machine on the LAN (wired), I'm seeing the following:

Pinging 192.168.1.69 with 32 bytes of data:
Reply from 192.168.1.69: bytes=32 time=128ms TTL=64
Reply from 192.168.1.69: bytes=32 time=149ms TTL=64
Request timed out.
Reply from 192.168.1.69: bytes=32 time=142ms TTL=64
Request timed out.
Reply from 192.168.1.69: bytes=32 time=166ms TTL=64
Reply from 192.168.1.69: bytes=32 time=189ms TTL=64
Reply from 192.168.1.69: bytes=32 time=212ms TTL=64
Reply from 192.168.1.69: bytes=32 time=231ms TTL=64
Reply from 192.168.1.69: bytes=32 time=151ms TTL=64
Reply from 192.168.1.69: bytes=32 time=176ms TTL=64
Reply from 192.168.1.69: bytes=32 time=297ms TTL=64
Reply from 192.168.1.69: bytes=32 time=115ms TTL=64
Reply from 192.168.1.69: bytes=32 time=137ms TTL=64
Reply from 192.168.1.69: bytes=32 time=262ms TTL=64
Reply from 192.168.1.69: bytes=32 time=181ms TTL=64
Reply from 192.168.1.69: bytes=32 time=200ms TTL=64
Reply from 192.168.1.69: bytes=32 time=131ms TTL=64
Reply from 192.168.1.69: bytes=32 time=146ms TTL=64
Reply from 192.168.1.69: bytes=32 time=170ms TTL=64
Reply from 192.168.1.69: bytes=32 time=191ms TTL=64
Reply from 192.168.1.69: bytes=32 time=110ms TTL=64
Reply from 192.168.1.69: bytes=32 time=133ms TTL=64
Reply from 192.168.1.69: bytes=32 time=153ms TTL=64
Reply from 192.168.1.69: bytes=32 time=176ms TTL=64

Ping statistics for 192.168.1.69:
    Packets: Sent = 25, Received = 23, Lost = 2 (8% loss),
Approximate round trip times in milli-seconds:
    Minimum = 110ms, Maximum = 297ms, Average = 171ms

I went and woke the iPad up and it is now pingable with drops. Next time I have this issue, I'll wake the iPad before doing the scan.
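
For comparing runs before and after a scan, the summary can be recomputed from the individual reply lines. A minimal Python sketch, assuming Windows-style ping output as pasted above; the function name is made up for illustration, and the average is truncated to whole milliseconds.

```python
import re

REPLY = re.compile(r"time[=<](\d+)ms")


def summarize_ping(lines):
    """Recompute loss and round-trip statistics from ping output lines."""
    times = []
    lost = 0
    for line in lines:
        match = REPLY.search(line)
        if match:
            times.append(int(match.group(1)))
        elif "Request timed out" in line:
            lost += 1
    sent = len(times) + lost
    return {
        "sent": sent,
        "received": len(times),
        "lost": lost,
        "loss_pct": round(100 * lost / sent) if sent else 0,
        "min_ms": min(times) if times else None,
        "max_ms": max(times) if times else None,
        "avg_ms": sum(times) // len(times) if times else None,
    }


# First five lines of the run above.
SAMPLE = [
    "Reply from 192.168.1.69: bytes=32 time=128ms TTL=64",
    "Reply from 192.168.1.69: bytes=32 time=149ms TTL=64",
    "Request timed out.",
    "Reply from 192.168.1.69: bytes=32 time=142ms TTL=64",
    "Request timed out.",
]

if __name__ == "__main__":
    print(summarize_ping(SAMPLE))
```

Fed the full 25-line run, this should reproduce the summary shown above (2 lost, 8% loss, 110/297/171 ms).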

yuhhaurlin commented 6 years ago

@ratsputin I can't find a problem in what you described. The 2.4g band could be noisy.

ratsputin commented 6 years ago

@yuhhaurlin Help me understand here, as I'm not a Wifi expert.

If this were noise-related, I would expect the issue to continue even after a scan. I would also expect all devices to drop, not just some devices.

Thoughts?

yuhhaurlin commented 6 years ago

Can you test only the client you think has the problem? From your description, I can't tell what the problem is.

ratsputin commented 6 years ago

Apologies. I was trying to provide a lot of related information based on previous issues I've described. I'll summarize here.

Next time this happens, I plan to test an iPad as well as an older Macbook Pro. The Mac is currently on the wired network, but I can easily test wifi.

ratsputin commented 6 years ago

I should also add that, when the issue occurs, logread shows hostapd messages for the wifi thermostat in which it repeatedly authenticates and then disconnects. When I attempt to connect from my iPhone, I don't see a single connection attempt message.

yuhhaurlin commented 6 years ago

Why do you use scan when your device is set up as an AP?

ad019 commented 6 years ago

@yuhhaurlin Currently I have setup a test with following fw and driver:

root@router:~# cat /sys/kernel/debug/ieee80211/phy0/mwlwifi/info

driver name: mwlwifi
chip type: 88W8964
hw version: 7
driver version: 10.3.4.0-20170810
firmware version: 0x09030008
power table loaded from dts: no
firmware region code: 0x10
mac address: 60:38:e0:xx:xx:xx
2g: enable
5g: enable
antenna: 4 4
irq number: 46
ap macid support: 0000ffff
sta macid support: 00010000
macid used: 00000003
radio: enable
iobase0: e0c00000
iobase1: e0e80000
tx limit: 1024
rx limit: 16384

And the printer is responding so far. But it's been only a few minutes. I will let it run for a few hours and see how it goes.

MacBook-Pro:~$ dns-sd -B
Browsing for _http._tcp
DATE: ---Sat 16 Dec 2017---
...STARTING...
Timestamp     A/R Flags if Domain  Service Type  Instance Name
19:09:07.966  Add     2  5 local.  _http._tcp.   HP DeskJet 4670

ratsputin commented 6 years ago

I use it as a tool to find out if there are other wifi networks nearby on adjacent channels. I have neighbors who are not particularly technical that will use channels other than 1, 6 or 11. I've also found that just doing a "scan" will reset the state of the wifi network, so I mostly use it as a convenient way to "fix" the issue I'm seeing.

yuhhaurlin commented 6 years ago

So your problem is that all devices on 2.4g can't connect to the router after a period of time? When this problem happens, you can try setting the AP to another channel to see if it recovers.

yuhhaurlin commented 6 years ago

Noise on the 2.4g band does not only come from WiFi devices. Can you confirm that when this problem happens, no device can connect to the AP, and that setting the AP to another channel resolves it?

ratsputin commented 6 years ago

Only two active devices are unable to connect. All other devices maintain their connections. I've seen this behavior on channels 1, 6 and 11.

kubrickfr commented 6 years ago

I've had the issue too after updating to commit 1522af59a501f74c8b6b02ae763829a79114a325, but it would only happen with my Linux machine. I switched the DHCP client of NetworkManager to dhclient rather than the built-in one and it fixed my problem. What made me switch was that I noticed that, despite NetworkManager not getting an address, running dhclient would get me an IP reliably.

ad019 commented 6 years ago

@yuhhaurlin after working for about 30 mins, the printer has stopped being available to the network.

Browsing for _http._tcp
DATE: ---Sat 16 Dec 2017---
19:54:43.540  ...STARTING...
Timestamp     A/R Flags if Domain  Service Type  Instance Name
19:54:43.541  Add     2  5 local.  _http._tcp.   HP DeskJet 4670
19:55:03.963  Rmv     0  5 local.  _http._tcp.   HP DeskJet 4670

ad019 commented 6 years ago

And I am unable to ping the printer from the router.

yuhhaurlin commented 6 years ago

@ad019 So no matter which version of firmware, you will encounter this problem?

yuhhaurlin commented 6 years ago

@ratsputin Can you just use these two devices to do test?

ad019 commented 6 years ago

So no matter which version of firmware, you will encounter this problem?

Looks like it.

yuhhaurlin commented 6 years ago

@ad019 Can you test this wireless printer with your WRT54g?

ad019 commented 6 years ago

@yuhhaurlin yes I will do that over Sunday and let you know the result.

ratsputin commented 6 years ago

@yuhhaurlin By "these two devices", I assume you mean my iPhone 8 and the wifi thermostat? What exactly do you want me to test? Only having those two devices on the wifi network until it fails?

yuhhaurlin commented 6 years ago

@ratsputin Yes. That would make it easier to get logs or to check something later.

ad019 commented 6 years ago

@yuhhaurlin test setup with the printer and macbook has been up for more than an hour now and printer has not dropped from network. I will keep it going for a few more hours.

Alexander-r commented 6 years ago

I was able to test the behavior with a static address. If I start arping on the router towards the client with the static IP before the client has sent anything, it does not work. Ping from the client with the static IP to the router works, and as soon as I start pinging the router from this client, arping on the router starts receiving replies too. If I manually set the gateway and default route on this client, it is able to access the internet. I have not noticed any problems with transmission speed.

For anyone who wants to test using arping, don't forget to specify which interface to use, like this: arping -I br-lan 192.168.1.10. By default, for me, it tries to use the wan interface.

This test was performed on version 843d00cd9c134629b9dad7162831ec5f136399b3. The client was my notebook with an Intel Wireless 7265, connected to the 5GHz network with the settings mentioned in the first post. ap_isolate was not set. Once the problem starts, any device that was not already connected to the wireless network is unable to get an IP address from DHCP.

I don't see anything abnormal in the logs. With odhcpd when a client tries to obtain an address after the problems start I only see:

daemon.warn odhcpd[1383]: received DHCPV4_MSG_DISCOVER from xx:xx:xx:xx:xx:xx
daemon.warn odhcpd[1383]: sending DHCPV4_MSG_OFFER to ff:ff:ff:ff:ff:ff - 255.255.255.255

xx:xx:xx:xx:xx:xx would be the client's MAC address. These two lines are repeated for every attempt, for as long as the client tries to connect. With dnsmasq I get messages with the same meaning but a bit less information (they only mention the client MAC and that the DHCPOFFER was sent).
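
The pattern described above (repeated DISCOVER/OFFER pairs with no follow-up from the client) can be spotted mechanically in a saved log. The sketch below is a minimal Python example and not part of odhcpd. It assumes a REQUEST would be logged in the same `received DHCPV4_MSG_... from <mac>` form as the DISCOVER line quoted above; the function name and the sample MAC addresses are made up for illustration.

```python
import re
from collections import Counter

RECEIVED = re.compile(r"received DHCPV4_MSG_(\w+) from (\S+)")


def unanswered_discovers(log_lines):
    """Count DISCOVERs per client MAC that were never followed by a REQUEST."""
    pending = Counter()
    for line in log_lines:
        match = RECEIVED.search(line)
        if not match:
            continue  # e.g. the "sending DHCPV4_MSG_OFFER" lines
        msg, mac = match.groups()
        if msg == "DISCOVER":
            pending[mac] += 1
        elif msg == "REQUEST":
            # Assumed log form: the client saw an OFFER and moved on.
            pending.pop(mac, None)
    return dict(pending)


OFFER = "daemon.warn odhcpd[1383]: sending DHCPV4_MSG_OFFER to ff:ff:ff:ff:ff:ff - 255.255.255.255"
LOG = [
    "daemon.warn odhcpd[1383]: received DHCPV4_MSG_DISCOVER from aa:bb:cc:00:00:01",
    OFFER,
    "daemon.warn odhcpd[1383]: received DHCPV4_MSG_DISCOVER from aa:bb:cc:00:00:01",
    OFFER,
    "daemon.warn odhcpd[1383]: received DHCPV4_MSG_DISCOVER from aa:bb:cc:00:00:02",
    OFFER,
    "daemon.warn odhcpd[1383]: received DHCPV4_MSG_REQUEST from aa:bb:cc:00:00:02",
]

if __name__ == "__main__":
    print(unanswered_discovers(LOG))
```

A client that keeps showing up in the result is one whose OFFERs are being sent by the server but apparently never acted on.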

Should I also test with any previous firmwares? Are any additional logs or configuration files required?

yuhhaurlin commented 6 years ago

@Alexander-r When the problem happens, what result do you get if you set a static IP?

Alexander-r commented 6 years ago

@yuhhaurlin After I set static address the client is able to connect. If right after the connection client does nothing then the router is unable to ping the client. After connection the client is able to ping the router and access the internet. If the client pings the router then the router is able to ping the client.

yuhhaurlin commented 6 years ago

@Alexander-r

  1. If you can ping the router from a client with a static IP, WiFi is all right.
  2. If you can arping from the router to the client, WiFi is not blocking broadcast data.
  3. If both of the above still work after the problem has happened, it looks like WiFi is still all right.
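
The checklist above can be written down as a small decision function. This is a rough Python sketch of one way to read it together with the observations reported earlier in the thread, not a diagnosis confirmed by the maintainer. The function name, parameter names, and result strings are made up for illustration.

```python
def classify(client_to_router_ping: bool,
             router_arping_unprompted: bool,
             dhcp_offer_received: bool) -> str:
    """Map three observations to a rough, unconfirmed diagnosis.

    router_arping_unprompted: arping from the router succeeds before the
    client has sent any traffic of its own.
    """
    if not client_to_router_ping:
        return "unicast broken: look at association or the WiFi link itself"
    if not (router_arping_unprompted and dhcp_offer_received):
        # ARP requests and the DHCPOFFER logged above both go to
        # broadcast addresses, so they share the AP-to-client broadcast path.
        return "unicast ok, AP-to-client broadcast suspect"
    return "unicast and broadcast ok: look beyond WiFi"


if __name__ == "__main__":
    # Static-IP test as reported: ping works, unprompted arping and DHCP do not.
    print(classify(True, False, False))
```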
yuhhaurlin commented 6 years ago

@Alexander-r If you set all of your clients to use static IP, will you encounter any problems?

yuhhaurlin commented 6 years ago

@Alexander-r BTW, where is your DHCP server installed?

Alexander-r commented 6 years ago

@yuhhaurlin arping from router does not work if after connection the client does not send any data first.

So it seems only the client is able to do ARP discovery, and the router only sees the client once it has sent some data, so that the router can get its MAC from the cache.

I think if I set static address on each client they might work after they send something to router first.

The DHCP server is on the router.

yuhhaurlin commented 6 years ago

@Alexander-r Does this problem only happen on the new driver? How long does it take to happen? According to your description, it is a generic problem, in which case everyone should encounter it. Can you let me know how to reproduce it?