o11s / open80211s

Reboot under heavy traffic over low-quality link #2

Open jcard0na opened 12 years ago

jcard0na commented 12 years ago

Submitted by David Fulgham via the mailing list:

I've been experiencing an issue with our mesh network where mesh nodes reboot constantly. After some troubleshooting, it looks like I may have come across a transmit buffer overrun or something similar.

I've been able to reproduce it quite readily by simply pushing as many packets as possible through the mesh between two nodes that have a low-signal-level connection (i.e. weaker than -75 dBm), so that a larger number of packets has to be retransmitted.

PC -> MeshNode1 ~~ < -75 dBm link ~~ MeshNode2

I then ping mesh node 2 from the PC with a command like

ping -i .01 <ip of mesh node 2>

After a few minutes (I assume the transmit buffer fills up and overflows), one or both of the radios will reboot. I can make the reboot happen pretty much anytime I want by moving the radios until they are almost out of range of each other. I'm using the OpenWrt trunk snapshot from Feb 26th on UBNT Rocket 5M devices, and thus don't have any debugging turned on.
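The reproduction steps above could be automated roughly as follows. This is only a sketch, not something from the original report: the function names, the burst size of 100 pings, and the 5-second outage threshold are all assumptions, and the `ping` flags follow Linux iputils syntax. Sub-second intervals may require root privileges depending on the `ping` implementation.

```python
import subprocess
import time


def build_ping_command(target_ip, interval_s=0.01, count=100):
    """Return the argv for one burst of closely spaced pings (Linux iputils flags)."""
    return ["ping", "-i", str(interval_s), "-c", str(count), "-W", "1", target_ip]


def find_outages(samples, min_gap_s=5.0):
    """Given (timestamp, reachable) samples in time order, return (start, end)
    spans during which the target stayed unreachable for at least min_gap_s.
    An outage that is still ongoing at the last sample is not reported."""
    outages = []
    down_since = None
    for ts, reachable in samples:
        if not reachable and down_since is None:
            down_since = ts  # first failed burst of a possible outage
        elif reachable and down_since is not None:
            if ts - down_since >= min_gap_s:
                outages.append((down_since, ts))
            down_since = None
    return outages


def run(target_ip, duration_s=600):
    """Send ping bursts for duration_s seconds and return the observed outages."""
    samples = []
    deadline = time.time() + duration_s
    while time.time() < deadline:
        result = subprocess.run(
            build_ping_command(target_ip),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # iputils ping exits with 0 when at least one reply was received.
        samples.append((time.time(), result.returncode == 0))
    return find_outages(samples)
```

Calling `run()` with the IP of mesh node 2 from the PC returns the time spans in which the node stopped answering. A long gap is consistent with a reboot but does not prove one; the node's uptime or logs would still need to be checked.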

jcard0na commented 12 years ago

The above commit has nothing to do with this issue. It was just picked up by GitHub because #1 appeared in the commit log.

jcard0na commented 12 years ago

<4>[  106.190000] Call Trace:
<4>[  106.190000] [<83b29074>] ath_txchainmask_reduction+0x154/0x1170 [ath9k]
<4>[  106.190000] [<83b28fc8>] ath_txchainmask_reduction+0xa8/0x1170 [ath9k]
<4>[  106.190000]
<4>[  106.190000]
<4>[  106.190000] Code: 00831821  70522002  8c630000 <8c630004> 00839021  8e2308ec  92420007  30630020  10600004
<4>[  106.400000] ---[ end trace 574d10138db4623f ]---

Looks ath9k related.
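One way to check this kind of attribution mechanically is to tally the module tags on the call-trace frames. The sketch below is an illustration only: the regular expression and the `modules_in_trace` helper are assumptions written for the frame format seen in this trace, not part of any kernel tooling. A module tag shows where the crash surfaced, which is not necessarily where the root cause lies.

```python
import re
from collections import Counter

# Matches frames such as "[<83b29074>] symbol+0x154/0x1170 [module]".
# The trailing "[module]" tag is optional and may sit on a wrapped line.
FRAME_RE = re.compile(
    r"\[<[0-9a-fA-F]+>\]\s+(\S+?)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+(?:\s+\[(\w+)\])?"
)


def modules_in_trace(log_text):
    """Count call-trace frames per kernel module; untagged frames are
    counted under "(built-in)"."""
    counts = Counter()
    for _symbol, module in FRAME_RE.findall(log_text):
        counts[module or "(built-in)"] += 1
    return counts


trace = """<4>[  106.190000] Call Trace:
<4>[  106.190000] [<83b29074>] ath_txchainmask_reduction+0x154/0x1170 [ath9k]
<4>[  106.190000] [<83b28fc8>] ath_txchainmask_reduction+0xa8/0x1170 [ath9k]
"""
print(modules_in_trace(trace))
```

For the two frames quoted in this issue, both are tagged with the ath9k module.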

bcopeland commented 11 years ago

This may have fixed it... ath9k would have crashed with no rates set up while ath5k warned.

commit 9cbbffe2ded494429b0d005a51a88242bd9b3095
Author: Bob Copeland <me@bobcopeland.com>
Date: Wed Jan 9 12:34:55 2013 -0500

    mac80211: set NEED_TXPROCESSING for PERR frames

ansanto commented 10 years ago

I do not know whether this issue has already been fixed, but what is meant by "the radios will reboot"? Is it just an AR93XX reset or a real AirOS/OpenWrt reboot? I'm asking because in a real-life mesh of 10 MAPs (with a single Mesh Gate) and 40-50 people associated with it, I see that internet browsing starts quickly, then stops dead, and then continues somewhat more slowly. I do not know whether this depends on a path switch (since the path metric degrades; see the HWMP ping-pong effect) or just on a transmit buffer overrun (AR93XX resets?) due to the heavy traffic. In any case, no device reboot has been observed.