Closed max-b closed 8 years ago
I think that this commit will help: cbbaf1be035aecdac71e016ab143ad463e0d919c
I added a /etc/init.d/tunneldigger restart to the udhcpc.user script, which gets called after the node gets a new DHCP lease.
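For anyone who wants to see the shape of that hook, here is a minimal sketch. It is not the contents of the commit above. It assumes the hook receives the udhcpc event name (for example "bound") as its first argument, and the function name on_dhcp_event and the TUNNELDIGGER_INIT variable are made up for illustration.

```shell
#!/bin/sh
# Sketch only: restart tunneldigger when a DHCP lease is obtained.
# TUNNELDIGGER_INIT and on_dhcp_event are hypothetical names.
TUNNELDIGGER_INIT="${TUNNELDIGGER_INIT:-/etc/init.d/tunneldigger}"

on_dhcp_event() {
    case "$1" in
        bound)
            # Assumption: "bound" is the event passed when a lease is acquired.
            "$TUNNELDIGGER_INIT" restart
            ;;
    esac
}
```

Inside udhcpc.user this would be called as on_dhcp_event "$1". Whether lease renewals should also trigger a restart is a judgment call; the sketch only reacts to a newly acquired lease so the tunnel is not bounced on every renewal.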
I actually imagined the udhcpc.user script testing for a connection and, if none is found, running /etc/init.d/meshrouting restart
It could be the basis of a mesh watchdog...
I've created a mesh-watchdog script which just pings our exit server at a preset interval and runs a restart script after a certain period of time. I haven't set it up on all of the nodes, because it could be a problem if someone is intentionally running a small, isolated mesh that isn't meant to reach the publicly accessible exit server.
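Roughly, the idea looks like the sketch below. This is not the actual mesh-watchdog code. The function name check_once, the PING_CMD and RESTART_CMD variables, and the failure threshold are all assumptions for illustration, and the restart command is just the meshrouting init script mentioned earlier in this thread.

```shell
#!/bin/sh
# Sketch of a ping watchdog. Names and defaults are hypothetical.
PING_CMD="${PING_CMD:-ping -c 1 -W 5}"
RESTART_CMD="${RESTART_CMD:-/etc/init.d/meshrouting restart}"

# check_once HOST FAILS MAX
# Pings HOST once and prints the updated count of consecutive failures.
# When the count reaches MAX, runs $RESTART_CMD and resets the count to 0.
check_once() {
    host="$1"; fails="$2"; max="$3"
    if $PING_CMD "$host" >/dev/null 2>&1; then
        fails=0
    else
        fails=$((fails + 1))
        if [ "$fails" -ge "$max" ]; then
            $RESTART_CMD >/dev/null 2>&1
            fails=0
        fi
    fi
    echo "$fails"
}

# Example loop (commented out so that sourcing this file has no side effects).
# EXIT_SERVER is a placeholder for the address of the exit server:
#   fails=0
#   while :; do
#       fails=$(check_once "$EXIT_SERVER" "$fails" 5)
#       sleep 60
#   done
```

Counting consecutive failures rather than restarting on the first missed ping avoids bouncing the mesh on a single dropped packet. A node that is deliberately kept off the exit server would simply not run the loop.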
Out of our alpha test devices, we have a couple that have shown significant downtimes, while others seem to have almost perfect uptimes: http://monitor.sudomesh.org/smokeping/smokeping.cgi?target=Mesh
Unfortunately, what we really need is to talk to the affected folks and find out more about their situations and why their nodes fail. In some cases it seems like folks have simply unplugged their router, which means that this isn't really a technical failure so much as a logistics/community failure.
The ar71xx devices also have a hardware watchdog.
We haven't really had this problem for the last month or so. I think that the udhcpc.user script is a decent fix for now. @Juul is working on a hardware "watchpuppy", and I wrote a ping watchdog that I've been testing on a handful of nodes: https://github.com/sudomesh/sudowrt-packages/tree/master/net/mesh-watchdog
Some of our nodes seem to be occasionally losing connectivity to the mesh. It's only 3 of them, and the rest have had perfect uptime, so I think that I may have somehow pushed a bad firmware to them, probably related to tunneldigger and/or a hook script.
Ideally, we would be able to get some debug info from one of these disconnected devices to figure out what exactly is going on. If someone could connect to a device that has lost mesh connectivity and run the following commands, it would be very helpful:
logread
ip addr
ip link
ip route
ip route show table public
ps
ip rule list
kill -USR1 $(pgrep babeld); cat /var/log/babeld.log
ping -c 3 8.8.8.8
traceroute 8.8.8.8
iptables -L -v -n
iptables -L -v -n -t nat
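To make that easier to do in one go, the commands above could be wrapped in a small script that saves everything to a single file. This is only a sketch: the function name collect_debug, the DEFAULT_COMMANDS variable, and the output path in the usage note are made up for illustration, while the command list itself is copied from the list above.

```shell
#!/bin/sh
# Sketch: run each diagnostic command and save the output to one file.
# collect_debug and DEFAULT_COMMANDS are hypothetical names.
DEFAULT_COMMANDS='logread
ip addr
ip link
ip route
ip route show table public
ps
ip rule list
kill -USR1 $(pgrep babeld); cat /var/log/babeld.log
ping -c 3 8.8.8.8
traceroute 8.8.8.8
iptables -L -v -n
iptables -L -v -n -t nat'

# collect_debug OUTFILE [COMMANDS]
# Runs one command per line of COMMANDS (default: the list above) and
# writes a "### command" header followed by its output to OUTFILE.
collect_debug() {
    out="$1"
    cmds="${2:-$DEFAULT_COMMANDS}"
    : > "$out"
    while IFS= read -r cmd; do
        [ -n "$cmd" ] || continue
        {
            echo "### $cmd"
            # stdin comes from /dev/null so a command cannot eat the command list
            sh -c "$cmd" 2>&1 </dev/null
            echo
        } >> "$out"
    done <<EOF
$cmds
EOF
}
```

Running collect_debug /tmp/mesh-debug.txt on an affected node would then leave one file that can be copied off the device and attached here. A failing command does not stop the run, since its error output is captured along with everything else.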