sudomesh / sudowrt-firmware

Scripts to build the sudo mesh OpenWRT firmware.

notdhcp down script fail? #59

Closed Juul closed 8 years ago

Juul commented 9 years ago

An extender node was plugged into a home node, then plugged into another home node without being rebooted. It ended up having two IPs (it came back up with both the IP it got from the old home node and the one it got from the new home node).

Juul commented 9 years ago

Hm, this might actually have been due to another problem: neither the notdhcp client nor the server runs its down script when it is cleanly shut down. They should check their state on SIGTERM and run the down script if appropriate.
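To make the idea concrete, here is a rough sketch of "check state on SIGTERM, run the down script if appropriate". It is in Python for brevity and is not notdhcp code: the `Daemon` class, the `lease_active` flag, and the `run_down_script` callback are all hypothetical names made up for illustration.

```python
import signal


class Daemon:
    """Hypothetical stand-in for a notdhcp client or server process."""

    def __init__(self, run_down_script):
        # run_down_script is an assumed callback that executes the down hook.
        self.run_down_script = run_down_script
        self.lease_active = False   # True once the up script has run
        self.should_exit = False    # main loop would check this and exit

    def on_up(self):
        """Record that the up script ran, so a down script is owed later."""
        self.lease_active = True

    def shutdown(self, signum=None, frame=None):
        """Signal handler: run the down script only if we are in the 'up' state."""
        if self.lease_active:
            self.run_down_script()
            self.lease_active = False
        self.should_exit = True


def install_handlers(daemon):
    """Route clean-shutdown signals to the daemon's shutdown method."""
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, daemon.shutdown)
```

The state check matters: if the daemon is killed before it ever ran its up script, or is signalled twice, the down script should not run.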

max-b commented 9 years ago

Looks like this is dependent on: https://github.com/sudomesh/notdhcp/issues/5

max-b commented 9 years ago

So I've got this implemented in https://github.com/sudomesh/notdhcp/commit/fc4035ccac06f41e261a8919e74cfd78f409946b

However, if one side of the notdhcp transaction quits, the other side doesn't know about it until there's a disconnect. So if the client were to restart the daemon, the server wouldn't reset to initial state and it'd be stuck again.

The simplest solution would be to bring the interface down and then back up, which would trigger the other side to run its down script.

I suppose we could also consider sending a special "reset" message...

max-b commented 8 years ago

> The simplest solution would be to bring the interface down and then back up, which would trigger the other side to run its down script.

Nope, that's not actually going to trigger a netlink message on the other side.

Another option we considered was just doing a reboot in the notdhcp down hook script. That would trigger a netlink message because it would physically change the power state seen by the other end. I'm not in love with this idea because it would require the home node, in particular, to reboot any time notdhcp quits on the extender node.

The "reset" message idea is not perfect, but it's probably workable. The current code generally relies on clients sending requests to servers, so adding a "reset" message would mean that clients would need significant additional listening capabilities and servers would need significant additional sending capabilities.

One additional idea is to add a "heartbeat", where the clients send heartbeats and listen for ACKs. If the server doesn't hear a heartbeat request within a certain amount of time, it considers the communication timed out, runs the down script, and returns to the initial listening state. If the client sends heartbeats and doesn't receive an ACK within a timeout period, it does the same thing.
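The timeout half of the heartbeat idea could look roughly like the sketch below. It is Python rather than notdhcp's own code, and it only shows the bookkeeping, not the network I/O. The `HeartbeatPeer` class, the `timeout` value, and the `on_down` callback are hypothetical. The server would call `mark_seen()` whenever a heartbeat arrives, and the client would call it whenever an ACK arrives. Both sides would call `poll()` periodically.

```python
import time


class HeartbeatPeer:
    """Tracks whether the other side of the link is still answering."""

    def __init__(self, timeout, on_down, clock=time.monotonic):
        self.timeout = timeout      # seconds of silence tolerated (assumed value)
        self.on_down = on_down      # assumed callback that runs the down script
        self.clock = clock          # injectable so the logic can be tested
        self.connected = False
        self.last_seen = None

    def mark_seen(self):
        """Call on every heartbeat (server side) or ACK (client side)."""
        self.connected = True
        self.last_seen = self.clock()

    def poll(self):
        """Return True if the peer just timed out and the down script ran."""
        if not self.connected:
            return False
        if self.clock() - self.last_seen < self.timeout:
            return False
        # Silence for too long: run the down script, go back to initial state.
        self.connected = False
        self.last_seen = None
        self.on_down()
        return True
```

Because `poll()` resets to the unconnected state after firing, the down script runs once per timeout rather than on every poll, and a later heartbeat or ACK starts a fresh session.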

max-b commented 8 years ago

I closed the ticket on notdhcp here: https://github.com/sudomesh/notdhcp/issues/5#issuecomment-160049946

We've implemented a heartbeat/timeout protocol.

I've also made sure to catch the relevant signals and run the down hook when appropriate.