webmeshproj / webmesh

A simple, distributed, zero-configuration WireGuard mesh solution
https://webmeshproj.github.io
Apache License 2.0

can't recover after deleting /var/lib/webmesh while running #10

Closed bbigras closed 1 year ago

bbigras commented 1 year ago

If I delete /var/lib/webmesh by mistake while webmesh-node is running, I can't kill the node unless I use kill -9.

And when I start webmesh-node again, I get: Error: failed to open mesh connection: join: fatal join error starting network manager: new wireguard: new interface: new tun: create tun: invalid argument.

I can start the node again if I run sudo ip link delete webmesh0.
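The workaround above can be scripted so that the delete only runs when a leftover interface is actually present. This is a minimal sketch, not part of webmesh: the function names are hypothetical, and it assumes a Linux host where network interfaces appear under /sys/class/net. The interface name webmesh0 and the `ip link delete` command come from the report above.

```python
import os
import subprocess


def stale_interface_cleanup_cmd(name="webmesh0", sysfs_root="/sys/class/net"):
    """Return the `ip link delete` argv if the interface exists, else None.

    Hypothetical helper: it only inspects sysfs and never changes anything.
    """
    # Entries under /sys/class/net are symlinks, so use lexists.
    if os.path.lexists(os.path.join(sysfs_root, name)):
        return ["ip", "link", "delete", name]
    return None


def cleanup(name="webmesh0", dry_run=True, sysfs_root="/sys/class/net"):
    """Delete a leftover interface; return True if one was found."""
    cmd = stale_interface_cleanup_cmd(name, sysfs_root)
    if cmd is None:
        return False
    if not dry_run:
        # Same command as the manual workaround; needs root privileges.
        subprocess.run(["sudo", *cmd], check=True)
    return True
```

Calling `cleanup(dry_run=False)` before starting webmesh-node would perform the same step as the manual `sudo ip link delete webmesh0`, and does nothing if the interface is already gone.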

tinyzimmer commented 1 year ago

So the second issue is a byproduct of the first. On a clean shutdown, the node will attempt to remove the interface for you. There is also the --wireguard.force-interface-name flag/config option, which will delete the interface on startup if it already exists. You can also customize the name of the interface. The join error came from the node not caring what the kernel error was and trying to fall back to TUN, which I guess was not available wherever this was running.

I don't know exactly what caused that first issue, but I'd be curious to see logs. It feels pretty unavoidable, since you took its data directory out from under it, but maybe we can at least make a clean exit happen. On nodes where you don't care about persisting data locally, you can use the --raft.in-memory flag.
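One way a process could at least exit cleanly when its storage disappears is to poll for the data directory and trigger the normal shutdown path once it is gone. The sketch below is only an illustration of that idea under stated assumptions: it is not how webmesh handles storage, and `watch_data_dir`, `on_missing`, and the polling interval are all hypothetical names and choices.

```python
import os
import threading


def watch_data_dir(path, stop_event, on_missing, interval=1.0):
    """Poll `path` until it disappears or `stop_event` is set.

    Calls on_missing() once and returns True if the directory is gone.
    Returns False if the watcher was stopped first.
    """
    while not stop_event.is_set():
        if not os.path.isdir(path):
            # Hand off to the caller's shutdown logic (hypothetical hook).
            on_missing()
            return True
        # Sleep for the interval, but wake up early if asked to stop.
        stop_event.wait(interval)
    return False
```

In a long-running service this would typically run in a daemon thread, with `on_missing` requesting the same shutdown used for SIGTERM so that cleanup such as removing the interface still happens.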

tinyzimmer commented 1 year ago

Just tidying things up. Gonna close this issue for now. If you happen to think of any ways disappearing storage can be dealt with, feel free to open a new issue.

Worth noting that I hope to revamp a lot of how the storage works in favor of a distributed implementation.