webmeshproj / webmesh

A simple, distributed, zero-configuration WireGuard mesh solution
https://webmeshproj.github.io
Apache License 2.0

"snapshot restore progress" doesn't seem to progress #7

Closed bbigras closed 1 year ago

bbigras commented 1 year ago

If I run this command, stop it and run it again:

  sudo webmesh-node \
       --global.detect-endpoints \
       --global.mtls \
       --global.tls-cert-file=/opt/webmesh/tls.crt \
       --global.tls-key-file=/opt/webmesh/tls.key \
       --global.tls-ca-file=/opt/webmesh/ca.crt \
       --bootstrap.enabled \
       --bootstrap.default-network-policy=accept \
       --wireguard.listen-port 51821 \
       --global.primary-endpoint 159.203.12.215 \
       --global.no-ipv6

I get:

{"time":"2023-08-08T16:00:51.644533801-04:00","level":"INFO","msg":"starting raft instance","component":"raft","storage":"/var/lib/webmesh/store","listen-addr":"[::]:9443"}
{"time":"2023-08-08T16:00:51.645221742-04:00","level":"INFO","msg":"starting restore from snapshot","component":"raft","id":"2-26-1691524849967","last-index":26,"last-term":2,"size-in-bytes":824}
{"time":"2023-08-08T16:01:01.645775939-04:00","level":"INFO","msg":"snapshot restore progress","component":"raft","id":"2-26-1691524849967","last-index":26,"last-term":2,"size-in-bytes":824,"read-bytes":0,"percent-complete":["%0.2f%%",0]}
{"time":"2023-08-08T16:01:11.645899269-04:00","level":"INFO","msg":"snapshot restore progress","component":"raft","id":"2-26-1691524849967","last-index":26,"last-term":2,"size-in-bytes":824,"read-bytes":0,"percent-complete":["%0.2f%%",0]}
{"time":"2023-08-08T16:01:21.646471945-04:00","level":"INFO","msg":"snapshot restore progress","component":"raft","id":"2-26-1691524849967","last-index":26,"last-term":2,"size-in-bytes":824,"read-bytes":0,"percent-complete":["%0.2f%%",0]}
{"time":"2023-08-08T16:01:31.645518945-04:00","level":"INFO","msg":"snapshot restore progress","component":"raft","id":"2-26-1691524849967","last-index":26,"last-term":2,"size-in-bytes":824,"read-bytes":0,"percent-complete":["%0.2f%%",0]}
[...]

I'm guessing it should be near instant.

tinyzimmer commented 1 year ago

Yeah, it should be. I'll take a look.

tinyzimmer commented 1 year ago

You are my new best friend, by the way. These are the types of bugs I need to find before I can call this thing production ready.

You've found a deadlock in the restore process. I wasn't aware that the raft library would automatically attempt the restore itself, and I'm holding a lock when it attempts that. Trying out a fix.
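A minimal sketch of the deadlock shape described above, not webmesh's actual code. The `store` type, `startRaft`, and `snapshotRestorer` are hypothetical stand-ins: `startRaft` plays the role of a library constructor that restores the latest snapshot on its own before returning, and `Restore` is the callback it invokes. If the caller still holds the mutex at that point, `Restore` can never acquire it. Other goroutines (such as a progress reporter) keep running, which would be consistent with the progress line repeating with `read-bytes` stuck at 0.

```go
package main

import (
	"fmt"
	"sync"
)

// snapshotRestorer stands in for the callback a Raft library invokes
// when it restores a snapshot by itself during startup (hypothetical name).
type snapshotRestorer interface {
	Restore(data []byte) error
}

// store is a hypothetical state store guarded by a mutex.
type store struct {
	mu    sync.Mutex
	state []byte
}

// Restore takes the lock itself, since the library may call it at any time.
func (s *store) Restore(data []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.state = append([]byte(nil), data...)
	return nil
}

// startRaft is a hypothetical stand-in for the library constructor,
// which restores the latest snapshot before it returns.
func startRaft(fsm snapshotRestorer, snapshot []byte) error {
	return fsm.Restore(snapshot)
}

// openDeadlocks shows the problematic shape: the lock is held across the
// call that triggers Restore, and sync.Mutex is not reentrant, so Restore
// blocks forever. It is intentionally never called here.
func (s *store) openDeadlocks(snapshot []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	return startRaft(s, snapshot)
}

// open shows one possible fix: finish the work that needs the lock,
// release it, and only then start the library.
func (s *store) open(snapshot []byte) error {
	s.mu.Lock()
	// ... setup that genuinely needs the lock would go here ...
	s.mu.Unlock()
	return startRaft(s, snapshot)
}

func main() {
	s := &store{}
	if err := s.open([]byte("snapshot")); err != nil {
		panic(err)
	}
	fmt.Printf("restored %d bytes\n", len(s.state))
}
```

Releasing the lock before the call is only one option; another would be to have the restore path use a separate lock from the startup path.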

bbigras commented 1 year ago

> You are my new best friend, by the way

hehe same. I really like the idea of webmesh.

tinyzimmer commented 1 year ago

Same as before - main should fix your issue, I'll be a little zealous and do two patch releases tonight :smile:

bbigras commented 1 year ago

It works! Thanks!!

No hurry. nixpkgs doesn't have Go 1.21 merged yet.

tinyzimmer commented 1 year ago

An extra thing I'll call out, and it's making me wonder if this should remain the default behavior or not: unless you specify a --wireguard.key-file, you'll generate a fresh key on each boot (you can point it at a nonexistent path and it will generate the key for you the first time).

With the way I see you doing this, that could affect someone waiting for a voting/leader node to just "reappear". If it comes back with a different key, then the other node will still try to reconnect to it using the old one. It will only get notified of key changes when it has a leader it can chat with :stuck_out_tongue:.

Regarding the Go version: yeah, that literally came out a few hours ago. But I was really excited to move all the structured logging to the new built-in log/slog package. I probably should have waited a day. That's how long it usually takes for package maintainers to catch up.