gravitl / netmaker

Netmaker makes networks with WireGuard. Netmaker automates fast, secure, and distributed virtual networks.
https://netmaker.io

netmaker docker failed to start after removing local range from network config #455

Closed chefkoch-de42 closed 2 years ago

chefkoch-de42 commented 2 years ago

Hi, my netmaker docker container keeps restarting after I deleted the local range from the network config.

    CONTAINER ID   IMAGE                        COMMAND                  CREATED        STATUS                         PORTS                                            NAMES
    302fe1b4ed5e   gravitl/netmaker-ui:v0.8.5   "/docker-entrypoint.…"   31 hours ago   Up 56 seconds                  0.0.0.0:8082->80/tcp, :::8082->80/tcp            netmaker-ui
    d458b800a6c2   coredns/coredns              "/coredns -conf /roo…"   31 hours ago   Up 56 seconds                  127.0.0.153:53->53/tcp, 127.0.0.153:53->53/udp   coredns
    715d7f5c9e7c   gravitl/netmaker:v0.8.5      "./netmaker"             31 hours ago   Restarting (2) 2 seconds ago                                                    netmaker
    8b6043450594   caddy:latest                 "caddy run --config …"   34 hours ago   Up 59 seconds                                                                   caddy

When I check the netmaker logs, I see the following output, which is repeated on every restart of the netmaker container:

    2021/11/11 15:11:51 [netmaker] connecting to sqlite
    2021/11/11 15:11:51 [netmaker] database successfully connected
    2021/11/11 15:11:51 [netmaker] no OAuth provider found or not configured, continuing without OAuth
    2021/11/11 15:11:51 [netmaker] Agent Server successfully started on port 50051 (gRPC)
    panic: runtime error: index out of range [0] with length 0

    goroutine 36 [running]:
    github.com/gravitl/netmaker/logic.setServerPeers({0xc00039d223, 0x7}, 0x14, {0xc0004d8180, 0x4, 0xc00039d22c})
        /app/logic/wireguard.go:241 +0xe85
    github.com/gravitl/netmaker/logic.setWGConfig({{0xc00023e630, 0x18}, {0xc00039d1f0, 0xa}, {0x0, 0x0}, {0x0, 0x0}, {0xc00039d200, 0x8}, ...}, ...)
        /app/logic/wireguard.go:64 +0x17d
    github.com/gravitl/netmaker/logic.ServerPull(0xc00049d080, 0x0)
        /app/logic/server.go:204 +0x30c
    github.com/gravitl/netmaker/logic.ServerCheckin({0xc00003a068, 0x11}, {0xc00003fcdc, 0x4})
        /app/logic/server.go:157 +0x158
    github.com/gravitl/netmaker/serverctl.HandleContainedClient()
        /app/serverctl/serverctl.go:103 +0x12d
    main.runClient.func1()
        /app/main.go:130 +0x19
    created by main.runClient
        /app/main.go:128 +0x59

Any idea how to get it working again?

afeiszli commented 2 years ago

We will need to add a check in the next release to make sure this doesn't happen. In the meantime, it looks like the AllowedIPs are missing either on the peer in the server or on the host. This is the line that fails:

        if currentPeer.AllowedIPs[0].String() == peer.AllowedIPs[0].String() &&

Run `docker exec -it netmaker wg show` and make sure all peers have allowed IPs. If a peer has none, remove it manually. If the peer is missing an allowed IP on the server (look in the UI), add it using the UI.
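For illustration, here is a minimal sketch of the kind of guard that would avoid the panic: check the length of both `AllowedIPs` slices before indexing element 0. The `peer` type and the `mustPeer` and `sameFirstAllowedIP` helpers are hypothetical names made up for this example; this is not the actual netmaker code or the fix that was shipped.

```go
package main

import (
	"fmt"
	"net"
)

// peer is a hypothetical stand-in for a WireGuard peer config; only the
// field involved in the panic is modelled here.
type peer struct {
	AllowedIPs []net.IPNet
}

// mustPeer builds a peer from CIDR strings and panics on malformed input.
// It exists only to keep the example short.
func mustPeer(cidrs ...string) peer {
	var p peer
	for _, c := range cidrs {
		_, ipnet, err := net.ParseCIDR(c)
		if err != nil {
			panic(err)
		}
		p.AllowedIPs = append(p.AllowedIPs, *ipnet)
	}
	return p
}

// sameFirstAllowedIP reports whether both peers have at least one allowed IP
// and their first entries match. The length check is what prevents the
// "index out of range [0] with length 0" panic.
func sameFirstAllowedIP(a, b peer) bool {
	if len(a.AllowedIPs) == 0 || len(b.AllowedIPs) == 0 {
		return false
	}
	return a.AllowedIPs[0].String() == b.AllowedIPs[0].String()
}

func main() {
	withIP := mustPeer("10.88.88.2/32", "192.168.1.0/24")
	empty := mustPeer() // a peer with no allowed IPs, as in this issue

	fmt.Println(sameFirstAllowedIP(withIP, withIP)) // true
	fmt.Println(sameFirstAllowedIP(withIP, empty))  // false, and no panic
}
```

Treating a peer without allowed IPs as "no match" is only one possible policy; returning an error and skipping or removing the peer would be another.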

chefkoch-de42 commented 2 years ago

I was able to get a response from the netmaker container in between the repeated restarts. The WireGuard config seems to be in place.

    interface: nm-test
      public key: QB57SNftT85la8bYC
      private key: (hidden)
      listening port: 41821

    peer: tmakuIl5KZuDZYMK2nD70
      endpoint: XXXXXXXXXX:41821
      allowed ips: 10.88.88.2/32, 192.168.1.0/24
      latest handshake: 36 seconds ago
      transfer: 299.02 KiB received, 307.48 KiB sent
      persistent keepalive: every 20 seconds

    peer: MzHFDq+WnHlGzgd3TBw
      endpoint: XXXXXXXXXX:62028
      allowed ips: 10.88.88.5/32
      latest handshake: 37 seconds ago
      transfer: 9.84 KiB received, 251.61 KiB sent
      persistent keepalive: every 20 seconds

    peer: Je5nVxezJpvKAp2QTn
      endpoint: XXXXXXXXXX:48298
      allowed ips: 10.88.88.6/32
      latest handshake: 1 minute, 21 seconds ago
      transfer: 6.18 KiB received, 35.55 KiB sent
      persistent keepalive: every 20 seconds

    peer: H4LhtPJqtHkYFnAV0
      endpoint: XXXXXXXXXX:63064
      allowed ips: (none)
      latest handshake: 1 minute, 36 seconds ago
      transfer: 6.48 KiB received, 1.62 KiB sent
      persistent keepalive: every 20 seconds

But I see exactly the same output if I run wg show on the server with all docker containers stopped.

afeiszli commented 2 years ago

I see the problem: one peer is missing an allowed IP. You must remove it.

docker exec -it netmaker wg set nm-test peer H4LhtPJqtHkYFnAV0 remove

How did the peer's IP disappear like that?
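If there are many peers, spotting the one without allowed IPs can be scripted. The following is a rough sketch, assuming the plain-text `wg show` output in the `peer:` / `allowed ips:` layout seen earlier in this thread; the function name `peersWithoutAllowedIPs` is made up for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// peersWithoutAllowedIPs scans plain-text `wg show` output and returns the
// public keys of all peers whose "allowed ips" line reads "(none)".
func peersWithoutAllowedIPs(wgShow string) []string {
	var missing []string
	current := "" // public key of the peer block being read
	sc := bufio.NewScanner(strings.NewReader(wgShow))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "peer: "):
			current = strings.TrimPrefix(line, "peer: ")
		case strings.HasPrefix(line, "allowed ips: "):
			if current != "" && strings.TrimPrefix(line, "allowed ips: ") == "(none)" {
				missing = append(missing, current)
			}
		}
	}
	return missing
}

func main() {
	// Shortened sample based on the output posted in this thread.
	out := `interface: nm-test
  listening port: 41821

peer: MzHFDq+WnHlGzgd3TBw
  allowed ips: 10.88.88.5/32

peer: H4LhtPJqtHkYFnAV0
  allowed ips: (none)
`
	fmt.Println(peersWithoutAllowedIPs(out)) // [H4LhtPJqtHkYFnAV0]
}
```

Each key returned this way is a candidate for the `wg set ... peer ... remove` command above.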

chefkoch-de42 commented 2 years ago

Hmmm, I just removed the local net config from the network config.

I just restarted the docker containers again. The peer I had just removed shows up again immediately, still without any allowed IP, and the container keeps restarting with the known error.

chefkoch-de42 commented 2 years ago

Very strange. I was able to get the GUI working while running watch -n 0.1 "docker exec -it netmaker wg set nm-test peer H4LhtPJqtHkYFnAV0Ajz8jTUeubL3Ru7nD0VR88UZkA= remove"

and deleting the node in the GUI. Now it seems to work again, but something must be wrong in the DB for automatic node IP assignment. When I add a new client, it gets IPs starting from 1, even if netmaker or another host is already running.

chefkoch-de42 commented 2 years ago

I deleted the network completely and started from scratch. BTW it was a local network with not local clients in it.

Should we close the issue now?

afeiszli commented 2 years ago

Let's keep this open. We shouldn't allow any peers to exist with no AllowedIPs. We also need to handle the error case when they do arise. Node IP assignment should never take an IP that is already assigned, so we also need to look into how that could happen.
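As a sketch of the assignment rule described above, the allocator below walks a network range and returns the first address that is not already in use. The function name `nextFreeIP` and the `used` map are assumptions made for this example; this is not how netmaker stores or assigns node addresses.

```go
package main

import (
	"fmt"
	"net/netip"
)

// nextFreeIP returns the first address in cidr that is not marked in used.
// Simplification: it skips the network address but does not exclude the
// IPv4 broadcast address.
func nextFreeIP(cidr string, used map[string]bool) (string, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return "", err
	}
	prefix = prefix.Masked() // normalise to the network address

	// Start one past the network address and stop when leaving the range.
	for addr := prefix.Addr().Next(); prefix.Contains(addr); addr = addr.Next() {
		if !used[addr.String()] {
			return addr.String(), nil
		}
	}
	return "", fmt.Errorf("no free address left in %s", cidr)
}

func main() {
	// Addresses already held by existing nodes (hypothetical data).
	used := map[string]bool{
		"10.88.88.1": true,
		"10.88.88.2": true,
	}
	fmt.Println(nextFreeIP("10.88.88.0/24", used)) // 10.88.88.3 <nil>
}
```

The important property is that the set of used addresses is consulted on every assignment, so a new client can never receive `.1` while another node still holds it.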

afeiszli commented 2 years ago

This issue should be fixed as of 0.9.4. Please let us know if you're still encountering the issue and we can re-open.