gravitl / netmaker

Netmaker makes networks with WireGuard. Netmaker automates fast, secure, and distributed virtual networks.
https://netmaker.io

[Bug]: Untrapped error during initialization of Netmaker server results in an unusable system #1011

Closed vitex closed 2 years ago

vitex commented 2 years ago

What happened?

This issue is based on a discussion that I started on the Discord server channel on April 3.

While trying to get a netclient back online on a v0.12.x network, I restarted the netmaker server and encountered a fatal error that kept netmaker from activating.

# docker-compose down
...
# docker-compose up
...
netmaker       | [netmaker] 2022-04-03 14:32:26 database successfully connected 
netmaker       | [netmaker] 2022-04-03 14:32:26 no OAuth provider found or not configured, continuing without OAuth 
netmaker       | [netmaker] 2022-04-03 14:32:29 could not set peers on network 7311 : file does not exist 
netmaker       | panic: runtime error: index out of range [0] with length 0
netmaker       | 
netmaker       | goroutine 1 [running]:
netmaker       | github.com/gravitl/netmaker/netclient/wireguard.SetPeers({0xc000291b32, 0x7}, 0xc0000be900, {0xc00026f000, 0xc00026f000, 0x20})
netmaker       |     /app/netclient/wireguard/common.go:60 +0x1305
netmaker       | github.com/gravitl/netmaker/logic.initWireguard(0xc0000be900, {0xc000547b90, 0x2c}, {0xc00026f000, 0x12, 0x20}, 0x0, {0x0, 0x0, 0x0})
netmaker       |     /app/logic/wireguard.go:246 +0x121a
netmaker       | github.com/gravitl/netmaker/logic.setWGConfig(0xc0000be900, 0x0)
netmaker       |     /app/logic/wireguard.go:296 +0x1fc
netmaker       | github.com/gravitl/netmaker/logic.ServerPull(0xc0000be900, 0x1)
netmaker       |     /app/logic/server.go:431 +0x26c
netmaker       | github.com/gravitl/netmaker/serverctl.InitServerNetclient()
netmaker       |     /app/serverctl/serverctl.go:103 +0x2d7
netmaker       | main.initialize()
netmaker       |     /app/main.go:86 +0x42d
netmaker       | main.main()
netmaker       |     /app/main.go:35 +0xaa
netmaker exited with code 2
...

Is there any way to work around the error and get netmaker server working again without having to rebuild after uninstalling the current server and all its networks and clients?

Since no method to recover the server has been found, I will have to reinstall the server, all of its networks, and all of the clients on each network. (Leaving a comms network node without a corresponding listening server would result in Issue 862.)

The issue is not the particular error that occurred on my test system. Any untrapped runtime error during initialization of Netmaker server would be catastrophic on a production system since it would permanently prevent interaction with the UI and thus with all of the clients on all of the networks connected to that server.
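To illustrate what "trapping" a runtime error during initialization could look like in Go, here is a minimal sketch. It is not Netmaker's code and not necessarily how the problem was eventually fixed; `safeInit` and the step name are hypothetical. The idea is to wrap an initialization step in a deferred `recover` so that a panic becomes an ordinary error and the rest of the server (API, UI) can still start.

```go
package main

import (
	"fmt"
	"log"
)

// safeInit is a hypothetical helper: it runs one initialization step and
// converts a panic raised inside it into a returned error.
func safeInit(name string, step func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("init step %q panicked: %v", name, r)
		}
	}()
	step()
	return nil
}

func main() {
	// Simulate an init step that indexes an empty slice, as in the log above.
	err := safeInit("server netclient", func() {
		var peers []string
		_ = peers[0] // runtime panic: index out of range
	})
	if err != nil {
		// Log the failure and keep going instead of exiting.
		log.Printf("continuing in degraded mode: %v", err)
	}
	fmt.Println("remaining startup steps would run here")
}
```

With a wrapper like this, a bad record in one network would be logged rather than taking down the whole process, which would leave the dashboard available for diagnosis.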

Version

v0.12.1

What OS are you using?

Linux

Relevant log output

No response


vitex commented 2 years ago

On 4/28 I used v0.13.0 to recreate my network with 20+ nodes; initial testing went well. When I resumed testing the next day, the Netmaker UI prompted me to create an admin key instead of signing in to an existing server and responded with a "Could not reach server" message. When I ran docker-compose down followed by docker-compose up, the log file showed a problem similar to what I saw with v0.12.x.

vitex commented 2 years ago

The untrapped runtime error still occurs in v0.13.1.

vitex commented 2 years ago

The untrapped runtime error still occurs in v0.14.0 as is shown by this log file.

0xdcarns commented 2 years ago

Could you detail your steps to create this issue?

vitex commented 2 years ago

On Tue, May 17, 2022 at 5:53 PM dcarns @.***> wrote:

Could you detail your steps to create this issue?

I wish that the situation were that simple.

I have been experimenting with Netmaker since v0.9.x by building a network with 15 to 20 nodes. While using v0.12.x I encountered the untrapped error in Netmaker server that made my network unusable since the dashboard did not activate. When I did not get any help on Discord, I wiped the network and recreated it with v0.13.0. After less than 24 hours of usage, the network failed with the same untrapped error. I saved the docker volumes for the v0.13.0 network and tried again with the v0.14.0 version of the server, but I had the same untrapped error.

Without the dashboard to investigate the state of my network, I do not know what is causing the error. The only unusual thing that I was doing was allocating a client on the same node as the server, but I created a separate smaller network with a client on the server node, and that network has been running without problems for several weeks. I tried modifying a copy of the SQLite database of the original v0.13.0 system to remove all references to the client on the server, but that did not prevent the server from crashing.

The error message in the log file is

netmaker       | panic: runtime error: index out of range [0] with length 0
netmaker       |
netmaker       | goroutine 1 [running]:
netmaker       | github.com/gravitl/netmaker/netclient/wireguard.SetPeers({0xc00002f594, 0x7}, 0xc000047400, {0xc0001ea800?, 0x11, 0x20})
netmaker       |     /app/netclient/wireguard/common.go:45 +0x10df

which is apparently the test

currentPeer.AllowedIPs[0].String() == peer.AllowedIPs[0].String()

at line 45 of the file.

Until this runtime error is trapped, I cannot use the dashboard to investigate what it is about my v0.13.0 network that resulted in the untrapped runtime error.

Ed

afeiszli commented 2 years ago

Untrapped error fixed in v0.14.1