gravitl / netmaker

Netmaker makes networks with WireGuard. Netmaker automates fast, secure, and distributed virtual networks.
https://netmaker.io

[Bug]: Brand new installation of v0.21.0 results in a network of hosts that cannot talk to each other, except the Netmaker default host. #2594

Open Aeoran opened 1 year ago

Aeoran commented 1 year ago

Contact Details

aeoran@gmail.com

What happened?

A brand new installation of Netmaker v0.21.0 and netclients v0.21.0, on a network known to work with Netmaker/netclient v0.20.5 and prior, no longer works.

The immediate sign is that the Netmaker UI shows persistent "error reaching broker" errors right after installation. Magically, this seems to resolve itself after some time left alone, which is very suspicious in itself. I was able to proceed to configure a network and quickly verify that the hosts on the network were able to ping each other.

After a couple of hours idle, the system automatically went nuts and the hosts were no longer able to ping or communicate with each other, except between the Netmaker host and each host on the network. The TURN server logs show a complete failure in TURN. Overnight, this generated about 1G of logs. Ouch.

I'm not sure why the architecture is moving towards STUN, TURN, etc., but far from improving connectivity, Netmaker is becoming much less stable and performant. This seems like a major step backwards, not a step forwards. Something is quite broken here.

Version

v0.21.0

What OS are you using?

No response

Relevant log output

docker logs turn

turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:43463: no allocation found 206.174.182.90:43463:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:55070: no allocation found 206.174.182.90:55070:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:55070: no allocation found 206.174.182.90:55070:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:55070: no allocation found 206.174.182.90:55070:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:53242: no allocation found 206.174.182.90:53242:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:53242: no allocation found 206.174.182.90:53242:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:35968: no allocation found 206.174.182.90:35968:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:35968: no allocation found 206.174.182.90:35968:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:33754: no allocation found 206.174.182.90:33754:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:43463: no allocation found 206.174.182.90:43463:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:45803: no allocation found 206.174.182.90:45803:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:41533: no allocation found 206.174.182.90:41533:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:52578: no allocation found 206.174.182.90:52578:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle Send-indication from 206.174.182.90:52578: no allocation found 206.174.182.90:52578:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle Send-indication from 206.174.182.90:35087: no allocation found 206.174.182.90:35087:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle Send-indication from 206.174.182.90:35087: no allocation found 206.174.182.90:35087:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:52578: no allocation found 206.174.182.90:52578:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:53242: no allocation found 206.174.182.90:53242:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:35087: no allocation found 206.174.182.90:35087:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:60311: no allocation found 206.174.182.90:60311:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle Send-indication from 206.174.182.90:60311: no allocation found 206.174.182.90:60311:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:43463: no allocation found 206.174.182.90:43463:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:41533: no allocation found 206.174.182.90:41533:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:49227: no allocation found 206.174.182.90:49227:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:56555: no allocation found 206.174.182.90:56555:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle Send-indication from 206.174.182.90:33754: no allocation found 206.174.182.90:33754:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:33754: no allocation found 206.174.182.90:33754:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:35780: no allocation found 206.174.182.90:35780:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle Send-indication from 206.174.182.90:43463: no allocation found 206.174.182.90:43463:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:52578: no allocation found 206.174.182.90:52578:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle Send-indication from 206.174.182.90:53242: no allocation found 206.174.182.90:53242:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:53242: no allocation found 206.174.182.90:53242:[::]:3479
turn ERROR: 2023/09/19 19:34:43 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:43463: no allocation found 206.174.182.90:43463:[::]:3479
turn ERROR: 2023/09/19 19:34:43 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:60311: no allocation found 206.174.182.90:60311:[::]:3479
turn ERROR: 2023/09/19 19:34:43 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:33754: no allocation found 206.174.182.90:33754:[::]:3479
turn ERROR: 2023/09/19 19:34:43 error when handling datagram: failed to handle Send-indication from 206.174.182.90:35780: no allocation found 206.174.182.90:35780:[::]:3479
turn ERROR: 2023/09/19 19:34:43 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:43463: no allocation found 206.174.182.90:43463:[::]:3479
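
With roughly 1G of such lines, a per-type count is easier to read than the raw log. The following is a minimal sketch for summarizing them, assuming every error line follows the format of the excerpt above. The regular expression is inferred from these lines only, not from any documented log schema, and `summarize_turn_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Line format inferred from the log excerpt above (an assumption, not a
# documented schema): "... failed to handle <kind> from <ip>:<port>: no allocation found ..."
LINE_RE = re.compile(
    r"failed to handle (?P<kind>[\w-]+) from "
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d+): no allocation found"
)


def summarize_turn_errors(lines):
    """Count 'no allocation found' errors by message type and by source endpoint."""
    by_kind = Counter()
    by_endpoint = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # ignore lines that are not one of these errors
        by_kind[match["kind"]] += 1
        by_endpoint[(match["ip"], int(match["port"]))] += 1
    return by_kind, by_endpoint
```

For example, after saving the output of `docker logs turn` to a file (here hypothetically named `turn.log`), `summarize_turn_errors(open("turn.log"))` returns one counter keyed by message type (`ChannelBind-request`, `Send-indication`) and one keyed by `(ip, port)`, which shows whether the spam comes from a few client endpoints or many.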

pernetz commented 1 year ago

If you reboot your server, do the log entries stop spamming? That's what happened on my VPS.

Aeoran commented 1 year ago

> If you reboot your server, do the log entries stop spamming? That's what happened on my VPS.

No, unfortunately.

Aeoran commented 1 year ago

Reverting the TURN server from 1.1 to 1.0 makes the system functional. The installation from scratch also completes without errors from the UI that the broker is unreachable.

However, the TURN server still shows lots of the same spamming.

ThinkontrolSY commented 1 year ago

@Aeoran

> Reverting the TURN server from 1.1 to 1.0 makes the system functional. The installation from scratch also completes without errors from the UI that the broker is unreachable.
>
> However, the TURN server still shows lots of the same spamming.

Hi, I have the same issue with the TURN Docker image gravitl/turnserver:v1.0.0: the same spamming, and hosts that cannot talk to each other, except the Netmaker default host.

Could you please share your docker-compose.yml?

Aeoran commented 1 year ago

version: "3.4"

services:

  netmaker:
    container_name: netmaker
    image: gravitl/netmaker:$SERVER_IMAGE_TAG
    env_file: ./netmaker.env
    restart: always
    volumes:

volumes:
  caddy_data: { } # runtime data for caddy
  caddy_conf: { } # configuration file for Caddy
  sqldata: { }
  dnsconfig: { } # storage for coredns
  mosquitto_logs: { } # storage for mqtt logs
  mosquitto_data: { } # storage for mqtt data
  turn_server: { }

shandralor commented 1 year ago

Had this issue also. Deleted all my Docker containers and reran the script. It does work now.

Aeoran commented 1 year ago

Which containers are you using? Newer than 0.21.0?

-- Joseph

On Oct. 19, 2023, at 23:25, Tom Teck wrote:

> Had this issue also. Deleted all my docker containers and reran the script. It does work now

jgoclawski commented 1 year ago

I think the default docker-compose.yml uses the wrong path for mounting the TURN container's files. This means that there's no data persistence, so creating a new container wipes its history, which results in the logs that you observe.

Try changing:

volumes:
- turn_server:/etc/config

To:

volumes:
- turn_server:/root/etc/turn/config

Unfortunately, old data is already lost, so only newly connected clients will benefit from this.
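
The change above can also be applied programmatically. This is a minimal sketch that rewrites the mount in the text of a compose file, assuming the entry appears exactly as `turn_server:/etc/config` in a YAML list item. `fix_turn_mount` is a hypothetical helper, and it edits plain text rather than parsing YAML.

```python
OLD_MOUNT = "turn_server:/etc/config"
NEW_MOUNT = "turn_server:/root/etc/turn/config"


def fix_turn_mount(compose_text):
    """Return (new_text, changed) with the turn_server mount target rewritten."""
    out = []
    changed = False
    for line in compose_text.splitlines(keepends=True):
        stripped = line.strip()
        # Only touch list items whose value is exactly the old mount,
        # so longer paths that merely start with it are left alone.
        if stripped.startswith("-") and stripped[1:].strip().strip("\"'") == OLD_MOUNT:
            line = line.replace(OLD_MOUNT, NEW_MOUNT)
            changed = True
        out.append(line)
    return "".join(out), changed
```

The function is idempotent: running it on an already corrected file returns the text unchanged with `changed` set to `False`. After writing the result back, the TURN container still has to be recreated (for example with `docker compose up -d`) before the new mount takes effect.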