Closed: cocoonkid closed this issue 9 months ago
Can you provide the output of wg show from the two nodes?
@mattkasun My apologies for the late reply.
I first wanted to make sure that all nodes are on the latest version (v0.20.6) and that all ACLs are correct, e.g. each node is allowed to reach all other nodes.
Node 1 is behind NAT, and node 2 is behind the same NAT. They are connected through the Netmaker server on the public internet.
Observations:
Node 1 (Arch Linux netclient): /etc/hosts is missing the DNS entries for the 10.254.138.0/24 network. The node belongs to both 10.254.137.0/24 and 10.254.138.0/24.
wg show output for node1 (I've omitted the peers from 10.254.137.0/24, but they are all visible and ping works):
interface: netmaker
public key: NOSj5+JNhmf8ve5RWke2rtAOLfm1jS5gHXZ+9KmWcAc=
private key: (hidden)
listening port: 51821
peer: Z8OvEyTLQ5bsE6oBDVqqemZ62ZVX3s9xer3hCoYv6Qs=
endpoint: x.x.x.x.x.x.x:62688
allowed ips: 10.254.138.1/32
transfer: 0 B received, 296 B sent
persistent keepalive: every 20 seconds
peer: 2hxcDIxrA7Yf3ufAc5oHbg8HIML16OWZKJ826tfj7hU=
endpoint: x.x.x.x.x.x:26730
allowed ips: 10.254.138.3/32
transfer: 0 B received, 296 B sent
persistent keepalive: every 20 seconds
Node 2 (OpenWRT arm64 netclient) belongs only to 10.254.138.0/24. wg show output for node2:
interface: netmaker
public key: Z8OvEyTLQ5bsE6oBDVqqemZ62ZVX3s9xer3hCoYv6Qs=
private key: (hidden)
listening port: 51822
peer: 2hxcDIxrA7Yf3ufAc5oHbg8HIML16OWZKJ826tfj7hU=
endpoint: x.x.x..x:26730
allowed ips: 10.254.138.3/32
transfer: 0 B received, 157.83 KiB sent
persistent keepalive: every 20 seconds
peer: NOSj5+JNhmf8ve5RWke2rtAOLfm1jS5gHXZ+9KmWcAc=
endpoint: x.x.x.x..x:57592
allowed ips: 10.254.138.2/32
transfer: 0 B received, 157.68 KiB sent
persistent keepalive: every 20 seconds
So to paraphrase:
The same host in two different networks works only in the network first joined.
In the second network, nm shows that everything is fine, but I cannot ping the nodes of the second network.
Are both networks on the same netmaker server?
Yes! The netmaker server is actually also a node of 10.254.137.0/24.
I think I found the real problem but not yet the solution, find attached my docker-compose and some logs:
Aug 24 13:13:12 arch netclient[3931963]: [netclient] 2023-08-24 13:13:12 Starting firewall...
Aug 24 13:13:12 arch netclient[3931963]: [netclient] 2023-08-24 13:13:12 iptables is supported
Aug 24 13:13:12 arch netclient[3931963]: [netclient] 2023-08-24 13:13:12 adding forwarding rule
Aug 24 13:13:13 arch netclient[3931963]: completed pull for server <<domain>>
Aug 24 13:13:13 arch netclient[3931963]: [netclient] 2023-08-24 13:13:13 adding addresses to netmaker interface
Aug 24 13:13:13 arch netclient[3931963]: [netclient] 2023-08-24 13:13:13 initialized endpoint detection on port 60016
Aug 24 13:13:13 arch netclient[3931963]: [GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
Aug 24 13:13:13 arch netclient[3931963]: [GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
Aug 24 13:13:13 arch netclient[3931963]: - using env: export GIN_MODE=release
Aug 24 13:13:13 arch netclient[3931963]: - using code: gin.SetMode(gin.ReleaseMode)
Aug 24 13:13:13 arch netclient[3931963]: [GIN-debug] GET /status --> github.com/gravitl/netclient/functions.status (3 handlers)
Aug 24 13:13:13 arch netclient[3931963]: [GIN-debug] POST /register --> github.com/gravitl/netclient/functions.register (3 handlers)
Aug 24 13:13:13 arch netclient[3931963]: [GIN-debug] GET /network/:net --> github.com/gravitl/netclient/functions.getNetwork (3 handlers)
Aug 24 13:13:13 arch netclient[3931963]: [GIN-debug] GET /allnetworks --> github.com/gravitl/netclient/functions.getAllNetworks (3 handlers)
Aug 24 13:13:13 arch netclient[3931963]: [GIN-debug] GET /netclient --> github.com/gravitl/netclient/functions.getNetclient (3 handlers)
Aug 24 13:13:13 arch netclient[3931963]: [GIN-debug] POST /connect/:net --> github.com/gravitl/netclient/functions.connect (3 handlers)
Aug 24 13:13:13 arch netclient[3931963]: [GIN-debug] POST /leave/:net --> github.com/gravitl/netclient/functions.leave (3 handlers)
Aug 24 13:13:13 arch netclient[3931963]: [GIN-debug] GET /servers --> github.com/gravitl/netclient/functions.servers (3 handlers)
Aug 24 13:13:13 arch netclient[3931963]: [GIN-debug] POST /uninstall --> github.com/gravitl/netclient/functions.uninstall (3 handlers)
Aug 24 13:13:13 arch netclient[3931963]: [GIN-debug] GET /pull/:net --> github.com/gravitl/netclient/functions.pull (3 handlers)
Aug 24 13:13:13 arch netclient[3931963]: [GIN-debug] POST /nodepeers --> github.com/gravitl/netclient/functions.nodePeers (3 handlers)
Aug 24 13:13:13 arch netclient[3931963]: [GIN-debug] POST /join --> github.com/gravitl/netclient/functions.join (3 handlers)
Aug 24 13:13:13 arch netclient[3931963]: [GIN-debug] POST /sso --> github.com/gravitl/netclient/functions.sso (3 handlers)
Aug 24 13:13:14 arch netclient[3931963]: [netclient] 2023-08-24 13:13:14 publishing global host update for endpoint changes
Aug 24 13:13:23 arch netclient[3931963]: [netclient] 2023-08-24 13:13:23 Failed to allocate: all retransmissions failed for 7wSsgKWhLSNQsTyA
Aug 24 13:13:23 arch netclient[3931963]: [netclient] 2023-08-24 13:13:23 failed to allocate addr on turn: all retransmissions failed for 7wSsgKWhLSNQsTyA
services:
  netmaker:
    container_name: netmaker
    image: gravitl/netmaker:$SERVER_IMAGE_TAG
    env_file: ./.env
    restart: always
    volumes:
      - /data/manifest_data/netmaker__coredns/dnsconfig:/root/config/dnsconfig
      - /data/manifest_data/netmaker__sqldata:/root/data
    environment:
      # config-dependant vars
      - STUN_LIST=stun1.netmaker.io:3478,stun2.netmaker.io:3478,stun1.l.google.com:19302,stun2.l.google.com:19302
      - BROKER_ENDPOINT=wss://broker.${NM_DOMAIN}
      - SERVER_NAME=${NM_DOMAIN}
      - SERVER_API_CONN_STRING=nmapi.${NM_DOMAIN}:443
      - COREDNS_ADDR=${SERVER_HOST}
      - SERVER_HTTP_HOST=nmapi.${NM_DOMAIN}
      - TURN_SERVER_HOST=turn.${NM_DOMAIN}
      - TURN_SERVER_API_HOST=https://turnapi.${NM_DOMAIN}
  netmaker-ui:
    container_name: netmaker-ui
    image: gravitl/netmaker-ui:$UI_IMAGE_TAG
    env_file: ./.env
    environment:
      # config-dependant vars
      # URL where UI will send API requests. Change based on SERVER_HOST, SERVER_HTTP_HOST, and API_PORT
      BACKEND_URL: "https://nmapi.${NM_DOMAIN}"
    depends_on:
      - netmaker
    links:
      - "netmaker:api"
    restart: always
  caddy:
    image: caddy:2.6.2
    container_name: caddy
    env_file: ./.env
    restart: unless-stopped
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - /data/manifest_data/netmaker__caddy/Caddyfile:/etc/caddy/Caddyfile
      - /data/manifest_data/netmaker__caddy/certs:/root/certs
      - /data/manifest_data/netmaker__caddy/data:/data
      - /data/manifest_data/netmaker__caddy/conf:/config
    ports:
      - "80:80"
      - "443:443"
  coredns: # https://github.com/coredns/coredns/issues/6249 currently not latest
    container_name: coredns
    image: coredns/coredns:1.10.1
    command: -conf /root/dnsconfig/Corefile
    env_file: ./.env
    depends_on:
      - netmaker
    restart: always
    volumes:
      - /data/manifest_data/netmaker__coredns/dnsconfig:/root/dnsconfig
    ports:
      - "53:53/udp"
  mq:
    container_name: mq
    image: eclipse-mosquitto:2.0.15-openssl
    env_file: ./.env
    depends_on:
      - netmaker
    restart: unless-stopped
    command: [ "/mosquitto/config/wait.sh" ]
    volumes:
      - /data/manifest_data/netmaker__mosquitto/mosquitto.conf:/mosquitto/config/mosquitto.conf
      - /data/manifest_data/netmaker__mosquitto/wait.sh:/mosquitto/config/wait.sh
      - /data/manifest_data/netmaker__mosquitto/data:/mosquitto/data
      - /data/log_data/netmaker__mosquitto/mosquitto_logs:/mosquitto/log
  turn:
    container_name: turn
    image: gravitl/turnserver:v1.0.0
    env_file: ./.env
    environment:
      - DEBUG_MODE="off"
      # config-dependant vars
      - USERNAME=${TURN_USERNAME}
      - PASSWORD=${TURN_PASSWORD}
      # domain for your turn server
      - TURN_SERVER_HOST=turn.${NM_DOMAIN}
    network_mode: "host"
    volumes:
      - /data/manifest_data/netmaker__turn_server/config:/etc/config
    restart: always
nat'ed client:
Aug 24 13:13:13 arch netclient[3931963]: [GIN-debug] POST /nodepeers --> github.com/gravitl/netclient/functions.nodePeers (3 handlers)
Aug 24 13:13:13 arch netclient[3931963]: [GIN-debug] POST /join --> github.com/gravitl/netclient/functions.join (3 handlers)
Aug 24 13:13:13 arch netclient[3931963]: [GIN-debug] POST /sso --> github.com/gravitl/netclient/functions.sso (3 handlers)
Aug 24 13:13:14 arch netclient[3931963]: [netclient] 2023-08-24 13:13:14 publishing global host update for endpoint changes
Aug 24 13:13:23 arch netclient[3931963]: [netclient] 2023-08-24 13:13:23 Failed to allocate: all retransmissions failed for 7wSsgKWhLSNQsTyA
Aug 24 13:13:23 arch netclient[3931963]: [netclient] 2023-08-24 13:13:23 failed to allocate addr on turn: all retransmissions failed for 7wSsgKWhLSNQsTyA
Aug 24 13:38:43 arch netclient[3931963]: [netclient] 2023-08-24 13:38:43 could not connect to broker at <<domain>>
Aug 24 13:38:43 arch netclient[3931963]: [netclient] 2023-08-24 13:38:43 error publishing checkin connection timeout
Aug 24 13:39:43 arch netclient[3931963]: [netclient] 2023-08-24 13:39:43 could not connect to broker at <<domain>>
Aug 24 13:39:43 arch netclient[3931963]: [netclient] 2023-08-24 13:39:43 error publishing checkin connection timeout
netmaker-server:
turn | turn ERROR: 2023/08/24 11:42:59 error when handling datagram: failed to handle CreatePermission-request from x.x.x.x.x:52560: no allocation found x.x.x.x:52560:[::]:3479
So the problem is actually with the TURN server. I confirmed this by testing without any NAT'ed clients: then everything works great.
So what could I be doing wrong with STUN/TURN?
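One way to narrow this down is to ask a STUN server directly which address and port it sees for a client behind the NAT. The following is a minimal sketch of a STUN Binding request (RFC 5389) using only the Python standard library. The function names are made up for illustration, and it is not how netclient itself performs endpoint detection.

```python
import ipaddress
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442          # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(txid=None):
    """Return a 20-byte STUN Binding request with no attributes."""
    if txid is None:
        txid = os.urandom(12)      # 96-bit transaction ID
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + txid


def parse_xor_mapped_address(msg):
    """Extract (ip, port) from the XOR-MAPPED-ADDRESS attribute, IPv4 only."""
    _msg_type, length, cookie = struct.unpack("!HHI", msg[:8])
    if cookie != MAGIC_COOKIE:
        raise ValueError("not an RFC 5389 STUN message")
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", msg[pos:pos + 4])
        value = msg[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            # Port is XORed with the top 16 bits of the cookie, address with all 32.
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return str(ipaddress.IPv4Address(addr)), port
        pos += 4 + attr_len + (-attr_len % 4)   # attributes are padded to 4 bytes
    return None


def query_stun(host, port, timeout=3.0):
    """Send one Binding request over UDP and return the mapped (ip, port)."""
    txid = os.urandom(12)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_binding_request(txid), (host, port))
        data, _ = sock.recvfrom(2048)
    if data[8:20] != txid:
        raise ValueError("transaction ID mismatch")
    return parse_xor_mapped_address(data)
```

Calling query_stun("stun1.l.google.com", 19302) (one of the servers in STUN_LIST above) from the NAT'ed node prints the mapping the server observed. If two different STUN servers report different ports for the same local socket, the NAT is likely symmetric, which is the case where a relay such as TURN is normally needed.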
I checked the wss connection to broker.domain.com and it looks good. The MQTT broker is also working.
Another important piece of information: my router connects via LTE/5G. That means it does not have a "normal" public IP but sits behind type 3 NAT or carrier-grade NAT (CGNAT).
The public IP shown for the devices behind the CGNAT is therefore wrong.
Could this be the reason it does not work?
But I tried ZeroTier, and that works.
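To check whether the router's WAN address really is a carrier-grade NAT address, it can be compared against the shared address space 100.64.0.0/10 reserved for CGNAT by RFC 6598. A small sketch follows, with the caveat that some carriers use RFC 1918 private ranges or other blocks instead, so a "public" result here does not rule CGNAT out. The function name and the sample addresses are only illustrative.

```python
import ipaddress

# Shared address space reserved for carrier-grade NAT (RFC 6598).
CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")


def classify_wan_address(addr):
    """Classify an address as 'cgnat', 'private' or 'public'."""
    ip = ipaddress.ip_address(addr)
    if ip in CGNAT_RANGE:
        return "cgnat"
    if ip.is_private:
        return "private"
    return "public"


# Illustrative addresses, not taken from the logs above.
for sample in ("100.72.1.5", "10.254.138.2", "8.8.8.8"):
    print(sample, classify_wan_address(sample))
```

If the WAN address reported by the router is "cgnat" or "private", the address other peers see is the carrier's, not the router's, and port forwarding on the router cannot make the node directly reachable.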
I seem to be running into a similar issue where nodes within the same NAT cannot form a WireGuard connection with each other.
This was not an issue in v0.20.3.
Just did a fresh install of the server. Deleted all netclient configs on all nodes. Issue still persists.
I did find this line in the netclient logs:
..."level":"ERROR","source":"mqhandlers.go 194}","msg":"error decrypting message","error":"received invalid message from broker []"}
I have the same issue
Same here when trying to connect two hosts within one local network (NAT'ed). Any updates on this issue?
Same error as two comments above:
netclient[804053]: {"time":"2023-09-11T21:03:16.061210735+03:00","level":"ERROR","source":"mqhandlers.go 194}","msg":"error decrypting message","error":"received invalid message from broker []"}
I'm also having some issues with the TURN server behind NAT. Everything seems to work, but every ... seconds/minutes or so it will go offline for a few seconds, then I get this error on the netclients:
Not sure if this is 100% related, but let me know if you find a fix that works for you, thanks a lot
Any updates on this issue?
Same error as comments above: ..."level":"ERROR","source":"mqhandlers.go 194}","msg":"error decrypting message","error":"received invalid message from broker []"}
version 0.21.2
It looks like my issues went away with 0.21.2.
I will report if that changes.
Hello team, I'm still facing the same issue. From one of my hosts I can ping some hosts, but at the same time other hosts are still not reachable.
Logs
Dec 28 15:06:50 uptimekuma-server netclient[1205]: [GIN-debug] POST /uninstall --> github.com/gravitl/netclient/functions.uninstall (3 handlers)
Dec 28 15:06:50 uptimekuma-server netclient[1205]: [GIN-debug] GET /pull/:net --> github.com/gravitl/netclient/functions.pull (3 handlers)
Dec 28 15:06:50 uptimekuma-server netclient[1205]: [GIN-debug] POST /nodepeers --> github.com/gravitl/netclient/functions.nodePeers (3 handlers)
Dec 28 15:06:50 uptimekuma-server netclient[1205]: [GIN-debug] POST /join --> github.com/gravitl/netclient/functions.join (3 handlers)
Dec 28 15:06:50 uptimekuma-server netclient[1205]: [GIN-debug] POST /sso --> github.com/gravitl/netclient/functions.sso (3 handlers)
Dec 28 15:06:51 uptimekuma-server netclient[1205]: {"time":"2023-12-28T15:06:51.052420309Z","level":"ERROR","source":"mqhandlers.go 194}","msg":"error decrypting message","error":"received invalid message from broker []"}
Dec 28 15:06:51 uptimekuma-server netclient[1205]: {"time":"2023-12-28T15:06:51.183144542Z","level":"ERROR","source":"mqhandlers.go 194}","msg":"error decrypting message","error":"received invalid message from broker []"}
Dec 28 15:06:51 uptimekuma-server netclient[1205]: [netclient] 2023-12-28 15:06:51 adding addresses to netmaker interface
Dec 28 15:06:59 uptimekuma-server netclient[1205]: [netclient] 2023-12-28 15:06:59 Failed to allocate: all retransmissions failed for k52fOldrwjHXpwTD
Dec 28 15:06:59 uptimekuma-server netclient[1205]: [netclient] 2023-12-28 15:06:59 failed to allocate addr on turn: all retransmissions failed for k52fOldrwjHXpwTD
PING 10.134.66.2 (10.134.66.2) 56(84) bytes of data.
From 10.134.66.5 icmp_seq=1 Destination Host Unreachable
ping: sendmsg: Required key not available
From 10.134.66.5 icmp_seq=2 Destination Host Unreachable
ping: sendmsg: Required key not available
From 10.134.66.5 icmp_seq=3 Destination Host Unreachable
ping: sendmsg: Required key not available
From 10.134.66.5 icmp_seq=4 Destination Host Unreachable
ping: sendmsg: Required key not available
@Ankit1999Raikwar what version are you running?
I also had these TURN errors in the turn and api parts of netmaker. What fixed it for me was deleting network_mode: "host" from the turn service in the docker compose file and just using the same network as the rest of the netmaker stack.
@Aceix I'm using the latest version (v0.21.2). I have also noticed that when one host is connected to two networks, some of the other hosts are reachable from that host and some are not. We are facing the same issue as https://github.com/gravitl/netmaker/issues/2518#issuecomment-1688111852
Additional information.
Output with Access Control set to Deny:
root@uptimekuma-server:~# ping 10.173.132.3
PING 10.173.132.3 (10.173.132.3) 56(84) bytes of data.
From 10.173.132.5 icmp_seq=1 Destination Host Unreachable
ping: sendmsg: Required key not available
From 10.173.132.5 icmp_seq=2 Destination Host Unreachable
ping: sendmsg: Required key not available
From 10.173.132.5 icmp_seq=3 Destination Host Unreachable
ping: sendmsg: Required key not available
From 10.173.132.5 icmp_seq=4 Destination Host Unreachable
ping: sendmsg: Required key not available
^C
--- 10.173.132.3 ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3073ms
Output with Access Control set to Allow:
root@uptimekuma-server:~# ping 10.173.132.3
PING 10.173.132.3 (10.173.132.3) 56(84) bytes of data.
^C
--- 10.173.132.3 ping statistics ---
10 packets transmitted, 0 received, 100% packet loss, time 9210ms
No response.
wg show output:
interface: netmaker
public key: 2T6shEBz0ZezDjWF/PVesTVVMoIuxFEqxU8J6VyXBl8=
private key: (hidden)
listening port: 51822
peer: TCnithwEZnUQ8MF5FFW/QzhrOdyoy0D8khs1QYINqCg=
endpoint: 117.255.216.27:10402
allowed ips: 10.173.132.4/32
latest handshake: 18 seconds ago
transfer: 239.38 KiB received, 234.74 KiB sent
persistent keepalive: every 20 seconds
peer: ocI37po2P2K3DViMEnyJDIWF6wzXf+wF2DzVjxrG0GI=
endpoint: 14.192.19.27:10401
allowed ips: 10.134.66.4/32
latest handshake: 31 seconds ago
transfer: 241.06 KiB received, 235.93 KiB sent
persistent keepalive: every 20 seconds
peer: FhTT0AatGIcLz65tDeQGxomTocoT0a0hsyn4gjS1GC0=
endpoint: 14.192.19.27:10402
allowed ips: 10.134.66.2/32
latest handshake: 1 minute, 4 seconds ago
transfer: 231.91 KiB received, 242.04 KiB sent
persistent keepalive: every 20 seconds
peer: PnpWajKeZNMFCCoCQ0gRzRDWp8yCZSoQdYLGX/1mJUk=
endpoint: 14.192.19.27:10400
allowed ips: 10.134.66.3/32
latest handshake: 1 minute, 12 seconds ago
transfer: 233.47 KiB received, 243.20 KiB sent
persistent keepalive: every 20 seconds
peer: gDiXkluIkQKPyvdgYJOTGasrXsKiOpzYorwZWPKldk0=
endpoint: 117.255.216.27:10400
allowed ips: 10.173.132.2/32
latest handshake: 1 minute, 13 seconds ago
transfer: 238.16 KiB received, 245.72 KiB sent
persistent keepalive: every 20 seconds
peer: Fd4cQQwnxDdegN1Ri5xpUlKs8g0kmiNjQ5z3+9vMMUY=
endpoint: 103.177.224.177:51821
allowed ips: 10.173.132.1/32, 10.134.66.1/32
latest handshake: 1 minute, 40 seconds ago
transfer: 7.73 KiB received, 31.18 KiB sent
persistent keepalive: every 20 seconds
peer: wVTPL0tV+J2HMe6NChxE8qOFT0VW5+Wk5Fg0tsamIwY=
endpoint: 117.255.216.27:10400
allowed ips: (none)
transfer: 0 B received, 252.35 KiB sent
persistent keepalive: every 20 seconds
peer: ZQoZ/JCXqhkpOJCDY2lR8IPDbahkeVHUQ9NIjFKAI30=
endpoint: 117.255.216.27:10400
allowed ips: 10.173.132.3/32
transfer: 0 B received, 1.45 KiB sent
persistent keepalive: every 20 seconds
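In the output above, the peers that cannot be pinged are the ones without a "latest handshake" line and with "0 B received"; wg show prints the handshake line only once a handshake has completed. A rough sketch for picking those peers out of a longer dump is below. The helper names are hypothetical, and the sample is shortened from the output above.

```python
SAMPLE = """\
interface: netmaker
public key: 2T6shEBz0ZezDjWF/PVesTVVMoIuxFEqxU8J6VyXBl8=
private key: (hidden)
listening port: 51822
peer: TCnithwEZnUQ8MF5FFW/QzhrOdyoy0D8khs1QYINqCg=
endpoint: 117.255.216.27:10402
allowed ips: 10.173.132.4/32
latest handshake: 18 seconds ago
transfer: 239.38 KiB received, 234.74 KiB sent
persistent keepalive: every 20 seconds
peer: ZQoZ/JCXqhkpOJCDY2lR8IPDbahkeVHUQ9NIjFKAI30=
endpoint: 117.255.216.27:10400
allowed ips: 10.173.132.3/32
transfer: 0 B received, 1.45 KiB sent
persistent keepalive: every 20 seconds
"""


def parse_peers(text):
    """Turn `wg show` text into a list of dicts, one per peer section."""
    peers, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("interface:"):
            current = None                      # skip the interface section
        elif line.startswith("peer:"):
            current = {"peer": line.split(":", 1)[1].strip()}
            peers.append(current)
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)     # split once; endpoints contain ':'
            current[key.strip()] = value.strip()
    return peers


def peers_without_handshake(peers):
    """Peers for which wg show printed no 'latest handshake' line."""
    return [p for p in peers if "latest handshake" not in p]


for peer in peers_without_handshake(parse_peers(SAMPLE)):
    print(peer.get("allowed ips"), peer.get("endpoint"), peer.get("transfer"))
```

Feeding it the real output (for example via subprocess.run(["wg", "show"], capture_output=True, text=True).stdout, run as root) lists exactly the peers whose tunnel never came up, which can then be compared with the hosts that fail to ping.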
I also had a similar issue. But it seems like it's fixed in v0.22.0!
It did not appear anymore, and netmaker is zooooming along. Closing this.
What happened?
Version
v0.20.2
What OS are you using?
Linux
Relevant log output
No response