netbirdio / netbird

Connect your devices into a secure WireGuard®-based overlay network with SSO, MFA and granular access controls.
https://netbird.io

Unexpected peer connections in HA Kubernetes setup #2486

Open LeszekBlazewski opened 2 weeks ago

LeszekBlazewski commented 2 weeks ago

Describe the problem

Unexpected connections are being established between peers that belong to a single group assigned to a network route (HA). These relay connections are re-established in a loop roughly every 10 seconds. Based on the logs, some of the peers are constantly connecting and disconnecting. The logs below show up at random on any of the peers in the group and for any of the other pod IPs from the screenshot. I suspect this is also the reason why external peers occasionally observe short, random disconnections of some of the peers. If I decrease the number of replicas to 1, the problem does not occur.

Sample logs from one of the routing peers (I have replaced the self-hosted NetBird server IP with X.X.X.X):

2024-08-26T15:06:25Z DEBG client/internal/peer/conn.go:603: discovered local candidate udp4 host 10.4.5.83:51820
2024-08-26T15:06:25Z DEBG client/internal/peer/conn.go:603: discovered local candidate udp4 srflx 18.212.36.166:1601 related 0.0.0.0:51820
2024-08-26T15:06:25Z DEBG util/net/listener_nonios.go:120: Listener resolved IP for X.X.X.X:3478: X.X.X.X
2024-08-26T15:06:25Z DEBG client/internal/peer/conn.go:603: discovered local candidate udp4 relay 100.29.142.57:58755 related 0.0.0.0:51682
2024-08-26T15:06:25Z DEBG client/internal/peer/conn.go:755: OnRemoteCandidate from peer SFegSDX/cfxLYDYJm+uwqh4ufPvib/q+iD9sYx+GlzY= -> udp4 srflx 91.204.194.21:43126 related 0.0.0.0:51820
2024-08-26T15:06:25Z DEBG client/internal/peer/conn.go:755: OnRemoteCandidate from peer SFegSDX/cfxLYDYJm+uwqh4ufPvib/q+iD9sYx+GlzY= -> udp4 srflx 91.204.194.21:51820 related 0.0.0.0:51820
2024-08-26T15:06:25Z DEBG iface/bind/udp_mux.go:346: ICE: registered 91.204.194.21:43126 for MFfKBCvDNEuXcBfP
2024-08-26T15:06:25Z DEBG iface/bind/udp_mux.go:346: ICE: registered 91.204.194.21:43126 for MFfKBCvDNEuXcBfPturn:netbird.ops.videri.com:3478?transport=udp
2024-08-26T15:06:25Z DEBG iface/bind/udp_mux.go:346: ICE: registered 91.204.194.21:51820 for MFfKBCvDNEuXcBfP
2024-08-26T15:06:25Z DEBG iface/bind/udp_mux.go:346: ICE: registered 91.204.194.21:51820 for MFfKBCvDNEuXcBfPturn:netbird.ops.videri.com:3478?transport=udp
2024-08-26T15:06:25Z DEBG client/internal/peer/conn.go:755: OnRemoteCandidate from peer SFegSDX/cfxLYDYJm+uwqh4ufPvib/q+iD9sYx+GlzY= -> udp4 relay 100.29.142.57:61266 related 0.0.0.0:49649
2024-08-26T15:06:25Z DEBG iface/bind/udp_mux.go:346: ICE: registered 100.29.142.57:61266 for MFfKBCvDNEuXcBfP
2024-08-26T15:06:25Z DEBG iface/bind/udp_mux.go:346: ICE: registered 100.29.142.57:61266 for MFfKBCvDNEuXcBfPturn:netbird.ops.videri.com:3478?transport=udp
2024-08-26T15:06:25Z DEBG client/internal/peer/conn.go:755: OnRemoteCandidate from peer FcP7W8qw1FcasoxfvCEKznGeGvklAXW8g2WgCjwwEnw= -> udp4 host 192.168.0.234:51820
2024-08-26T15:06:25Z DEBG iface/bind/udp_mux.go:346: ICE: registered 192.168.0.234:51820 for SNrQOsocfcFBtkzA
2024-08-26T15:06:25Z DEBG iface/bind/udp_mux.go:346: ICE: registered 192.168.0.234:51820 for SNrQOsocfcFBtkzAturn:netbird.ops.videri.com:3478?transport=udp
2024-08-26T15:06:25Z DEBG client/internal/peer/conn.go:755: OnRemoteCandidate from peer FcP7W8qw1FcasoxfvCEKznGeGvklAXW8g2WgCjwwEnw= -> udp4 srflx 89.64.83.188:47306 related 0.0.0.0:51820
2024-08-26T15:06:25Z DEBG client/internal/peer/conn.go:755: OnRemoteCandidate from peer FcP7W8qw1FcasoxfvCEKznGeGvklAXW8g2WgCjwwEnw= -> udp4 srflx 89.64.83.188:51820 related 0.0.0.0:51820
2024-08-26T15:06:25Z DEBG iface/bind/udp_mux.go:346: ICE: registered 89.64.83.188:51820 for SNrQOsocfcFBtkzA
2024-08-26T15:06:25Z DEBG iface/bind/udp_mux.go:346: ICE: registered 89.64.83.188:51820 for SNrQOsocfcFBtkzAturn:netbird.ops.videri.com:3478?transport=udp
2024-08-26T15:06:25Z DEBG iface/bind/udp_mux.go:346: ICE: registered 89.64.83.188:47306 for SNrQOsocfcFBtkzA
2024-08-26T15:06:25Z DEBG iface/bind/udp_mux.go:346: ICE: registered 89.64.83.188:47306 for SNrQOsocfcFBtkzAturn:netbird.ops.videri.com:3478?transport=udp
2024-08-26T15:06:25Z DEBG iface/bind/udp_mux.go:346: ICE: registered X.X.X.X:61266 for MFfKBCvDNEuXcBfP
2024-08-26T15:06:25Z DEBG iface/bind/udp_mux.go:346: ICE: registered X.X.X.X:61266 for MFfKBCvDNEuXcBfPturn:netbird.ops.videri.com:3478?transport=udp
2024-08-26T15:06:26Z DEBG client/internal/peer/conn.go:755: OnRemoteCandidate from peer FcP7W8qw1FcasoxfvCEKznGeGvklAXW8g2WgCjwwEnw= -> udp4 relay 100.29.142.57:56453 related 0.0.0.0:61121
2024-08-26T15:06:26Z DEBG iface/bind/udp_mux.go:346: ICE: registered X.X.X.X:56453 for SNrQOsocfcFBtkzA
2024-08-26T15:06:26Z DEBG client/internal/peer/conn.go:639: peer FcP7W8qw1FcasoxfvCEKznGeGvklAXW8g2WgCjwwEnw= ICE ConnectionState has changed to Connected
2024-08-26T15:06:26Z DEBG client/internal/peer/conn.go:412: Conn resolved IP for 89.64.83.188:47306: 89.64.83.188
2024-08-26T15:06:26Z DEBG iface/iface.go:92: updating interface wt0 peer FcP7W8qw1FcasoxfvCEKznGeGvklAXW8g2WgCjwwEnw=, endpoint 89.64.83.188:47306
2024-08-26T15:06:26Z DEBG client/internal/peer/conn.go:633: selected candidate pair [local <-> remote] -> [udp4 srflx 18.212.36.166:1601 related 0.0.0.0:51820 <-> udp4 srflx 89.64.83.188:47306 related 0.0.0.0:51820], peer FcP7W8qw1FcasoxfvCEKznGeGvklAXW8g2WgCjwwEnw=
2024-08-26T15:06:26Z INFO client/internal/peer/conn.go:362: connected to peer FcP7W8qw1FcasoxfvCEKznGeGvklAXW8g2WgCjwwEnw=, endpoint address: 89.64.83.188:47306
2024-08-26T15:06:26Z DEBG client/internal/peer/conn.go:412: Conn resolved IP for 10.4.14.64:41964: 10.4.14.64
2024-08-26T15:06:26Z DEBG iface/iface.go:92: updating interface wt0 peer SFegSDX/cfxLYDYJm+uwqh4ufPvib/q+iD9sYx+GlzY=, endpoint 10.4.14.64:41964
2024-08-26T15:06:26Z INFO client/internal/peer/conn.go:362: connected to peer SFegSDX/cfxLYDYJm+uwqh4ufPvib/q+iD9sYx+GlzY=, endpoint address: 10.4.14.64:41964
2024-08-26T15:06:26Z DEBG client/internal/peer/conn.go:639: peer SFegSDX/cfxLYDYJm+uwqh4ufPvib/q+iD9sYx+GlzY= ICE ConnectionState has changed to Connected
2024-08-26T15:06:26Z DEBG client/internal/peer/conn.go:633: selected candidate pair [local <-> remote] -> [udp4 host 10.4.5.83:51820 <-> udp4 prflx 10.4.14.64:41964 related :0], peer SFegSDX/cfxLYDYJm+uwqh4ufPvib/q+iD9sYx+GlzY=
2024-08-26T15:06:28Z DEBG client/internal/peer/conn.go:639: peer xtk9ScZHXEWSM3USaJQiikaU5Fo6lOAUnQe+S7sI5Ho= ICE ConnectionState has changed to Disconnected
2024-08-26T15:06:28Z DEBG client/internal/peer/conn.go:496: trying to cleanup xtk9ScZHXEWSM3USaJQiikaU5Fo6lOAUnQe+S7sI5Ho=
2024-08-26T15:06:28Z DEBG iface/iface.go:101: Removing peer xtk9ScZHXEWSM3USaJQiikaU5Fo6lOAUnQe+S7sI5Ho= from interface wt0 
2024-08-26T15:06:28Z DEBG client/internal/peer/conn.go:639: peer xtk9ScZHXEWSM3USaJQiikaU5Fo6lOAUnQe+S7sI5Ho= ICE ConnectionState has changed to Closed
2024-08-26T15:06:28Z DEBG client/internal/peer/conn.go:554: cleaned up connection to peer xtk9ScZHXEWSM3USaJQiikaU5Fo6lOAUnQe+S7sI5Ho=
2024-08-26T15:06:28Z DEBG client/internal/engine.go:990: connection to peer xtk9ScZHXEWSM3USaJQiikaU5Fo6lOAUnQe+S7sI5Ho= failed: disconnected from peer xtk9ScZHXEWSM3USaJQiikaU5Fo6lOAUnQe+S7sI5Ho=
2024-08-26T15:06:28Z DEBG client/internal/peer/conn.go:256: trying to connect to peer xtk9ScZHXEWSM3USaJQiikaU5Fo6lOAUnQe+S7sI5Ho=
2024-08-26T15:06:28Z DEBG client/internal/peer/conn.go:288: connection offer sent to peer xtk9ScZHXEWSM3USaJQiikaU5Fo6lOAUnQe+S7sI5Ho=, waiting for the confirmation
2024-08-26T15:06:32Z DEBG client/internal/peer/conn.go:639: peer SFegSDX/cfxLYDYJm+uwqh4ufPvib/q+iD9sYx+GlzY= ICE ConnectionState has changed to Disconnected
2024-08-26T15:06:32Z DEBG client/internal/peer/conn.go:496: trying to cleanup SFegSDX/cfxLYDYJm+uwqh4ufPvib/q+iD9sYx+GlzY=
2024-08-26T15:06:32Z DEBG iface/iface.go:101: Removing peer SFegSDX/cfxLYDYJm+uwqh4ufPvib/q+iD9sYx+GlzY= from interface wt0 
2024-08-26T15:06:32Z DEBG client/internal/peer/conn.go:554: cleaned up connection to peer SFegSDX/cfxLYDYJm+uwqh4ufPvib/q+iD9sYx+GlzY=
2024-08-26T15:06:32Z DEBG client/internal/engine.go:990: connection to peer SFegSDX/cfxLYDYJm+uwqh4ufPvib/q+iD9sYx+GlzY= failed: disconnected from peer SFegSDX/cfxLYDYJm+uwqh4ufPvib/q+iD9sYx+GlzY=
2024-08-26T15:06:32Z DEBG client/internal/peer/conn.go:639: peer SFegSDX/cfxLYDYJm+uwqh4ufPvib/q+iD9sYx+GlzY= ICE ConnectionState has changed to Closed
2024-08-26T15:06:33Z DEBG client/internal/peer/conn.go:256: trying to connect to peer SFegSDX/cfxLYDYJm+uwqh4ufPvib/q+iD9sYx+GlzY=
2024-08-26T15:06:33Z DEBG client/internal/peer/conn.go:288: connection offer sent to peer SFegSDX/cfxLYDYJm+uwqh4ufPvib/q+iD9sYx+GlzY=, waiting for the confirmation
2024-08-26T15:06:33Z DEBG client/internal/peer/conn.go:726: OnRemoteOffer from peer SFegSDX/cfxLYDYJm+uwqh4ufPvib/q+iD9sYx+GlzY= on status Disconnected
2024-08-26T15:06:33Z DEBG client/internal/peer/conn.go:654: sending answer to SFegSDX/cfxLYDYJm+uwqh4ufPvib/q+iD9sYx+GlzY=
2024-08-26T15:06:33Z DEBG client/internal/peer/conn.go:309: received connection confirmation from peer SFegSDX/cfxLYDYJm+uwqh4ufPvib/q+iD9sYx+GlzY= running version 0.28.7 and with remote WireGuard listen port 51820
2024-08-26T15:06:33Z DEBG client/internal/peer/conn.go:639: peer SFegSDX/cfxLYDYJm+uwqh4ufPvib/q+iD9sYx+GlzY= ICE ConnectionState has changed to Checking
2024-08-26T15:06:33Z DEBG client/internal/peer/conn.go:603: discovered local candidate udp4 host 10.4.5.83:51820
... and the cycle repeats
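For reference, a quick way to surface the churn from the pod logs (just a sketch; it assumes the pods carry the app=netbird-peer label from the manifest below and that kubectl has access to the cluster):

# show only the ICE state transitions across all routing-peer pods, prefixed with the pod name
kubectl -n netbird logs -l app=netbird-peer --prefix --tail=2000 \
  | grep "ICE ConnectionState has changed"

The Connected -> Disconnected -> Closed -> Checking transitions for the in-cluster peers repeat in that output roughly every 10 seconds.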

Additional context

I am trying to run a set of peers on a single Kubernetes cluster that is part of a cloud network (in my case an AWS VPC), so that connections to private resources can be routed through them. The setup I am trying to achieve consists of the following:

In the ACLs I have not created a rule that would allow the above 3 peers to connect to each other. I have only allowed other (external) peers to connect to the group attached to all 3 of those peers.
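As mentioned above, the churn disappears when I scale down to a single replica. For reference, the commands I use to toggle the replica count while testing (the deployment name comes from the manifest below):

kubectl -n netbird scale deployment/netbird-peer --replicas=1   # the reconnect loop stops
kubectl -n netbird scale deployment/netbird-peer --replicas=3   # the reconnect loop resumes once the new peers register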

To Reproduce

Deploy the following manifest in Kubernetes and inspect the peer logs (you might need to tweak topologySpreadConstraints if your cluster does not have nodes in 3 different AZs):

apiVersion: v1
kind: Namespace
metadata:
  labels:
    app: netbird-peer
  name: netbird
---
apiVersion: v1
data:
  NB_ADMIN_URL: https://MY.SELF.HOSTED.NETBIRD:443
  NB_HOSTNAME: vpc-use1-dev-k8s-router
  NB_LOG_FILE: console
  NB_LOG_LEVEL: debug
  NB_MANAGEMENT_URL: https://MY.SELF.HOSTED.NETBIRD:33073
kind: ConfigMap
metadata:
  labels:
    app: netbird-peer
  name: netbird-peer-9k8kg66842
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: netbird-peer
  name: netbird-peer
spec:
  replicas: 3
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: netbird-peer
  template:
    metadata:
      labels:
        app: netbird-peer
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: netbird-peer
              topologyKey: kubernetes.io/hostname
      containers:
        - envFrom:
            - secretRef:
                name: netbird-peer
            - configMapRef:
                name: netbird-peer-9k8kg66842
          image: netbirdio/netbird:0.28.8
          imagePullPolicy: IfNotPresent
          name: netbird-peer
          ports:
            - containerPort: 51820
              hostPort: 51820
              name: wireguard
              protocol: UDP
          resources:
            limits:
              memory: 60M
            requests:
              cpu: 6m
              memory: 60M
          securityContext:
            privileged: true
      topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              app: netbird-peer
          maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
---
apiVersion: v1
data:
  NB_SETUP_KEY: BASE64_ENCODED_SETUP_KEY
kind: Secret
metadata:
  labels:
    app: netbird-peer
  name: netbird-peer
type: Opaque

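For completeness, this is how I deploy the manifest and follow the peer logs (the file name is just what I use locally):

kubectl apply -f netbird-peer.yaml
kubectl -n netbird get pods -o wide                      # confirm the pods land on separate nodes/zones
kubectl -n netbird logs -l app=netbird-peer -f --prefix  # follow all three routing peers at once
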
The reason I am running with privileged: true and hostPort: 51820 is that only those two configuration options gave me a setup where all of the peers are visible as P2P (rather than relayed) to other external NetBird peers.

Expected behaviour

No repeated relay reconnections, and no related error logs, between the peers that are part of the same Kubernetes cluster. I am not sure why those connections are established in the first place.

Are you using NetBird Cloud?

I am using a self-hosted NetBird server.

NetBird version

0.28.8 on all of the mentioned Kubernetes peers, and the following versions on the self-hosted NetBird server:

NETBIRD_DASHBOARD_TAG="v2.5.0"
NETBIRD_SIGNAL_TAG="0.28.8"
NETBIRD_MANAGEMENT_TAG="0.28.8"
COTURN_TAG="4"

netbird status -dA output:

I am running these peers as pods, so the client runs in foreground mode, and I am not sure how I could use netbird status -dA to get this output in that case.
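If there is a way to query the client in this setup, I would expect it to look roughly like the command below, but I have not verified that it works while the client runs in the foreground (there may be no daemon for the status command to talk to):

kubectl -n netbird exec deploy/netbird-peer -- netbird status -dA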

Screenshots

Pods (netbird clients) running in Kubernetes:

(screenshot)

Network route configuration:

(screenshot)

Please let me know if any more info is needed to troubleshoot the above issue.

Initially I thought my case was similar to https://github.com/netbirdio/netbird/issues/2150, but the issue I am observing now seems different.