netbirdio / netbird

Connect your devices into a secure WireGuard®-based overlay network with SSO, MFA and granular access controls.
https://netbird.io
BSD 3-Clause "New" or "Revised" License
Docker client is always relayed in network_mode: host #2604

Open Spiritreader opened 2 months ago

Spiritreader commented 2 months ago

Describe the problem

I want netbird to be available on the host itself so that I can access services running there. I cannot install it directly because I am using Unraid, and this OS generally doesn't allow running applications outside of Docker. So, in order to connect to the host, I have to run the netbird client in host network mode.

Scenario 1 - Network Mode Host for Netbird Client Container

Example compose file

services:
  netbird:
    image: netbirdio/netbird:latest
    container_name: pt-netbird-client
    restart: unless-stopped
    network_mode: "host"
    privileged: true
    cap_add:
      - NET_ADMIN
      - SYS_ADMIN
    environment:
      - NB_MANAGEMENT_URL=https://my.selfhosted.instance
      - NB_SETUP_KEY=asetupkey
    volumes:
      - /mnt/user/appdata/netbird-client:/etc/netbird
      - /etc/resolv.conf:/etc/resolv.conf

From my desktop, I get:

 server.netbird.selfhosted:
  NetBird IP: 100.64.225.216
  Public key: PublicKey
  Status: Connected
  -- detail --
  Connection type: Relayed
  ICE candidate (Local/Remote): -/-
  ICE candidate endpoints (Local/Remote): -/-
  Relay server address: rels://my.selfhosted.relay
  Last connection update: 23 seconds ago
  Last WireGuard handshake: 11 seconds ago
  Transfer status (received/sent) 45.0 KiB/26.8 KiB
  Quantum resistance: false
  Routes: 192.168.1.0/24
  Latency: 22.3312ms

But I can reach services running on the server itself. For example, curl server.netbird.selfhosted returns the page that's served by this peer.

Scenario 2 - Network Mode Bridge for Netbird Client Container

Example compose file

services:
  netbird:
    image: netbirdio/netbird:latest
    container_name: pt-netbird-client
    restart: unless-stopped
    privileged: true
    cap_add:
      - NET_ADMIN
      - SYS_ADMIN
    environment:
      - NB_MANAGEMENT_URL=https://my.selfhosted.instance
      - NB_SETUP_KEY=asetupkey
    volumes:
      - /mnt/user/appdata/netbird-client:/etc/netbird
      - /etc/resolv.conf:/etc/resolv.conf

This creates a new Docker network and immediately produces a P2P connection.

 server.netbird.selfhosted:
  NetBird IP: 100.64.225.216
  Public key: PublicKey
  Status: Connected
  -- detail --
  Connection type: P2P
  ICE candidate (Local/Remote): host/prflx
  ICE candidate endpoints (Local/Remote): 192.168.75.1:56565/REMOTEADDR
  Relay server address: rels://my.selfhosted.relay
  Last connection update: 11 seconds ago
  Last WireGuard handshake: 13 seconds ago
  Transfer status (received/sent) 20.7 KiB/14.3 KiB
  Quantum resistance: false
  Routes: 192.168.1.0/24
  Latency: 21.2773ms

However, because it now runs in an isolated docker network, I can't access services running on server.netbird.selfhosted anymore, and instead only have access to the netbird container.
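To compare the two status outputs above without reading them by eye, a small parser can help. This is only a sketch: it assumes the `key: value` layout shown in the two peer blocks above, and `parse_peer_status` and `is_relayed` are hypothetical helper names, not part of netbird.

```python
def parse_peer_status(text: str) -> dict:
    """Parse one peer block of status detail output into a dict of fields.

    Assumes the "key: value" layout shown in the outputs above.
    """
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, the "-- detail --" separator and the
        # "peer.name:" header line that opens the block.
        if not line or line.startswith("--") or line.endswith(":"):
            continue
        # Split on the first ": " only, so values such as
        # "rels://host" or "192.168.75.1:56565/..." stay intact.
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return fields


def is_relayed(fields: dict) -> bool:
    """Return True when the peer block reports a relayed connection."""
    return fields.get("Connection type") == "Relayed"
```

Feeding it the Scenario 1 block gives `is_relayed(...) == True`, and the Scenario 2 block gives `False`. Lines without a `: ` separator, such as the transfer status line, are ignored.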

To Reproduce

Steps to reproduce the behavior:

  1. Set up the netbird image in host mode
  2. Observe "relayed"
  3. Set up the netbird image in bridge mode
  4. Observe "p2p"

Expected behavior

P2P is possible in both instances

Are you using NetBird Cloud?

self-hosted.

NetBird version

0.29.2

Do you face any (non-mobile) client issues?

Please provide the file created by netbird debug for 1m -AS.

I cannot do that, because interacting with netbird is broken in Docker containers, where it runs as a foreground application.

Error: failed to connect to daemon error: context deadline exceeded
If the daemon is not running please run: 
netbird service install 
netbird service start

Spiritreader commented 1 month ago

I have managed to download the netbird binaries on the Unraid host and run them in the foreground, so I can get logs from the server. The bad news is that the issue still exists.

The logs give some insight into what's happening when ICE is being negotiated:

2024-10-14T01:55:52+02:00 DEBG [peer: 1Xxb2B7huWfPEERDTa9bPVYz2k=] client/internal/peer/handshaker.go:91: received connection confirmation, running version 0.29.4 and with remote WireGuard listen port 51820
2024-10-14T01:55:52+02:00 DEBG [peer: 1Xxb2B7huWfPEERDTa9bPVYz2k=] client/internal/peer/handshaker.go:79: wait for remote offer confirmation
2024-10-14T01:55:52+02:00 DEBG [peer: 1Xxb2B7huWfPEERDTa9bPVYz2k=] client/internal/peer/worker_ice.go:116: OnNewOffer for ICE
2024-10-14T01:55:52+02:00 DEBG [peer: 1Xxb2B7huWfPEERDTa9bPVYz2k=] client/internal/peer/worker_ice.go:134: recreate ICE agent
2024-10-14T01:55:52+02:00 DEBG relay/client/manager.go:128: open peer connection via permanent server: 1Xxb2B7huWfPEERDTa9bPVYz2k=
2024-10-14T01:55:52+02:00 DEBG [peer: 1Xxb2B7huWfPEERDTa9bPVYz2k=] client/internal/peer/worker_relay.go:77: handled offer by reusing existing relay connection
2024-10-14T01:55:52+02:00 DEBG [peer: 1Xxb2B7huWfPEERDTa9bPVYz2k=] client/internal/peer/worker_ice.go:145: gather candidates
2024-10-14T01:55:52+02:00 DEBG [peer: 1Xxb2B7huWfPEERDTa9bPVYz2k=] client/internal/peer/worker_ice.go:155: turn agent dial
2024-10-14T01:55:52+02:00 DEBG [peer: 1Xxb2B7huWfPEERDTa9bPVYz2k=] client/internal/peer/worker_ice.go:254: ICE ConnectionState has changed to Checking
2024-10-14T01:55:52+02:00 DEBG [peer: 1Xxb2B7huWfPEERDTa9bPVYz2k=] client/internal/peer/worker_ice.go:318: discovered local candidate udp4 host 10.11.12.5:59595
2024-10-14T01:55:52+02:00 DEBG [peer: 1Xxb2B7huWfPEERDTa9bPVYz2k=] client/internal/peer/worker_ice.go:318: discovered local candidate udp4 host 192.168.1.133:59595
2024-10-14T01:55:52+02:00 DEBG [peer: 1Xxb2B7huWfPEERDTa9bPVYz2k=] client/internal/peer/worker_ice.go:318: discovered local candidate udp4 host 192.168.122.1:59595
2024-10-14T01:55:52+02:00 DEBG [peer: 1Xxb2B7huWfPEERDTa9bPVYz2k=] client/internal/peer/worker_ice.go:198: OnRemoteCandidate from peer 1Xxb2B7huWfPEERDTa9bPVYz2k= -> udp4 host 192.168.0.171:51820
2024-10-14T01:55:52+02:00 DEBG client/iface/bind/udp_mux.go:346: ICE: registered 192.168.0.171:51820 for KBbmtCjGDBSLwAJI
2024-10-14T01:55:52+02:00 DEBG [peer: 1Xxb2B7huWfPEERDTa9bPVYz2k=] client/internal/peer/worker_ice.go:198: OnRemoteCandidate from peer 1Xxb2B7huWfPEERDTa9bPVYz2k= -> udp4 host 127.0.0.1:51820
2024-10-14T01:55:52+02:00 DEBG client/iface/bind/udp_mux.go:346: ICE: registered 127.0.0.1:51820 for KBbmtCjGDBSLwAJI
2024-10-14T01:55:52+02:00 DEBG [peer: 1Xxb2B7huWfPEERDTa9bPVYz2k=] client/internal/peer/worker_ice.go:198: OnRemoteCandidate from peer 1Xxb2B7huWfPEERDTa9bPVYz2k= -> udp4 srflx 1Xxb2B7-PUBLIC-ip:6834 related 0.0.0.0:51820
2024-10-14T01:55:52+02:00 DEBG [peer: 1Xxb2B7huWfPEERDTa9bPVYz2k=] client/internal/peer/worker_ice.go:198: OnRemoteCandidate from peer 1Xxb2B7huWfPEERDTa9bPVYz2k= -> udp4 srflx 1Xxb2B7-PUBLIC-ip:51820 related 0.0.0.0:51820
2024-10-14T01:55:52+02:00 DEBG client/iface/bind/udp_mux.go:346: ICE: registered 1Xxb2B7-PUBLIC-ip:6834 for KBbmtCjGDBSLwAJI
2024-10-14T01:55:52+02:00 DEBG client/iface/bind/udp_mux.go:346: ICE: registered 1Xxb2B7-PUBLIC-ip:51820 for KBbmtCjGDBSLwAJI
2024-10-14T01:56:04+02:00 DEBG [peer: 1Xxb2B7huWfPEERDTa9bPVYz2k=] client/internal/peer/worker_ice.go:254: ICE ConnectionState has changed to Failed
2024-10-14T01:56:04+02:00 DEBG [peer: 1Xxb2B7huWfPEERDTa9bPVYz2k=] client/internal/peer/worker_ice.go:158: failed to dial the remote peer: connecting canceled by caller
2024-10-14T01:56:04+02:00 DEBG [peer: 1Xxb2B7huWfPEERDTa9bPVYz2k=] client/internal/peer/worker_ice.go:254: ICE ConnectionState has changed to Closed
2024-10-14T01:56:28+02:00 DEBG [peer: 1Xxb2B7huWfPEERDTa9bPVYz2k=] client/internal/peer/conn.go:304: OnRemoteOffer, on status ICE: Disconnected, status Relay: Connected

It seems like this is failing:

failed to dial the remote peer: connecting canceled by caller

The question is why this happens in network_mode: host AND on bare metal on the server, but not on bridged networks or on VMs running on the same host. The address it's trying to dial is definitely reachable, as this happens even for machines that aren't firewalled at all on port 51820.
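To make log excerpts like the one above easier to scan, the candidate and state lines can be pulled out with a short script. This is a sketch under assumptions: the regular expressions are modelled only on the message formats visible in the excerpt above, and `summarize_ice` is a hypothetical helper, not a netbird tool.

```python
import re

# Patterns modelled on the log lines in the excerpt above (an assumption;
# other netbird versions may format these messages differently).
CANDIDATE_RE = re.compile(
    r"(?P<side>discovered local candidate|OnRemoteCandidate from peer \S+ ->)"
    r" udp4 (?P<type>\w+) (?P<addr>\S+)"
)
STATE_RE = re.compile(r"ICE ConnectionState has changed to (?P<state>\w+)")


def summarize_ice(log_text: str) -> dict:
    """Collect local/remote ICE candidates and state changes from a debug log."""
    summary = {"local": [], "remote": [], "states": []}
    for line in log_text.splitlines():
        match = CANDIDATE_RE.search(line)
        if match:
            side = "local" if match.group("side").startswith("discovered") else "remote"
            summary[side].append((match.group("type"), match.group("addr")))
            continue
        match = STATE_RE.search(line)
        if match:
            summary["states"].append(match.group("state"))
    return summary
```

Run against the excerpt above, this would list three local candidates, all of type `host`, four remote candidates (two `host`, two `srflx`), and the state sequence Checking, Failed, Closed. The excerpt shows no local `srflx` candidate, which may be worth checking against a log from the bridged setup.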