netbirdio / netbird

Connect your devices into a secure WireGuard®-based overlay network with SSO, MFA and granular access controls.
https://netbird.io
BSD 3-Clause "New" or "Revised" License

New Relay public thread - Q&A and Issues discussions #2566

Open mlsmaycon opened 1 month ago

mlsmaycon commented 1 month ago

Hello folks, this issue is open to any questions or problems regarding the new relay implementation.

mlsmaycon commented 1 month ago

Status information to confirm relay usage:

Peers detail:
 relay-test-ip-172-20-1-178-rly.netbird.selfhosted:
  NetBird IP: 100.89.101.6
  Public key: CdRpcUnzq2LM9v97VnU7JiiqE0Y4wXp379mXju0efjk=
  Status: Connected
  -- detail --
  Connection type: Relayed
  ICE candidate (Local/Remote): -/-
  ICE candidate endpoints (Local/Remote): -/-
  Relay server address: rels://relay-eu1.stage.netbird.io <--------------- indicates the relay used to connect to the remote peer
  Last connection update: 2 seconds ago
  Last WireGuard handshake: 3 seconds ago
  Transfer status (received/sent) 92 B/180 B
  Quantum resistance: false
  Routes: -
  Latency: 0s

 relay-test-ip-172-20-14-148.netbird.selfhosted:
  NetBird IP: 100.89.212.227
  Public key: bhSrOMLvN+5cMnjWyL4gB+o9En2a1AvAGWNB5N+gEGw=
  Status: Connected
  -- detail --
  Connection type: P2P
  ICE candidate (Local/Remote): host/srflx
  ICE candidate endpoints (Local/Remote): 192.168.178.38:51820/1.2.3.4:51820
  Relay server address: rels://relay-eu2.stage.netbird.io <--------------- indicates the relay used to connect to the remote peer (known bug: this field should be cleared once a P2P connection is established)
  Last connection update: 2 seconds ago
  Last WireGuard handshake: 3 seconds ago
  Transfer status (received/sent) 92 B/180 B
  Quantum resistance: false
  Routes: 34.160.111.145/32
  Latency: 28.5755ms

OS: darwin/arm64
Daemon version: 0.29.0
CLI version: 0.29.0
Management: Connected to https://test.stage.netbird.io:443
Signal: Connected to https://signal.stage.netbird.io:443
Relays:
  [stun:test.stage.netbird.io:3478] is Available
  [turn:test.stage.netbird.io:3478?transport=udp] is Available
  [rels://relay-eu1.stage.netbird.io] is Available    <--------------- indicates the relay used by your local client (the home relay)
Nameservers:
  [8.8.8.8:53, 8.8.4.4:53] for [.] is Available
FQDN: maycons-macbook-pro-2-1.netbird.selfhosted
NetBird IP: 100.89.107.107/16
Interface type: Userspace
Quantum resistance: false
Routes: -
Peers count: 2/2 Connected
allroundtechie commented 1 month ago

Hi,

I have some questions about the new relay which are not clear to me.

  1. In the release notes you wrote, "We are moving away from the TURN relay (coturn) to our own relay implementation based on WebSocket". Taken literally, this means "only" the TURN part of coturn gets replaced, not the STUN part. Is that correct, and is this release just the first step toward replacing coturn completely, or has the STUN part also already been replaced by the new relay?
  2. The example above that indicates relay usage appears to use a secured (TLS) connection, but the release notes only mention: "Addresses": ["rel://:"] Can TLS be enabled in the new relay, and if so, how? Or is this something for a future release?
  3. I am using Traefik as a reverse proxy and have deployed NetBird as described in the documentation, which works well. However, documentation on running the new relay behind a reverse proxy is missing.

Thanks in advance and also many thanks for your awesome work in building this great software stack!

mlsmaycon commented 1 month ago

@landmass-deftly-reptile-budget:

  1. STUN will still be required for P2P discovery. Also, for backward compatibility, TURN is still required.
  2. The supported URL schemes are rel:// and rels://, where rels is used for TLS connections. Like signal and management, the relay has Let's Encrypt support, and you can use the environment variables below to enable it:
    NB_EXPOSED_ADDRESS=rels://relay.example.com:443  # update the port configuration to match it
    NB_LETSENCRYPT_DOMAINS=relay.example.com # should match the exposed address
    NB_LETSENCRYPT_DATA_DIR=/etc/letsencrypt # mount this directory for persistency
    NB_LETSENCRYPT_EMAIL=admin@relay.example.com
    #NB_LETSENCRYPT_AWS_ROUTE53=true # in case you want to use route 53 for issuing the certificate

    It also supports certificate files with:

    NB_TLS_CERT_FILE=/etc/certificates/cert.crt
    NB_TLS_KEY_FILE=/etc/certificates/cert.key

Once this is done, add the exposed address to the management.json file and restart the management service.

  3. The relay should work fine behind Traefik. We are missing the configuration docs, but traffic to the service can be routed either by domain or with the /relay path prefix.
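For illustration only: with the TLS setup described above, the corresponding management.json entry might look like this (domain and secret are placeholders; the Secret value has to match the relay's NB_AUTH_SECRET):

```json
"Relay": {
    "Addresses": ["rels://relay.example.com:443"],
    "CredentialsTTL": "24h",
    "Secret": "<shared-secret>"
},
```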
ismail0234 commented 1 month ago

Hello, I have 2 questions. I am undecided whether to upgrade or not.

  1. I don't fully understand the new Relay Feature. How will it benefit us?
  2. What was the reason for switching to your own relay application? Was there something the existing system did not provide?
bryanjuho commented 1 month ago

Is it okay to update to 0.29.0 without actually running the new relay image and changing management.json?

rudradevpal commented 1 month ago

For the new relay to work, is there any new OpenWrt package released?

Marcus1Pierce commented 1 month ago

Is it okay to use same domain for management, signal, coturn and relay?

Example: if I use the domain netbird.domain.com, can I use it for all services, just with different ports?

Zaunei commented 1 month ago
  1. Also, for backward compatibility, TURN is still required.

If I don't care about old clients, I can ignore TURN completely, right?

Otherwise, this sounds very promising, especially with Kubernetes, the port ranges of TURN have always made the setup a bit more complex. I will definitely give it a try and report back.

Will STUN continue to be used in the future?

WolfgangDpunkt commented 1 month ago

Replace PORT and DOMAIN according to your deployment.

I have used the automatic setup script, so I am probably using the default values for ports, so what do I need to specify here for PORT in the compose file?

ndziuba commented 1 month ago

Replace PORT and DOMAIN according to your deployment.

I have used the automatic setup script, so I am probably using the default values for ports, so what do I need to specify here for PORT in the compose file?

It can be found in the setup.env file. The default port is 33080

MDMeridio001 commented 1 month ago

Hello,

Is it possible to run the relay behind nginx acting as a proxy? I have tried adding the following to my nginx configuration file, but it results in clients receiving a 400 error when trying to establish a connection to the relay. A direct connection without nginx in front works perfectly fine.

upstream relay-upstream {
    server 127.0.0.1:33080;
}

[...]

# Proxy Relay http endpoint
    location /relay/ {
        proxy_pass http://relay-upstream/relay;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_set_header Host $host;
    }
mvivaldi commented 1 month ago

Hello,

Is it possible to run the relay behind nginx acting as a proxy? I have tried adding the following to my nginx configuration file, but it results in clients receiving a 400 error when trying to establish a connection to the relay. A direct connection without nginx in front works perfectly fine.

upstream relay-upstream {
    server 127.0.0.1:33080;
}

[...]

# Proxy Relay http endpoint
    location /relay/ {
        proxy_pass http://relay-upstream/relay;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_set_header Host $host;
    }

try adding these:

      proxy_set_header Host            $http_host;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_set_header X-Forwarded-For $remote_addr;
      proxy_set_header X-Forwarded-Host $http_host;
      proxy_cache_bypass $http_upgrade;

and delete the directive:

proxy_set_header Host $host;
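Putting the original snippet and these changes together, the full location block would look something like this (upstream name and port taken from the snippet above; a sketch, not a verified config):

```nginx
# Proxy the relay's WebSocket endpoint
location /relay/ {
    proxy_pass http://relay-upstream/relay;
    proxy_http_version 1.1;
    # WebSocket upgrade headers
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
    # Forwarding headers; Host comes from $http_host instead of $host
    proxy_set_header Host $http_host;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header X-Forwarded-Host $http_host;
    proxy_cache_bypass $http_upgrade;
}
```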
mlsmaycon commented 1 month ago

Hello, I have 2 questions. I am undecided whether to upgrade or not.

  1. I don't fully understand the new Relay Feature. How will it benefit us?
  2. What was the reason to switch to our own relay application? Was there something that the existing system did not meet?

@ismail0234 some of the benefits of the new relay over Coturn:

The main idea is to have a more efficient relay system for NetBird. TURN/Coturn is a really good system for short-lived connections. Since a VPN connection usually lasts many hours or days, we need a more efficient system that can easily be scaled.

mlsmaycon commented 1 month ago

Is it okay to update to 0.29.0 without actually running the new relay image and changing management.json?

Yes, it is. You don't need to update or configure anything if you don't want to. It should be fully compatible with older versions of the management.json file.

mlsmaycon commented 1 month ago

For the new relay to work, is there any new OpenWrt package released?

We will look into updating the OpenWrt package.

mlsmaycon commented 1 month ago

Is it okay to use same domain for management, signal, coturn and relay?

Example: If i use domain netbird.domain.com and i want to use this domain for all services but with different port is that okay?

Yes it is possible.

MDMeridio001 commented 1 month ago

Hello, Is it possible to run the relay behind nginx acting as a proxy? I have tried adding the following to my nginx configuration file, but it results in clients receiving a 400 error when trying to establish a connection to the relay. A direct connection without nginx in front works perfectly fine.

upstream relay-upstream {
    server 127.0.0.1:33080;
}

[...]

# Proxy Relay http endpoint
    location /relay/ {
        proxy_pass http://relay-upstream/relay;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_set_header Host $host;
    }

try adding these:

      proxy_set_header Host            $http_host;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_set_header X-Forwarded-For $remote_addr;
      proxy_set_header X-Forwarded-Host $http_host;
      proxy_cache_bypass $http_upgrade;

and delete the directive:

proxy_set_header Host $host;

I added them but I am still getting the same error. I don't know if it is of any help but this is what I added to the docker-compose.yml file:

# Relay
  relay:
    image: netbirdio/relay:latest
    restart: unless-stopped
    environment:
    - NB_LOG_LEVEL=info
    - NB_LISTEN_ADDRESS=:33080
    - NB_EXPOSED_ADDRESS=netbird.mydomain.com:443
    - NB_AUTH_SECRET=<MYSECRET>
    ports:
      - 127.0.0.1:33080:33080
    logging:
      driver: "json-file"
      options:
        max-size: "500m"
        max-file: "2"

And this is what I added to management.json:

"Relay": {
        "Addresses": ["rel://netbird.mydomain.com:443/relay"],
        "CredentialsTTL": "24h",
        "Secret": "<MYSECRET>"
},
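(Note: the NB_AUTH_SECRET value and the management.json "Secret" field must hold the same value. Assuming openssl is available, one way to generate a random secret for both:)

```shell
# Generate a 32-byte random secret, base64-encoded (44 characters),
# to use for both NB_AUTH_SECRET and the management.json "Secret" field.
openssl rand -base64 32
```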
mlsmaycon commented 1 month ago

@MDMeridio001 it seems like you are using nginx for SSL termination too, in that case, try this:

    - NB_EXPOSED_ADDRESS=rels://netbird.mydomain.com:443

and

"Relay": {
        "Addresses": ["rels://netbird.mydomain.com:443"],
        "CredentialsTTL": "24h",
        "Secret": "<MYSECRET>"
},
MDMeridio001 commented 1 month ago

@MDMeridio001 it seems like you are using nginx for SSL termination too, in that case, try this:

    - NB_EXPOSED_ADDRESS=rels://netbird.mydomain.com:443

and

"Relay": {
        "Addresses": ["rels://netbird.mydomain.com:443"],
        "CredentialsTTL": "24h",
        "Secret": "<MYSECRET>"
},

I completely forgot I needed to add "rels://", thank you so much, it's working fine now.

rgdev commented 1 month ago

Assuming a brand-new deployment with all clients running 0.29+, where does coturn fit in the picture? Can we just run coturn with --stun-only if backward compatibility is no concern?

mlsmaycon commented 1 month ago

@rgdev With a new deployment, it is very likely that Coturn will only be used with mobile clients until we update them.

Roeda commented 1 month ago

@rgdev With a new deployment, it is very likely that Coturn will only be used with mobile clients until we update them.

Excuse my confusion, but you say STUN is still used for peer discovery, and at the same time that Coturn won't be needed once the mobile apps are updated. Does that mean the STUN service is now baked into the new relay (or the management service)? Would we ultimately be able to remove Coturn from docker-compose and management.json? Thank you very much for this new implementation; it sounds cool and production friendly.

wehagy commented 1 month ago

For new relay to work is there any new openwrt package released?

We will look into updating the openwrt version.

I have been updating the netbird package against OpenWrt snapshot for months and have had no problems so far. In fact, I have built the new version 0.29.0, it is working fine, and I opened a PR: https://github.com/openwrt/packages/pull/24950. For now I just see the error 2024-09-10T15:25:09-03:00 INFO [peer: [ REDACTED ]=] client/internal/peer/worker_relay.go:59: Relay is not supported by remote peer, probably because I'm not self-hosting, and from the release notes:

  • Cloud support for the new relay feature is coming soon*.

However, I'm not backporting to OpenWrt 23.05; one of my targets is supported only on OpenWrt snapshot.

To be honest, someone opened an issue (https://github.com/openwrt/packages/issues/24569#issuecomment-2246451384) on the OpenWrt repo to backport a new version. I offered my help if they could test it, but I got no response.

ismail0234 commented 1 month ago

@mlsmaycon Thanks for the explanation. Are you considering optimizations on the API side? The API slows down after 200 peers are connected to the system; after 500 peers it slows down a lot, with each request taking more than 1-2 seconds.

In the test measurements I made, these are the response times returned from the API according to the number of peers connected to the system:

20 peers: 200-300 ms
100 peers: 300-600 ms
200 peers: 500-1000 ms
500 peers: 1500-3000 ms

mlsmaycon commented 1 month ago

Hey folks, we have a new release, 0.29.1. This release improves the relay with better authentication messages. To ensure your system is working properly, you should upgrade your relay and management servers before upgrading your clients.

allroundtechie commented 1 month ago

Works like a charm, thanks!

marcportabellaclotet-mt commented 1 month ago

Thanks for improving the relay functionality. I can't find the relay repo in the netbirdio GitHub org. Will it be private or closed source?

allroundtechie commented 1 month ago

@marcportabellaclotet-mt

https://github.com/netbirdio/netbird/tree/main/relay

ptpu commented 1 month ago

A short example for Traefik that works fine for me:

docker-compose.yml

relay:
    image: "netbirdio/relay:latest"
    container_name: netbird-relay
    restart: unless-stopped
    env_file:
      - relay.env
      - common.env
    labels:
      traefik.enable: 'true'
      traefik.http.routers.netbird-relay.rule: 'Host("netbird.mydomain.com") && PathPrefix("/relay")'
      traefik.http.routers.netbird-relay.entrypoints: websecure
      traefik.http.routers.netbird-relay.service: netbird-relay-service
      traefik.http.services.netbird-relay-service.loadbalancer.server.port: 33080

relay.env

NB_LOG_LEVEL=info
NB_LISTEN_ADDRESS=:33080
NB_EXPOSED_ADDRESS=rels://netbird.mydomain.com:443/relay
NB_AUTH_SECRET=secret

management.json

"Relay": {
        "Addresses": ["rels://netbird.mydomain.com:443/relay"],
        "CredentialsTTL": "24h",
        "Secret": "secret"
    },
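(A quick way to catch typos after hand-editing management.json, before restarting the management service; the file path is an assumption, adjust it to your deployment:)

```shell
# Validate that management.json parses as JSON; prints "valid JSON" on success.
python3 -m json.tool /etc/netbird/management.json > /dev/null \
  && echo "valid JSON" \
  || echo "invalid JSON"
```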
pugnobellum commented 1 month ago

Relay compose file

  relay:
    image: netbirdio/relay:latest
    container_name: netbird_relay
    restart: unless-stopped
    environment:
    - NB_LOG_LEVEL=info
    - NB_LISTEN_ADDRESS=:33080
    - NB_EXPOSED_ADDRESS=rels://netbird.mydomain.com:443
    - NB_AUTH_SECRET=secret
    ports:
      - 33080:33080
    networks:
      - proxynet
    logging:
      driver: "json-file"
      options:
        max-size: "500m"
        max-file: "2"

management.json

   "Relay": {
    "Addresses": ["rels://netbird.mydomain.com:443"],
    "CredentialsTTL": "24h",
    "Secret": "secret"
    },

netbird.subdomain.conf

server {
    listen 443 ssl;
    listen [::]:443 ssl;

    server_name netbird.mydomain.com;

    include /config/nginx/ssl3.conf;

    client_max_body_size 128M;
    client_header_timeout 1d;
    client_body_timeout 1d;

    location / {
        include /config/nginx/proxy.conf;
        include /config/nginx/resolver.conf;
        set $upstream_app netbird_dashboard;
        set $upstream_port 80;
        set $upstream_proto http;
        proxy_pass $upstream_proto://$upstream_app:$upstream_port;

    }

    location /api {
        include /config/nginx/proxy.conf;
        include /config/nginx/resolver.conf;
        set $upstream_app netbird_management;
        set $upstream_port 443;
        set $upstream_proto http;
        proxy_pass $upstream_proto://$upstream_app:$upstream_port;

    }

    location /signalexchange.SignalExchange/ {
        include /config/nginx/proxy.conf;
        include /config/nginx/resolver.conf;

        grpc_read_timeout 1d;
        grpc_send_timeout 1d;
        grpc_socket_keepalive on;

        set $upstream_app netbird_signal;
        set $upstream_port 80;
        set $upstream_proto grpc;
        grpc_pass $upstream_proto://$upstream_app:$upstream_port;

    }

    location /management.ManagementService/ {
        include /config/nginx/proxy.conf;
        include /config/nginx/resolver.conf;

        grpc_read_timeout 1d;
        grpc_send_timeout 1d;
        grpc_socket_keepalive on;

        set $upstream_app netbird_management;
        set $upstream_port 443;
        set $upstream_proto grpc;
        grpc_pass $upstream_proto://$upstream_app:$upstream_port;

    }

    location /relay/ {
        proxy_pass http://netbird_relay:33080/relay;

        # WebSocket support
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";

        # Forward headers
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeout settings
        proxy_read_timeout 3600s;
        proxy_send_timeout 3600s;
        proxy_connect_timeout 60s;

        # Handle upstream errors
        proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;
    }

}

I use the SWAG reverse proxy, which just bundles nginx and Let's Encrypt; my config files are above. I'm trying to add the new relay service. When I fire up my docker client/agent I get this error in the logs for it:

UPDATE: the current relay location I have now works.

EdouardVanbelle commented 1 month ago

Works for me using Traefik as a proxy (my config is similar to @ptpu's).

@mlsmaycon according to your sample, I guess I can spawn multiple relay instances for redundancy reasons:

meaning can I have this case:

    "Relay": {
            "Addresses": ["rels://relay-a.mydomain", "rels://relay-b.mydomain"],       

and you confirm that I cannot have this case: relay.mydomain resolving to multiple IPs ?

    "Relay": {
            "Addresses": ["rels://relay.mydomain"],       
mlsmaycon commented 1 month ago

Good for me using traefik as a proxy (my config is similar to @ptpu's one )

@mlsmaycon according to your sample, I guess I can spawn multiple relay instances for redundancy reasons:

meaning can I have this case:

    "Relay": {
            "Addresses": ["rels://relay-a.mydomain", "rels://relay-b.mydomain"],       

and you confirm that I cannot have this case: relay.mydomain resolving to multiple IPs ?

    "Relay": {
            "Addresses": ["rels://relay.mydomain"],       

Hey @EdouardVanbelle, you can have both. The first one means that the client will try both endpoints at the same time and use the one that responds first.

The second one implies a single node or a load-balancer endpoint. In the single-node case it should work fine, but with an LB the nodes would need their own addresses configured in the exposed-address setting, and those addresses should either point directly to the nodes or point to the LB with a Host routing rule to ensure traffic reaches the correct node.

mlsmaycon commented 1 month ago

@rgdev With a new deployment, it is very likely that Coturn will only be used with mobile clients until we update them.

Excuse my confusion, but since you say that you still use STUN for peer discovery, and at the same time Coturn won’t be used when the mobile apps are updated. Does that mean that the STUN service is baked into the new Relay now (or the management service) ? (Would we be ultimately able to remove Coturn from docker compose and the management.json ?) Thank you very much for this new implementation it sounds cool and production friendly

@Roeda We are studying this option. We have two systems that could take on the STUN role, signal and relay, as both are involved in connection discovery; we will have a decision soon. But the idea of not having coturn in our self-hosted scripts and templates is something that will be applied after 2-3 major (v0.X.0) releases.

mlsmaycon commented 1 month ago

For the new relay to work, is there any new OpenWrt package released?

We will look into updating the OpenWrt package.

I have been updating the netbird package against OpenWrt snapshot for months and have had no problems so far. In fact, I have built the new version 0.29.0, it is working fine, and I opened a PR: openwrt/packages#24950. For now I just see the error 2024-09-10T15:25:09-03:00 INFO [peer: [ REDACTED ]=] client/internal/peer/worker_relay.go:59: Relay is not supported by remote peer, probably because I'm not self-hosting, and from the release notes:

  • Cloud support for the new relay feature is coming soon*.

But I'm not backporting to openwrt 23.05, one of my targets is supported only on openwrt snapshot.

To be honest, someone opened an issue (openwrt/packages#24569 (comment)) on the OpenWrt repo to backport a new version. I offered my help if they could test it, but I got no response.

Thanks for the contribution @wehagy. I've asked the user again if they have some time to test it out.

mlsmaycon commented 1 month ago

@mlsmaycon Thanks for the explanation. Are you considering optimizations on the API side? The API slows down after 200 peers are connected to the system; after 500 peers it slows down a lot, with each request taking more than 1-2 seconds.

In the test measurements I made, these are the response times returned from the API according to the number of peers connected to the system:

20 peers: 200-300 ms
100 peers: 300-600 ms
200 peers: 500-1000 ms
500 peers: 1500-3000 ms

@ismail0234 We are working on some optimization around our database access. It would be helpful if you can share the exact setup you have, VM size and database you are using, and the API calls you are using to measure it.

Another thing that might affect a self-hosted installation is users in the system that were removed from your IDP; that causes lots of IDP requests to fetch new data to be cached.

ismail0234 commented 1 month ago

@mlsmaycon Thanks for the explanation. Do you think about optimization on the api side? The api slows down after 200 peers connected to the system. After 500 peers, it slows down a lot. Each request takes more than 1-2 seconds. In the test measurements I made, these are the response times returned from the api according to the number of peers connected to the system. 20 Peers: 200-300 ms 100 Peers 300-600 ms 200 Peers: 500-1000 ms 500 Peers: 1500-3000 ms

@ismail0234 We are working on some optimization around our database access. It would be helpful if you can share the exact setup you have, VM size and database you are using, and the API calls you are using to measure it.

Another thing that might affect a self-hosted installation is if you have users in the system that got removed from your IDP, that would cause lots of IDP requests to fetch new data to be cached.

@mlsmaycon

I'm using the standard NetBird installation and haven't changed any settings, so I guess SQLite is used by default. Users connect to the network via setup keys. Is there a way to find out the database size?

I also don't know what an IDP is. The JWT group sync and user group propagation features are also disabled; I do not use them. Right now, I'm kicking all unconnected peers off the network every hour to keep the API running fast, so my problems are completely fixed. As the number of peers in the network increases, API calls become very slow and some calls return no HTTP response (status 0).

The api calls I use are as follows;

  1. “List all Peers”
  2. “List all Groups”

Since NetBird does not expose any client-side information for finding a user's ID in API calls, I need to match on the NetBird IP, which means iterating over all peers to find the correct one.

mlsmaycon commented 1 month ago

@ismail0234 IDP is the identity provider (Authentik, Zitadel, Google, and others).

NetBird integrates with the tool of your choice, mainly to fetch the user name and email and cache them so that they appear human-readable in the dashboard.

Something is really off with your performance. You can try disabling your IDP integration by updating the management.json and setting the following key to "none":

    "IdpManagerConfig": {
        "ManagerType": "zitadel",

to

    "IdpManagerConfig": {
        "ManagerType": "none",

Then you can restart your management server and test. You will see users as IDs during this test.

Also, can you create a ticket for this problem so we can continue there?

ismail0234 commented 1 month ago

@mlsmaycon Where do I create the ticket? GitHub or Slack?

mlsmaycon commented 1 month ago

@ismail0234 GitHub, but feel free to reach out on Slack for faster iteration.

ndziuba commented 1 month ago

For people using Caddy (based on the Zitadel starter script):

Caddyfile:

  :80, netbird.example.com:443 {
          import security_headers
          reverse_proxy /relay* relay:80
          reverse_proxy /signalexchange.SignalExchange/* h2c://signal:10000
          reverse_proxy /api/* management:80
          reverse_proxy /management.ManagementService/* h2c://management:80
          reverse_proxy /* dashboard:80
  }

relay.env

  NB_LOG_LEVEL=info
  NB_LISTEN_ADDRESS=:80
  NB_EXPOSED_ADDRESS=rels://netbird.example.com:443
  NB_AUTH_SECRET="secret"

management.json

  "Relay": {
          "Addresses": ["rels://netbird.example.com:443/relay"],
          "CredentialsTTL": "24h",
          "Secret": "secret"
  },

docker-compose.yml

  #Relay
  relay:
    image: netbirdio/relay:latest
    container_name: relay
    restart: unless-stopped
    env_file:
      - ./relay.env
    networks:
      - netbird #If you use a network
    logging:
      driver: "json-file"
      options:
        max-size: "500m"
        max-file: "2"
pellz0r commented 1 month ago

I followed the above settings for Caddy (as I've used the Zitadel starter script once upon a time), but when a node with the latest client tries to connect I get the following:

2024-09-12T19:35:10Z DEBG client/internal/connect.go:176: connecting to the Management service netbird.mydomain.se:443
2024-09-12T19:35:10Z DEBG util/net/dialer_nonios.go:52: Dialing tcp netbird.mydomain.se:443
2024-09-12T19:35:10Z DEBG client/internal/connect.go:184: connected to the Management service netbird.mydomain.se:443
2024-09-12T19:35:11Z DEBG util/net/dialer_nonios.go:52: Dialing tcp netbird.mydomain.se:443
2024-09-12T19:35:11Z DEBG signal/client/grpc.go:81: connected to Signal Service: netbird.mydomain.se:443
2024-09-12T19:35:11Z INFO client/internal/connect.go:251: connecting to the Relay service(s): rels://netbird.mydomain.se:443/relay
2024-09-12T19:35:11Z DEBG relay/client/manager.go:93: starting relay client manager with [rels://netbird.mydomain.se:443/relay] relay servers
2024-09-12T19:35:11Z INFO [client_id: sha-WnVmoeH7RuspQpTopd/8RnZhD9vXJTf3J9VTglSeyGk=] relay/client/client.go:141: connecting to relay server: rels://netbird.mydomain.se:443/relay
2024-09-12T19:35:11Z DEBG util/net/dialer_nonios.go:52: Dialing tcp netbird.mydomain.se:443
2024-09-12T19:35:11Z ERRO relay/client/dialer/ws/ws.go:36: failed to dial to Relay server 'wss://netbird.mydomain.se:443/relay': failed to WebSocket dial: expected handshake response status code 101 but got 404
2024-09-12T19:35:11Z WARN relay/client/manager.go:130: Connection attempt failed: failed to connect to rels://netbird.mydomain.se:443/relay: failed to WebSocket dial: expected handshake response status code 101 but got 404
2024-09-12T19:35:11Z ERRO client/internal/connect.go:253: failed to connect to any relay server: all attempts failed

Not really sure what I'm missing

EDIT: Ah, I messed up and didn't pull / restart all the containers. :)

1nerdyguy commented 1 month ago

To confirm:

With the new relay, do I still need the Coturn instance for STUN at this time?

But can I deploy multiple relay instances, update the config file accordingly, and have clients use them?

tienlq2011 commented 1 month ago

Could there be a guide to deploying them on Kubernetes? Thank you!

1nerdyguy commented 1 month ago

Could there be a guide to deploying them on Kubernetes? Thank you!

To my understanding, it's just its own container. So you'd spin it up, map the port, then update the management.json file and recreate the containers.

rgdev commented 1 month ago

Relay on k8s (behind an ingress-nginx reverse proxy, since the relay uses WebSockets):

Ingress

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
  name: netbird-relay-ingress
  namespace: netbird
spec:
  ingressClassName: nginx
  rules:
  - host: netbird.company.com
    http:
      paths:
      - backend:
          service:
            name: netbird-relay
            port:
              number: 80
        path: /relay
        pathType: Prefix
  tls:
  - hosts:
    - netbird.company.com
    secretName: netbird-tls

Service

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: netbird-relay
  name: netbird-relay
  namespace: netbird
spec:
  ports:
  - name: relay
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app.kubernetes.io/name: netbird-relay

Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: netbird-relay
  name: netbird-relay
  namespace: netbird
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: netbird-relay
  template:
    metadata:
      labels:
        app.kubernetes.io/component: relay
        app.kubernetes.io/instance: netbird-relay
        app.kubernetes.io/name: netbird-relay
        app.kubernetes.io/part-of: netbird
    spec:
      containers:
      - env:
        - name: NB_LOG_LEVEL
          value: info
        - name: NB_LISTEN_ADDRESS
          value: :80
        - name: NB_AUTH_SECRET
          valueFrom:
            secretKeyRef:
              key: auth_secret
              name: netbird-relay-authkey
        - name: NB_EXPOSED_ADDRESS
          value: rels://netbird.company.com:443
        image: netbirdio/relay:0.29.2
        imagePullPolicy: IfNotPresent
        name: netbird-relay
        ports:
        - containerPort: 80
          name: relay
          protocol: TCP

The deployment references a netbird-relay-authkey Secret you need to provide it with a key of your choice.
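For reference, that Secret could be defined with a manifest like the following (a sketch; the stringData value is a placeholder you should replace with your own random secret):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: netbird-relay-authkey
  namespace: netbird
type: Opaque
stringData:
  # Key name must match the secretKeyRef in the Deployment above.
  auth_secret: "replace-with-a-random-secret"
```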

marcportabellaclotet-mt commented 1 month ago

Relay performance question... I was testing NetBird speed using a direct connection (opening WireGuard ports) versus using the relay, and there seems to be a big performance penalty. Anyone have similar results?

Direct connection: 150 Mbit
Using relay: 30 Mbit

I don't have a TURN setup, so I cannot compare.

mlsmaycon commented 1 month ago

Hey @marcportabellaclotet-mt can you check with different MTU configurations for the NetBird interface on both ends of the connection?

Also, can you share which tool you used for the test?

marcportabellaclotet-mt commented 1 month ago

I am using iperf and speedtest. MTU is 1500 on both sides. The relay app is deployed as an LXC container.

1nerdyguy commented 1 month ago

@marcportabellaclotet-mt Does the relay have adequate upload/download bandwidth? Since all traffic on a relayed connection flows 'through' it, you're limited by the relay's download/upload speed. It may also add latency between clients, depending on how far the relay is from their point of presence.

marcportabellaclotet-mt commented 1 month ago

I am testing the relay service in the same network where the NetBird client is hosted, so there is no bandwidth restriction. My test setup is: