gravitl / netmaker

Netmaker makes networks with WireGuard. Netmaker automates fast, secure, and distributed virtual networks.
https://netmaker.io
Other

[Bug]: could not connect to broker at broker.netmaker.DOMAIN.com:443 #1637

Closed PouriaMzn closed 3 months ago

PouriaMzn commented 1 year ago

Contact Details

No response

What happened?

I have Nginx on my server, and I pass all Netmaker-related requests on to the Traefik container. I also have my own wildcard certificates, which I provide to the Traefik container according to this answer.

Here is my docker compose file:

version: "3.4"
services:
  netmaker:
    container_name: netmaker
    image: gravitl/netmaker:v0.16.0
    cap_add: 
      - NET_ADMIN
      - NET_RAW
      - SYS_MODULE
    sysctls:
      - net.ipv4.ip_forward=1
      - net.ipv4.conf.all.src_valid_mark=1
      - net.ipv6.conf.all.disable_ipv6=0
      - net.ipv6.conf.all.forwarding=1
    restart: always
    volumes:
      - dnsconfig:/root/config/dnsconfig
      - sqldata:/root/data
      - shared_certs:/etc/netmaker
    environment:
      SERVER_NAME: "broker.NETMAKER_BASE_DOMAIN"
      SERVER_HOST: "SERVER_PUBLIC_IP"
      SERVER_API_CONN_STRING: "api.NETMAKER_BASE_DOMAIN:443"
      COREDNS_ADDR: "SERVER_PUBLIC_IP"
      DNS_MODE: "on"
      SERVER_HTTP_HOST: "api.NETMAKER_BASE_DOMAIN"
      API_PORT: "8081"
      CLIENT_MODE: "on"
      MASTER_KEY: "REPLACE_MASTER_KEY"
      CORS_ALLOWED_ORIGIN: "*"
      DISPLAY_KEYS: "on"
      DATABASE: "sqlite"
      NODE_ID: "netmaker-server-1"
      MQ_HOST: "mq"
      MQ_PORT: "443"
      MQ_SERVER_PORT: "1883"
      HOST_NETWORK: "off"
      VERBOSITY: "1"
      MANAGE_IPTABLES: "on"
      PORT_FORWARD_SERVICES: "dns"
    ports:
      - "51821-51830:51821-51830/udp"
    expose:
      - "8081"
    labels:
      - traefik.enable=true
      - traefik.http.routers.netmaker-api.entrypoints=websecure
      - traefik.http.routers.netmaker-api.rule=Host(`api.NETMAKER_BASE_DOMAIN`)
      - traefik.http.routers.netmaker-api.service=netmaker-api
      - traefik.http.services.netmaker-api.loadbalancer.server.port=8081
  netmaker-ui:
    container_name: netmaker-ui
    image: gravitl/netmaker-ui:v0.16.0
    depends_on:
      - netmaker
    links:
      - "netmaker:api"
    restart: always
    environment:
      BACKEND_URL: "https://api.NETMAKER_BASE_DOMAIN"
    expose:
      - "80"
    labels:
      - traefik.enable=true
      - traefik.http.middlewares.nmui-security.headers.accessControlAllowOriginList=*.NETMAKER_BASE_DOMAIN
      - traefik.http.middlewares.nmui-security.headers.stsSeconds=31536000
      - traefik.http.middlewares.nmui-security.headers.browserXssFilter=true
      - traefik.http.middlewares.nmui-security.headers.customFrameOptionsValue=SAMEORIGIN
      - traefik.http.middlewares.nmui-security.headers.customResponseHeaders.X-Robots-Tag=none
      - traefik.http.middlewares.nmui-security.headers.customResponseHeaders.Server= # Remove the server name
      - traefik.http.routers.netmaker-ui.entrypoints=websecure
      - traefik.http.routers.netmaker-ui.middlewares=nmui-security@docker
      - traefik.http.routers.netmaker-ui.rule=Host(`dashboard.NETMAKER_BASE_DOMAIN`)
      - traefik.http.routers.netmaker-ui.service=netmaker-ui
      - traefik.http.services.netmaker-ui.loadbalancer.server.port=80
  coredns:
    container_name: coredns
    image: coredns/coredns
    command: -conf /root/dnsconfig/Corefile
    depends_on:
      - netmaker
    restart: always
    volumes:
      - dnsconfig:/root/dnsconfig
  traefik:
    image: traefik:v2.6
    container_name: traefik
    command:
      - "--entrypoints.websecure.address=:443"
      - "--entrypoints.websecure.http.tls=true"
      - "--entrypoints.websecure.http.tls.certResolver=http"
      - "--log.level=INFO"
      - "--providers.docker=true"
      - "--providers.docker.exposedByDefault=false"
      - "--serverstransport.insecureskipverify=true"
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - traefik_certs:/letsencrypt
      - /root/certs/:/etc/certs/
      - /root/certs-traefik.yml:/etc/traefik/dynamic/certs-traefik.yml
    ports:
      - "127.0.0.1:445:443"
  mq:
    container_name: mq
    image: eclipse-mosquitto:2.0.11-openssl
    depends_on:
      - netmaker
    restart: unless-stopped
    volumes:
      - /root/mosquitto.conf:/mosquitto/config/mosquitto.conf
      - mosquitto_data:/mosquitto/data
      - mosquitto_logs:/mosquitto/log
      - shared_certs:/mosquitto/certs
    expose:
      - "8883"
    labels:
      - traefik.enable=true
      - traefik.tcp.routers.mqtts.rule=HostSNI(`broker.NETMAKER_BASE_DOMAIN`)
      - traefik.tcp.routers.mqtts.tls.passthrough=true
      - traefik.tcp.services.mqtts-svc.loadbalancer.server.port=8883
      - traefik.tcp.routers.mqtts.service=mqtts-svc
      - traefik.tcp.routers.mqtts.entrypoints=websecure
volumes:
  traefik_certs: {}
  shared_certs: {}
  sqldata: {}
  dnsconfig: {}
  mosquitto_data: {}
  mosquitto_logs: {}

And here is my Nginx config

server {
    listen 80; listen [::]:80;
    server_name *.netmaker.domain.com;
    return 302 https://$server_name$request_uri;
}

server {
    listen 443 ssl; listen [::]:443 ssl;
    server_name *.netmaker.domain.com;

    ssl_certificate /etc/letsencrypt/live/domain/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/domain/privkey.pem;
    access_log  /var/log/nginx/netmaker.access.log custom;

    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-Host $host:$server_port;
    proxy_set_header Host $http_host;
    proxy_redirect off;

    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";

    location / {
        proxy_pass https://127.0.0.1:445/;
    }
}

In my setup, the dashboard works fine: I'm able to log in, create networks, and generate access keys to sign in nodes. But when I add a new node, even though it shows up in the dashboard, I'm not able to ping it, and after a while the node shows an ERROR status in the dashboard.

This is the log from my node:

[netclient] 2022-10-06 10:52:48 joining dev-net at api.netmaker.DOMAIN.com:443
[netclient] 2022-10-06 10:52:49 network: dev-net node ubuntu-DOMAIN is using port 51821
[netclient] 2022-10-06 10:52:49 network: dev-net certificates/key saved
[netclient] 2022-10-06 10:52:49 starting wireguard
[netclient] 2022-10-06 10:52:52 error running command: systemctl restart netclient.service
[netclient] 2022-10-06 10:52:52
[netclient] Starting netclient daemon
[netclient] 2022-10-06 10:52:52 [daemon.go-49] Daemon(): netclient daemon started -- version: v0.16.0
[netclient] 2022-10-06 10:52:52 [clientconfig.go-23] UpdateClientConfig(): checking for netclient updates...
[netclient] 2022-10-06 10:52:52 [clientconfig.go-52] UpdateClientConfig(): finished updates
[netclient] 2022-10-06 10:52:52 [daemon.go-93] startGoRoutines(): initializing network dev-net
[netclient] 2022-10-06 10:52:52 [daemon.go-108] startGoRoutines(): started daemon for server  broker.netmaker.DOMAIN.com
[netclient] 2022-10-06 10:52:52 [mqpublish.go-32] Checkin(): starting checkin goroutine
[netclient] 2022-10-06 10:52:52 [mqpublish.go-56] checkin(): checkin with server(s) for all networks
[netclient] 2022-10-06 10:52:52 [daemon.go-193] messageQueue(): network: dev-net netclient message queue started for server: broker.netmaker.DOMAIN.com
[netclient] 2022-10-06 10:52:53 [localport.go-40] UpdateLocalListenPort(): network: dev-net local port has changed from  0  to  51821
[netclient] 2022-10-06 10:53:22 [daemon.go-302] setupMQTT(): unable to connect to broker, retrying ...
[netclient] 2022-10-06 10:53:23 [mqpublish.go-255] publish(): could not connect to broker at broker.netmaker.DOMAIN.com:443
[netclient] 2022-10-06 10:53:23 [localport.go-47] UpdateLocalListenPort(): could not publish local port change connection timeout
Ping tcp://broker.netmaker.DOMAIN.com:443(185.143.234.68:443) - Connected - time=6.126431ms
Ping tcp://broker.netmaker.DOMAIN.com:443(185.143.234.68:443) - Connected - time=7.090959ms
Ping tcp://broker.netmaker.DOMAIN.com:443(185.143.234.68:443) - Connected - time=10.34016ms
[netclient] 2022-10-06 10:53:53 [mqpublish.go-255] publish(): could not connect to broker at broker.netmaker.DOMAIN.com:443
[netclient] 2022-10-06 10:53:53 [mqpublish.go-152] Hello(): Network: dev-net error publishing ping, connection timeout
[netclient] 2022-10-06 10:53:53 [mqpublish.go-153] Hello(): running pull on dev-net to reconnect
[netclient] 2022-10-06 10:53:56 [common.go-165] InitWireguard(): waiting for interface...

On the server, these are my mosquitto logs:

mq           | 1665051481: Config loaded from /mosquitto/config/mosquitto.conf.
mq           | 1665051481: Opening ipv4 listen socket on port 8883.
mq           | 1665051481: Opening ipv6 listen socket on port 8883.
mq           | 1665051481: Opening ipv4 listen socket on port 1883.
mq           | 1665051481: Opening ipv6 listen socket on port 1883.
mq           | 1665051481: mosquitto version 2.0.11 running

When I run openssl verify -verbose -CAfile /docker/volumes/netmaker_shared_certs/_data/root.pem /docker/volumes/netmaker_shared_certs/_data/server.pem, it says OK.

The problem seems to be that nodes cannot connect to broker.netmaker.DOMAIN.com:443. Also, when I open broker.netmaker.DOMAIN.com:443 in a web browser, I get a 400 error from Nginx:

Bad Request (400)
The plain HTTP request was sent to HTTPS port

And when I open broker.netmaker.DOMAIN.com in a web browser without :443, I get "404 page not found".

I also tried running Netmaker without Traefik, proxying directly from Nginx to each container, but I still got the same error.

Any idea how to resolve this?

Version

v0.16.0

What OS are you using?

Linux

Relevant log output

No response


afeiszli commented 1 year ago

The broker address uses raw TCP. You need to configure Nginx to pass TCP traffic through to the broker, rather than handling it as HTTPS.

https://docs.nginx.com/nginx/admin-guide/load-balancer/tcp-udp-load-balancer/
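
To make the raw-TCP point concrete: the first bytes of a plain MQTT session (or the first bytes inside the TLS tunnel, for mqtts) are a binary CONNECT packet starting with 0x10, not an HTTP request line, so an Nginx http server block has nothing it can parse, whereas a stream (TCP) proxy simply forwards those bytes. The Python sketch below builds an MQTT 3.1.1 CONNECT packet and decodes the broker's CONNACK reply. It assumes MQTT 3.1.1 framing (which mosquitto 2.0 accepts); the client ID "probe" is a hypothetical example, and none of this is taken from netclient's own code.

```python
import struct
from typing import Optional

# MQTT 3.1.1 CONNACK return codes (spec section 3.2.2.3).
CONNACK_CODES = {
    0: "accepted",
    1: "unacceptable protocol version",
    2: "identifier rejected",
    3: "server unavailable",
    4: "bad username or password",
    5: "not authorized",
}


def mqtt_string(value: str) -> bytes:
    """Encode a UTF-8 string with MQTT's 2-byte big-endian length prefix."""
    data = value.encode("utf-8")
    return struct.pack("!H", len(data)) + data


def encode_remaining_length(n: int) -> bytes:
    """Encode the fixed-header 'remaining length' as MQTT's 7-bit varint."""
    out = bytearray()
    while True:
        byte, n = n % 128, n // 128
        if n:
            byte |= 0x80  # continuation bit: more length bytes follow
        out.append(byte)
        if not n:
            return bytes(out)


def build_connect(client_id: str, keep_alive: int = 60,
                  username: Optional[str] = None,
                  password: Optional[str] = None) -> bytes:
    """Build an MQTT 3.1.1 CONNECT packet with the clean-session flag set."""
    if password is not None and username is None:
        raise ValueError("MQTT 3.1.1 requires a username when a password is sent")
    flags = 0x02  # clean session
    payload = mqtt_string(client_id)
    if username is not None:
        flags |= 0x80  # username present
        payload += mqtt_string(username)
    if password is not None:
        flags |= 0x40  # password present
        payload += mqtt_string(password)
    # Variable header: protocol name "MQTT", protocol level 4 (= 3.1.1), flags, keep-alive.
    variable_header = mqtt_string("MQTT") + bytes([0x04, flags]) + struct.pack("!H", keep_alive)
    body = variable_header + payload
    return bytes([0x10]) + encode_remaining_length(len(body)) + body


def parse_connack(packet: bytes) -> str:
    """Decode a 4-byte MQTT 3.1.1 CONNACK into a human-readable result."""
    if len(packet) < 4 or packet[0] != 0x20 or packet[1] != 0x02:
        raise ValueError("not an MQTT 3.1.1 CONNACK packet")
    return CONNACK_CODES.get(packet[3], f"unknown return code {packet[3]}")
```

Writing build_connect("probe") to a connection opened to the broker (inside TLS when the listener expects mqtts) and reading 4 bytes back separates "TCP port open", which is all the Ping tcp:// lines and nc -zv check, from "MQTT broker answering". Even a "not authorized" CONNACK proves the broker itself was reached.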

PouriaMzn commented 1 year ago

Thank you for your response. I haven't worked with Nginx TCP load balancing yet, but what I did was change MQ_PORT: "443" in the docker-compose file to 1443 and add this block to the Nginx config so that it forwards the traffic to the Traefik container:

stream {
    server {
        listen     1443;
        proxy_pass 127.0.0.1:445;
    }
}

But I still get connection errors on the node:

[netclient] 2022-10-06 14:17:15 [mqpublish.go-56] checkin(): checkin with server(s) for all networks 
[netclient] 2022-10-06 14:17:16 [localport.go-40] UpdateLocalListenPort(): network: dev local port has changed from  0  to  51822 
[netclient] 2022-10-06 14:17:44 [daemon.go-302] setupMQTT(): unable to connect to broker, retrying ... 
Ping tcp://broker.netmaker.DOMAIN.com:1443(37.37.37.37:1443) - Connected - time=151.043823ms
[netclient] 2022-10-06 14:17:46 [mqpublish.go-255] publish(): could not connect to broker at broker.netmaker.DOMAIN.com:1443 
[netclient] 2022-10-06 14:17:46 [localport.go-47] UpdateLocalListenPort(): could not publish local port change connection timeout 
Ping tcp://broker.netmaker.DOMAIN.com:1443(37.37.37.37:1443) - Connected - time=93.122585ms
Ping tcp://broker.netmaker.DOMAIN.com:1443(37.37.37.37:1443) - Connected - time=93.256723ms
[netclient] 2022-10-06 14:18:16 [mqpublish.go-255] publish(): could not connect to broker at broker.netmaker.DOMAIN.com:1443 
[netclient] 2022-10-06 14:18:16 [mqpublish.go-152] Hello(): Network: dev error publishing ping, connection timeout 
[netclient] 2022-10-06 14:18:16 [mqpublish.go-153] Hello(): running pull on dev to DOMAIN

I think I haven't set up the Nginx TCP stream correctly.
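
For intuition about what the stream block above does: Nginx's stream module forwards the TCP byte stream to proxy_pass without interpreting it, so with no ssl on the listen directive TLS is not terminated and the ClientHello, including its SNI hostname, reaches Traefik unchanged. The following is a rough Python asyncio equivalent of that forwarding, written only as an illustration (it is not how Nginx is implemented); the listen and upstream addresses are placeholders.

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, without inspecting them."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # peer reset the connection; just stop forwarding
    finally:
        writer.close()


async def handle(client_reader, client_writer, upstream_host, upstream_port):
    """Open one upstream connection per client and shuttle bytes both ways."""
    try:
        up_reader, up_writer = await asyncio.open_connection(upstream_host, upstream_port)
    except OSError:
        client_writer.close()  # upstream unreachable: drop the client
        return
    await asyncio.gather(
        pipe(client_reader, up_writer),   # client -> upstream
        pipe(up_reader, client_writer),   # upstream -> client
    )


async def start_tcp_proxy(listen_host, listen_port, upstream_host, upstream_port):
    """Listen on listen_host:listen_port and forward raw TCP to the upstream."""
    return await asyncio.start_server(
        lambda r, w: handle(r, w, upstream_host, upstream_port),
        listen_host, listen_port,
    )
```

For example, start_tcp_proxy("0.0.0.0", 1443, "127.0.0.1", 445) would mirror the stream block above. Because the bytes pass through untouched, Traefik's HostSNI router with tls.passthrough can still match, provided the client sends SNI for the broker hostname; what remains to check is whether the listener at the end of the chain speaks the protocol (TLS or plain MQTT) the node expects.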

PouriaMzn commented 1 year ago

In my second attempt, I disabled Traefik for each container. For the mosquitto container, I didn't use Nginx either; I published the port directly and set MQ_PORT: "1443":

  mq:
    container_name: mq
    image: eclipse-mosquitto:2.0.11-openssl
    depends_on:
      - netmaker
    restart: unless-stopped
    volumes:
      - /root/mosquitto.conf:/mosquitto/config/mosquitto.conf
      - mosquitto_data:/mosquitto/data
      - mosquitto_logs:/mosquitto/log
      - shared_certs:/mosquitto/certs
    ports:
      - 127.0.0.1:1883:1883
      - 1443:8883
    expose:
      - "8883"
    labels:
      - traefik.enable=false
      - traefik.tcp.routers.mqtts.rule=HostSNI(`broker.netmaker.domain.com`)
      - traefik.tcp.routers.mqtts.tls.passthrough=true
      - traefik.tcp.services.mqtts-svc.loadbalancer.server.port=8883
      - traefik.tcp.routers.mqtts.service=mqtts-svc
      - traefik.tcp.routers.mqtts.entrypoints=websecure

On my node, these are the logs:

Ping tcp://broker.netmaker.DOMAIN.com:1443(37.32.26.207:1443) - Connected - time=99.642856ms
Ping tcp://broker.netmaker.DOMAIN.com:1443(37.32.26.207:1443) - Connected - time=120.349939ms
Ping tcp://broker.netmaker.DOMAIN.com:1443(37.32.26.207:1443) - Connected - time=94.573264ms
[netclient] 2022-10-06 18:02:13 [mqpublish.go-255] publish(): could not connect to broker at broker.netmaker.DOMAIN.com:1443 
[netclient] 2022-10-06 18:02:13 [mqpublish.go-152] Hello(): Network: dev error publishing ping, connection timeout 
[netclient] 2022-10-06 18:02:13 [mqpublish.go-153] Hello(): running pull on dev to reconnect 
[netclient] 2022-10-06 18:02:14 [mqpublish.go-156] Hello(): could not run pull on dev, error: failed to authenticate 400 Bad Request {"Code":400,"Message":"no result found"} 
[netclient] 2022-10-06 18:02:44 [mqpublish.go-255] publish(): could not connect to broker at broker.netmaker.DOMAIN.com:443 
[netclient] 2022-10-06 18:02:44 [mqpublish.go-152] Hello(): Network: dev-net error publishing ping, connection timeout 
[netclient] 2022-10-06 18:02:44 [mqpublish.go-153] Hello(): running pull on dev-net to reconnect 
[netclient] 2022-10-06 18:02:44 [mqpublish.go-156] Hello(): could not run pull on dev-net, error: failed to authenticate 400 Bad Request {"Code":400,"Message":"no result found"} 
[netclient] 2022-10-06 18:03:15 [mqpublish.go-255] publish(): could not connect to broker at broker.netmaker.DOMAIN.com:1443 
[netclient] 2022-10-06 18:03:15 [mqpublish.go-152] Hello(): Network: test-net error publishing ping, connection timeout 
[netclient] 2022-10-06 18:03:15 [mqpublish.go-153] Hello(): running pull on test-net to reconnect 
[netclient] 2022-10-06 18:03:15 [mqpublish.go-156] Hello(): could not run pull on test-net, error: failed to authenticate 400 Bad Request {"Code":400,"Message":"no result found"} 
[netclient] 2022-10-06 18:03:16 [localport.go-40] UpdateLocalListenPort(): network: wg-vnet local port has changed from  0  to  51824 
[netclient] 2022-10-06 18:03:46 [mqpublish.go-255] publish(): could not connect to broker at broker.netmaker.DOMAIN.com:1443 
[netclient] 2022-10-06 18:03:46 [localport.go-47] UpdateLocalListenPort(): could not publish local port change connection timeout 
[netclient] 2022-10-06 18:04:16 [mqpublish.go-255] publish(): could not connect to broker at broker.netmaker.DOMAIN.com:1443 
[netclient] 2022-10-06 18:04:16 [mqpublish.go-152] Hello(): Network: wg-vnet error publishing ping, connection timeout 
[netclient] 2022-10-06 18:04:16 [mqpublish.go-153] Hello(): running pull on wg-vnet to reconnect 

and on the server:

mq           | 1665079299: New connection from 11.11.11.11:41912 on port 8883.
mq           | 1665079333: New connection from 11.11.11.11:47656 on port 8883.
netmaker     | [netmaker] 2022-10-06 18:02:13  failed to get node info [5c6b0041-4bcc-49d9-9619-e92167655b51]: no result found 
netmaker     | [netmaker] 2022-10-06 18:02:13 processed request error: no result found 
netmaker     | [netmaker] 2022-10-06 18:02:18  failed to get node info [eb9d2410-1dff-4f9f-9e4c-084285e3c282]: no result found 
netmaker     | [netmaker] 2022-10-06 18:02:18 processed request error: no result found 
netmaker     | [netmaker] 2022-10-06 18:02:26  failed to get node info [f478fc96-8eff-4217-a69f-3db061c61685]: no result found 
netmaker     | [netmaker] 2022-10-06 18:02:26 processed request error: no result found 
netmaker     | [netmaker] 2022-10-06 18:02:26  failed to get node info [f478fc96-8eff-4217-a69f-3db061c61685]: no result found 
netmaker     | [netmaker] 2022-10-06 18:02:26 processed request error: no result found 
netmaker     | [netmaker] 2022-10-06 18:02:44  failed to get node info [eb9d2410-1dff-4f9f-9e4c-084285e3c282]: no result found 
netmaker     | [netmaker] 2022-10-06 18:02:44 processed request error: no result found 
mq           | 1665079367: New connection from 11.11.11.11:58926 on port 8883.
netmaker     | [netmaker] 2022-10-06 18:02:48  failed to get node info [634516a6-beab-4226-ab82-45c4a0aec807]: no result found 
netmaker     | [netmaker] 2022-10-06 18:02:48 processed request error: no result found 
mq           | 1665079389: Client <unknown> has exceeded timeout, disconnecting.
netmaker     | [netmaker] 2022-10-06 18:03:15  failed to get node info [634516a6-beab-4226-ab82-45c4a0aec807]: no result found 
netmaker     | [netmaker] 2022-10-06 18:03:15 processed request error: no result found 
netmaker     | [netmaker] 2022-10-06 18:03:19  failed to get node info [5c6b0041-4bcc-49d9-9619-e92167655b51]: no result found 
netmaker     | [netmaker] 2022-10-06 18:03:19 processed request error: no result found 
mq           | 1665079401: New connection from 11.11.11.11:58308 on port 8883.
netmaker     | [netmaker] 2022-10-06 18:03:26  failed to get node info [f478fc96-8eff-4217-a69f-3db061c61685]: no result found 
netmaker     | [netmaker] 2022-10-06 18:03:26 processed request error: no result found 
mq           | 1665079425: Client <unknown> has exceeded timeout, disconnecting.
netmaker     | [netmaker] 2022-10-06 18:03:50  failed to get node info [eb9d2410-1dff-4f9f-9e4c-084285e3c282]: no result found 
netmaker     | [netmaker] 2022-10-06 18:03:50 processed request error: no result found 
mq           | 1665079435: New connection from 11.11.11.11:37610 on port 8883.

I tried troubleshooting mosquitto with this tutorial, and everything seems fine. The only step I haven't been able to do is deleting the certs with sqlite3 /var/lib/docker/volumes/root_sqldata/_data/netmaker.db 'delete from certs', since sqlite3 is not available on the server or in the netmaker container.

mattkasun commented 1 year ago

v0.16.1 (currently a pre-release version) updates the mq security model and will simplify mq connections between the clients and server.

PouriaMzn commented 1 year ago

I updated Netmaker to v0.16.1, but the problem still exists. mq is not proxied and listens on port 1443. On the server I get:

mq           | 1665840595: mosquitto version 2.0.11 starting
mq           | 1665840595: Config loaded from /mosquitto/config/mosquitto.conf.
mq           | 1665840595: Loading plugin: /usr/lib/mosquitto_dynamic_security.so
mq           | 1665840595: Opening ipv4 listen socket on port 8883.
mq           | 1665840595: Opening ipv6 listen socket on port 8883.
mq           | 1665840595: Opening ipv4 listen socket on port 1883.
mq           | 1665840595: Opening ipv6 listen socket on port 1883.
mq           | 1665840595: mosquitto version 2.0.11 running
mq           | 1665840597: New connection from 172.24.0.3:38988 on port 1883.
mq           | 1665840597: New connection from 172.24.0.3:38998 on port 1883.
mq           | 1665840597: New client connected from 172.24.0.3:38988 as uCvTiQjVXOODjOZrrMwECmp (p2, c1, k60, u'Netmaker-Server').
mq           | 1665840597: New client connected from 172.24.0.3:38998 as Yenu8PixarVSJ2ybj3u4hCG (p2, c1, k60, u'Netmaker-Admin').
mq           | 1665840598: New connection from 11.11.11.11:52448 on port 8883.
mq           | 1665840632: New connection from 11.11.11.11:52450 on port 8883.
mq           | 1665840669: New connection from 11.11.11.11:52452 on port 8883.
mq           | 1665840690: Client <unknown> has exceeded timeout, disconnecting.
mq           | 1665840700: New connection from 11.11.11.11:52454 on port 8883.

And on the client side I get:

[netclient] 2022-10-15 17:04:10 [mqpublish.go-252] publish(): could not connect to broker at broker.netmaker.DOMAIN.com:1443
[netclient] 2022-10-15 17:04:10 [localport.go-47] UpdateLocalListenPort(): could not publish local port change connection timeout
[netclient] 2022-10-15 17:04:40 [mqpublish.go-252] publish(): could not connect to broker at broker.netmaker.DOMAIN.com:1443
[netclient] 2022-10-15 17:04:40 [mqpublish.go-149] Hello(): Network: lan-party error publishing ping, connection timeout
[netclient] 2022-10-15 17:04:40 [mqpublish.go-150] Hello(): running pull on lan-party to reconnect
[netclient] 2022-10-15 17:04:43 [common.go-389] informPortChange(): network: lan-party UDP hole punching enabled for node id-615312
[netclient] 2022-10-15 17:04:45 [common.go-165] InitWireguard(): waiting for interface...
[netclient] 2022-10-15 17:04:45 [common.go-193] InitWireguard(): interface ready - netclient.. ENGAGE
[netclient] 2022-10-15 17:04:45 [mqpublish.go-52] checkin(): checkin with server(s) for all networks
[netclient] 2022-10-15 17:04:45 [localport.go-40] UpdateLocalListenPort(): network: lan-party local port has changed from  0  to  39232

How is it that mq receives the connection but then logs "Client <unknown> has exceeded timeout"?
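
Mosquitto's wording hints at the answer: "New connection from ..." is logged when the TCP connection is accepted, while "New client connected from ... as <id>" only appears after a valid MQTT CONNECT packet has been processed. "Client <unknown>" therefore suggests the timeout fired before any client ID arrived, i.e. the node opened a socket but never completed an MQTT CONNECT that mosquitto could read. The Python sketch below tallies both kinds of lines per peer; the regular expressions are written against the log lines shown in this thread (IPv4 peers only) and are an assumption, not an official log format.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

# Patterns modelled on the mosquitto 2.0 log lines quoted in this thread.
TCP_ACCEPT = re.compile(r"New connection from (?P<ip>\d+(?:\.\d+){3}):\d+ on port \d+")
MQTT_CONNECT = re.compile(r"New client connected from (?P<ip>\d+(?:\.\d+){3}):\d+ as \S+")
UNKNOWN_TIMEOUT = re.compile(r"Client <unknown> has exceeded timeout")


def summarize(lines: Iterable[str]) -> Tuple[Dict[str, Dict[str, int]], int]:
    """Count accepted TCP sockets vs completed MQTT CONNECTs per peer IP."""
    tcp, mqtt = Counter(), Counter()
    unknown_timeouts = 0
    for line in lines:
        if m := TCP_ACCEPT.search(line):
            tcp[m["ip"]] += 1
        elif m := MQTT_CONNECT.search(line):
            mqtt[m["ip"]] += 1
        elif UNKNOWN_TIMEOUT.search(line):
            unknown_timeouts += 1
    peers = {ip: {"tcp": tcp[ip], "mqtt": mqtt[ip]} for ip in tcp.keys() | mqtt.keys()}
    return peers, unknown_timeouts
```

Run over the server log above, it shows the Docker-internal netmaker server (172.24.0.3) completing both of its CONNECTs, while the node at 11.11.11.11 only ever opens TCP connections.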

mattkasun commented 1 year ago

Your client logs show that you are trying to connect to the broker on port 1443, but your server logs show that the mq server is listening on port 8883 (and on 1883 for server connections).

PouriaMzn commented 1 year ago

That's because the container's port 8883 is mapped to port 1443 on the host:

  mq:
    container_name: mq
    image: eclipse-mosquitto:2.0.11-openssl
    depends_on:
      - netmaker
    restart: unless-stopped
    command: ["/mosquitto/config/wait.sh"]
    environment:
      NETMAKER_SERVER_HOST: "https://api.netmaker.DOMAIN.com"
    volumes:
      - /root/mosquitto.conf:/mosquitto/config/mosquitto.conf
      - /root/wait.sh:/mosquitto/config/wait.sh
      - mosquitto_data:/mosquitto/data
      - mosquitto_logs:/mosquitto/log
    ports:
      - "1443:8883"
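
The port numbers in the two logs are consistent with this mapping. Docker Compose's short port syntax is [HOST_IP:]HOST_PORT:CONTAINER_PORT[/PROTOCOL]: nodes dial the host port (1443), while mosquitto, running inside the container, reports the container port (8883). The small Python parser below makes the two sides explicit; it is a sketch that ignores IPv6 host addresses and the long-form ports syntax.

```python
from typing import NamedTuple, Optional


class PortMapping(NamedTuple):
    host_ip: Optional[str]     # host interface to bind; None means all interfaces
    host_port: Optional[str]   # port (or range) clients connect to; None = chosen by Docker
    container_port: str        # port (or range) the service listens on inside the container
    protocol: str              # "tcp" unless "/udp" is given


def parse_port_mapping(spec: str) -> PortMapping:
    """Parse Compose short syntax: [HOST_IP:]HOST_PORT:CONTAINER_PORT[/PROTO].

    IPv6 host addresses (e.g. "[::1]:8080:80") are not handled in this sketch.
    """
    spec, _, protocol = spec.strip().strip('"').partition("/")
    protocol = protocol or "tcp"
    parts = spec.split(":")
    if len(parts) == 1:   # "8883": container port only, published on a random host port
        return PortMapping(None, None, parts[0], protocol)
    if len(parts) == 2:   # "1443:8883"
        return PortMapping(None, parts[0], parts[1], protocol)
    if len(parts) == 3:   # "127.0.0.1:1883:1883"
        return PortMapping(parts[0], parts[1], parts[2], protocol)
    raise ValueError(f"unsupported port mapping: {spec!r}")
```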

abhishek9686 commented 1 year ago

Since you are not using a proxy, that is the issue: current clients use mqtts to connect securely to MQ through Traefik, but in your setup the connection to MQ is unencrypted, so the client would have to use plain mqtt instead. That's why it fails.
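
One way to check which side of that mismatch a setup is on is to see whether the port the node dials completes a TLS handshake at all (roughly what openssl s_client -connect host:port shows). In the compose snippet above, mq publishes 8883 directly and no certificate volume is mounted into it, so a plain listener seems likely there. The Python sketch below deliberately skips certificate verification because it only asks whether a TLS listener is present; the function name and defaults are illustrative.

```python
import socket
import ssl
from typing import Optional


def speaks_tls(host: str, port: int, server_name: Optional[str] = None,
               timeout: float = 5.0) -> bool:
    """Return True if a TLS handshake completes on host:port.

    Certificate checks are deliberately disabled: this only answers
    "is there a TLS listener here?", so never reuse this context for real traffic.
    """
    context = ssl.create_default_context()
    context.check_hostname = False          # must be disabled before setting CERT_NONE
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # When server_name is a hostname it is sent as SNI, which HostSNI rules match on.
            with context.wrap_socket(raw, server_hostname=server_name or host) as tls:
                return tls.version() is not None
    except OSError:  # covers ssl.SSLError, timeouts and refused connections
        return False
```

If speaks_tls("broker.netmaker.DOMAIN.com", 1443) returns False while the node is configured for mqtts, node and broker disagree about TLS: either the listener needs TLS (directly or via a TLS-terminating proxy) or the client needs plain mqtt.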

helbgd commented 1 year ago

Would it not be possible to enable WebSocket support in mosquitto.conf? The only change is the protocol line on the 8883 listener:

per_listener_settings false

listener 8883
allow_anonymous false
protocol websockets

listener 1883
allow_anonymous false

plugin /usr/lib/mosquitto_dynamic_security.so
plugin_opt_config_file /mosquitto/data/dynamic-security.json

Once netclient supports WebSocket connections, it would also be possible to use reverse proxies other than Traefik for MQTT.
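
The appeal of WebSockets here is that the connection starts as an ordinary HTTP/1.1 Upgrade request, which an HTTP reverse proxy such as Nginx can forward; the Nginx config in the original post already sets the Upgrade and Connection headers for that. MQTT over WebSocket uses the subprotocol name "mqtt". The Python sketch below builds that opening request and the Sec-WebSocket-Accept value defined by RFC 6455; the /mqtt path and the host are placeholders, and, per the comment above, netclient does not yet use this transport.

```python
import base64
import hashlib
import os
from typing import Optional, Tuple

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(key: str) -> str:
    """Value a server must return in Sec-WebSocket-Accept for a given client key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def mqtt_ws_upgrade_request(host: str, path: str = "/mqtt",
                            key: Optional[str] = None) -> Tuple[str, str]:
    """Build the HTTP/1.1 Upgrade request that opens MQTT over WebSocket.

    Returns the raw request and the Sec-WebSocket-Accept value to expect back.
    """
    if key is None:
        key = base64.b64encode(os.urandom(16)).decode("ascii")  # random 16-byte nonce
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "Sec-WebSocket-Protocol: mqtt\r\n"  # subprotocol name used for MQTT over WebSocket
        "\r\n"
    )
    return request, websocket_accept(key)
```

After this handshake, the same CONNECT/CONNACK packets travel inside WebSocket binary frames, so the proxy only ever has to understand HTTP and WebSocket, not MQTT.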

PouriaMzn commented 1 year ago

At the moment I'm facing another problem: for some reason I cannot figure out, Traefik cannot find the netmaker container's IP address. I'm using the standard docker-compose.yml file provided in the documentation, and I get this log output from docker compose up:

mq           | Waiting for netmaker server to startup
mq           | Waiting for netmaker server to startup
mq           | Waiting for netmaker server to startup
mq           | Waiting for netmaker server to startup
mq           | Waiting for netmaker server to startup
netmaker     | [netmaker] Fatal: Admin: could not connect to broker, token timeout, exiting ... 
netmaker exited with code 2
traefik      | time="2022-10-20T14:56:11Z" level=error msg="service \"netmaker-api\" error: unable to find the IP address for the container \"/netmaker\": the server is ignored" container=netmaker-netmaker-7b1ecc14bbb4521fa32fad835fa240699e8d5274e10ef22a20a25df3de8bab73 providerName=docker
mq           | Waiting for netmaker server to startup
netmaker     |               
netmaker     |  __   __     ______     ______   __    __     ______     __  __     ______     ______    
netmaker     | /\ "-.\ \   /\  ___\   /\__  _\ /\ "-./  \   /\  __ \   /\ \/ /    /\  ___\   /\  == \   
netmaker     | \ \ \-.  \  \ \  __\   \/_/\ \/ \ \ \-./\ \  \ \  __ \  \ \  _"-.  \ \  __\   \ \  __<   
netmaker     |  \ \_\\"\_\  \ \_____\    \ \_\  \ \_\ \ \_\  \ \_\ \_\  \ \_\ \_\  \ \_____\  \ \_\ \_\ 
netmaker     |   \/_/ \/_/   \/_____/     \/_/   \/_/  \/_/   \/_/\/_/   \/_/\/_/   \/_____/   \/_/ /_/ 
netmaker     |                                                                                                                                                                                                                                                       
netmaker     | 
netmaker     | [netmaker] 2022-10-20 14:53:58 connecting to sqlite 
netmaker     | [netmaker] 2022-10-20 14:53:58 database successfully connected 
netmaker     | [netmaker] 2022-10-20 14:53:58 no OAuth provider found or not configured, continuing without OAuth 
netmaker     | [netmaker] 2022-10-20 14:53:58 MQ Is Already Configured, Skipping... 
netmaker     | [netmaker] 2022-10-20 14:54:00 REST Server successfully started on port  8081  (REST) 
netmaker     | [netmaker] 2022-10-20 14:54:00 connecting to mq broker at mq:1883 with TLS? false 
netmaker     | [netmaker] Fatal: Admin: could not connect to broker, token timeout, exiting ... 
schniggie commented 1 year ago

I just tried a fresh install and get exactly the same issue: netmaker cannot connect to mq:1883, even though mq is running fine. @PouriaMzn this should not be related to Traefik, since mq:1883 addresses the container directly (container-name resolution works in custom Docker networks). Here is my log:

/opt/containers/netmaker # docker-compose up
[+] Running 4/0
 ⠿ Container netmaker     Created                                                                                                 0.0s
 ⠿ Container netmaker-ui  Created                                                                                                 0.0s
 ⠿ Container coredns      Created                                                                                                 0.0s
 ⠿ Container mq           Created                                                                                                 0.0s
Attaching to coredns, mq, netmaker, netmaker-ui
netmaker     |               
netmaker     |  __   __     ______     ______   __    __     ______     __  __     ______     ______    
netmaker     | /\ "-.\ \   /\  ___\   /\__  _\ /\ "-./  \   /\  __ \   /\ \/ /    /\  ___\   /\  == \   
netmaker     | \ \ \-.  \  \ \  __\   \/_/\ \/ \ \ \-./\ \  \ \  __ \  \ \  _"-.  \ \  __\   \ \  __<   
netmaker     |  \ \_\\"\_\  \ \_____\    \ \_\  \ \_\ \ \_\  \ \_\ \_\  \ \_\ \_\  \ \_____\  \ \_\ \_\ 
netmaker     |   \/_/ \/_/   \/_____/     \/_/   \/_/  \/_/   \/_/\/_/   \/_/\/_/   \/_____/   \/_/ /_/ 
netmaker     |                                                                                                                         
netmaker     | 
netmaker     | [netmaker] 2022-10-26 22:06:00 connecting to sqlite 
netmaker     | [netmaker] 2022-10-26 22:06:00 database successfully connected 
netmaker     | [netmaker] 2022-10-26 22:06:00 no OAuth provider found or not configured, continuing without OAuth 
netmaker     | [netmaker] 2022-10-26 22:06:01 MQ Is Already Configured, Skipping... 
netmaker     | [netmaker] 2022-10-26 22:06:01 REST Server successfully started on port  8081  (REST) 
netmaker-ui  | >>>> backend set to: https://api.vpn.<removed>.de <<<<<
netmaker-ui  | 2022/10/26 22:06:01 [notice] 9#9: using the "epoll" event method
netmaker-ui  | 2022/10/26 22:06:01 [notice] 9#9: nginx/1.21.6
netmaker-ui  | 2022/10/26 22:06:01 [notice] 9#9: built by gcc 10.3.1 20211027 (Alpine 10.3.1_git20211027) 
netmaker-ui  | 2022/10/26 22:06:01 [notice] 9#9: OS: Linux 4.19.0-22-amd64
netmaker-ui  | 2022/10/26 22:06:01 [notice] 9#9: getrlimit(RLIMIT_NOFILE): 1048576:1048576
netmaker-ui  | 2022/10/26 22:06:01 [notice] 9#9: start worker processes
netmaker-ui  | 2022/10/26 22:06:01 [notice] 9#9: start worker process 10
netmaker-ui  | 2022/10/26 22:06:01 [notice] 9#9: start worker process 11
netmaker-ui  | 2022/10/26 22:06:01 [notice] 9#9: start worker process 12
netmaker-ui  | 2022/10/26 22:06:01 [notice] 9#9: start worker process 13
coredns      | .:53
coredns      | [INFO] plugin/reload: Running configuration SHA512 = 7756a7dc8ca1f3bff79c30438b6f47e299df3244c7e5d9978507f7380aeee497c25205a60a8a1ee18ece118df5ef351e6365e14acf9cf2120fa4ffd783ff5ae2
coredns      | CoreDNS-1.10.0
coredns      | linux/amd64, go1.19.1, 596a9f9
netmaker     | [netmaker] 2022-10-26 22:06:01 connecting to mq broker at mq:1883 with TLS? false 
mq           | OK: 7 MiB in 18 packages
mq           | SERVER: https://api.vpn.<removed>.de
netmaker     | [netmaker] Fatal: Admin: could not connect to broker, token timeout, exiting ... 
mq           | Waiting for netmaker server to startup
netmaker exited with code 2

OK, fixed it for myself. My Traefik setup was broken: I had not put netmaker in the correct Traefik network.

DolevBaron commented 1 year ago

@schniggie I'm currently facing the same issue ("could not connect to broker"). Can you elaborate on your solution? A copy of the relevant file (e.g. the matching section in docker-compose.yml) before and after the change would be really helpful.

schniggie commented 1 year ago

@DolevBaron TL;DR: for me the solution was quite simple. Since I already run Traefik, I removed the Traefik container from this docker-compose file, but I forgot to put all the containers in the Traefik bridge network, so Traefik could not serve the services correctly. Your API needs to be exposed successfully, because the mq container only starts once it can reach the API through the externally exposed Traefik route. Otherwise mq waits forever, the mq daemon never starts, and you finally get the error that Netmaker cannot connect to mq. Hope that helps :)
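
The startup gate described here can be sketched as follows. The wait.sh script mounted into the mq container is not shown in this thread, so this is a hypothetical Python equivalent, not its actual contents: it polls the public API URL and, unlike a loop that waits forever, gives up after a bounded number of attempts so that a broken Traefik route surfaces as an explicit failure.

```python
import http.client
import time
import urllib.request


def wait_for_api(url: str, attempts: int = 30, delay: float = 2.0,
                 timeout: float = 5.0) -> bool:
    """Poll url until it answers with a non-error HTTP status; give up after `attempts`.

    An HTTPError (4xx/5xx) counts as "not ready": Traefik itself replies
    "404 page not found" when no router matches, which would otherwise
    look like a reachable API.
    """
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout):
                return True
        except (OSError, http.client.HTTPException) as exc:  # URLError/HTTPError are OSErrors
            print(f"Waiting for netmaker server to startup ({attempt}/{attempts}): {exc}")
        if attempt < attempts:
            time.sleep(delay)
    return False
```

For instance, wait_for_api("https://api.netmaker.DOMAIN.com") would poll the NETMAKER_SERVER_HOST value from the compose snippet earlier in the thread; point it at an endpoint you know returns 200, since which path the real script checks is not shown here.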

Orybon commented 1 year ago

Hi,

I'm using gravitl/netmaker:v0.17.1 and get the same error:

Feb 26 08:36:40 [SNIP] netclient[201867]: [netclient] 2023-02-26 08:36:40 [mqpublish.go-252] publish(): could not connect to broker at <broker_dns>:443
Feb 26 08:36:40 [SNIP] netclient[201867]: [netclient] 2023-02-26 08:36:40 [localport.go-47] UpdateLocalListenPort(): could not publish local port change connection timeout

But when I try to connect from the same host, it succeeds:

 ~ nc -zv <broker_dns> 443
Connection to <broker_dns> 443 port [tcp/https] succeeded!

I don't understand what the issue is here. Could you please share a working docker-compose.yaml and mosquitto.conf that use Traefik?