gravitl / netmaker

Netmaker makes networks with WireGuard. Netmaker automates fast, secure, and distributed virtual networks.
https://netmaker.io

[Bug]: CoreDNS disabled - does it work? #1720

Open pnowy opened 1 year ago

pnowy commented 1 year ago

Contact Details

in the ticket

What happened?

Hi,

It's a great product, but I'm struggling with DNS.

We have a plain WireGuard setup in GCP with an internal DNS zone and a DNS forwarding IP for that zone. The plain WireGuard server config looks like this:

[Interface]
Address = 100.99.98.0/24
ListenPort = 51820
PrivateKey = MyPrivateKey
MTU = 1450
PostUp = iptables -A FORWARD -i wg0 -j ACCEPT; iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
PostDown = iptables -D FORWARD -i wg0 -j ACCEPT; iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE

For the external clients we simply point them at our DNS IP (let's say DNS = A.B.C.D) and everything works fine: the external clients can use the internal zone without any issues.
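For readers unfamiliar with the plain WireGuard approach described above, here is a minimal sketch of what such an external client config might look like, rendered from Python. The function name, the keys, the endpoint, and every address below are hypothetical placeholders, not values from this setup; only the `DNS` line in `[Interface]` reflects what is described above.

```python
def render_client_config(private_key, address, dns_ip,
                         server_public_key, endpoint, allowed_ips):
    """Render a wg-quick style client config (hypothetical helper).

    The DNS line tells wg-quick which resolver the client should use
    while the tunnel is up.
    """
    lines = [
        "[Interface]",
        f"PrivateKey = {private_key}",
        f"Address = {address}",
        f"DNS = {dns_ip}",
        "",
        "[Peer]",
        f"PublicKey = {server_public_key}",
        f"Endpoint = {endpoint}",
        # AllowedIPs must cover the DNS server, or queries never enter the tunnel.
        f"AllowedIPs = {', '.join(allowed_ips)}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
example = render_client_config(
    private_key="<CLIENT_PRIVATE_KEY>",
    address="100.99.98.2/32",
    dns_ip="10.0.0.53",
    server_public_key="<SERVER_PUBLIC_KEY>",
    endpoint="vpn.example.com:51820",
    allowed_ips=["100.99.98.0/24", "10.0.0.0/24"],
)
print(example)
```

The point of the sketch is that the resolver is chosen entirely on the client side, so the server needs no DNS component of its own in this arrangement.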

I tried setting up Netmaker, and network connectivity worked almost out of the box (after I set the node up as ingress/egress with the CIDR blocks, of course), but I have spent a day on it and cannot get the internal DNS working.

I've checked different options:

So I came to the conclusion that I would disable CoreDNS and keep only DNS forwarding, which according to the documentation should solve the issue (https://docs.netmaker.org/server-installation.html?highlight=dns#no-dns-coredns-disabled).

I've commented out the coredns part of the docker compose file (I tried both with and without the netmaker volume mount dnsconfig:/root/config/dnsconfig), but in all cases Netmaker reports this error:

Unable to initialize iptables on host: lookup coredns: Try again
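One possible reading of this message, offered as an assumption rather than a confirmed diagnosis: the server is still trying to resolve the hostname `coredns`, which Docker Compose normally answers through its service-name DNS, and the lookup fails once that service is removed. A minimal sketch for checking whether a name resolves from a given environment follows; the helper name `can_resolve` is hypothetical, and it only uses the system resolver through Python's standard `socket` module.

```python
import socket


def can_resolve(hostname):
    """Return True if the system resolver can resolve hostname.

    Inside a container on a Compose network, service names such as
    'coredns' are answered by Docker's embedded DNS only while a
    service with that name exists on the same network.
    """
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False
```

Calling `can_resolve("coredns")` from a container attached to the same Compose network (assuming Python is available there) would show whether the name is still resolvable after the coredns service has been commented out.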

DNS on the machine itself (which is the ingress/egress node for Netmaker) works fine and resolves the internal DNS zone from GCP.

@afeiszli is there a way to disable CoreDNS completely and rely entirely on DNS queries against existing internal zones?

I found these related tickets about DNS, so I think other people are confused about DNS as well:

I could contribute a documentation update, but I first need to resolve the issue.

PS. The internal Netmaker DNS entries based on the embedded CoreDNS work for the clients, but at the moment our internal DNS is more important to us, and it seems that CoreDNS takes priority here (?)

I was able to disable CoreDNS with DNS_MODE: "on" but I would like to keep private DNS from our internal networks.

Version

v0.16.1

What OS are you using?

Linux

Relevant log output

No response

mattkasun commented 1 year ago

Did you remove dns from PORT_FORWARD_SERVICES in the docker-compose file?

pnowy commented 1 year ago

@mattkasun I've tried with that line commented out in docker-compose (to be precise, I haven't tried an empty environment variable, if that matters). As far as I remember, the external client then had the same issue (DNS not resolving), but I no longer saw any logs in the coredns Docker container when I ran dig directly against my A.B.C.D DNS server.

I can double-check tomorrow. So the expected check is:

Is that correct?

pnowy commented 1 year ago

@mattkasun to be precise, I'm using the following docker-compose.yml file on an Ubuntu server (some URLs and passwords differ in the real deployment, of course):

version: "3.4"

services:
  netmaker:
    container_name: netmaker
    image: gravitl/netmaker:v0.16.1
    cap_add:
      - NET_ADMIN
      - NET_RAW
      - SYS_MODULE
    sysctls:
      - net.ipv4.ip_forward=1
      - net.ipv4.conf.all.src_valid_mark=1
      - net.ipv6.conf.all.disable_ipv6=0
      - net.ipv6.conf.all.forwarding=1
    restart: always
    volumes:
      - dnsconfig:/root/config/dnsconfig
      - sqldata:/root/data
      - mosquitto_data:/etc/netmaker
    environment:
      SERVER_NAME: "broker.netmaker.myapi.com"
      SERVER_HOST: "<PUBLIC_IP>"
      SERVER_API_CONN_STRING: "api.netmaker.myapi.com:443"
      COREDNS_ADDR: "<PUBLIC_IP>"
      DNS_MODE: "on"
      SERVER_HTTP_HOST: "api.netmaker.myapi.com"
      API_PORT: "8081"
      CLIENT_MODE: "on"
      MASTER_KEY: "<MASTER_KEY>"
      CORS_ALLOWED_ORIGIN: "*"
      DISPLAY_KEYS: "on"
      DATABASE: "sqlite"
      NODE_ID: "netmaker-gcp"
      MQ_HOST: "mq"
      MQ_PORT: "443"
      MQ_SERVER_PORT: "1883"
      HOST_NETWORK: "off"
      VERBOSITY: "1"
      MANAGE_IPTABLES: "on"
      PORT_FORWARD_SERVICES: "" # also tried commenting this line out entirely, with no effect
      MQ_ADMIN_PASSWORD: "<MQ_ADMIN_PASSWORD>"
    ports:
      - "51821-51830:51821-51830/udp"
    expose:
      - "8081"
    labels:
      - traefik.enable=true
      - traefik.http.routers.netmaker-api.entrypoints=websecure
      - traefik.http.routers.netmaker-api.rule=Host(`api.netmaker.myapi.com`)
      - traefik.http.routers.netmaker-api.service=netmaker-api
      - traefik.http.services.netmaker-api.loadbalancer.server.port=8081
  netmaker-ui:
    container_name: netmaker-ui
    image: gravitl/netmaker-ui:v0.16.1
    depends_on:
      - netmaker
    links:
      - "netmaker:api"
    restart: always
    environment:
      BACKEND_URL: "https://api.netmaker.myapi.com"
    expose:
      - "80"
    labels:
      - traefik.enable=true
      - traefik.http.middlewares.nmui-security.headers.accessControlAllowOriginList=*.netmaker.myapi.com
      - traefik.http.middlewares.nmui-security.headers.stsSeconds=31536000
      - traefik.http.middlewares.nmui-security.headers.browserXssFilter=true
      - traefik.http.middlewares.nmui-security.headers.customFrameOptionsValue=SAMEORIGIN
      - traefik.http.middlewares.nmui-security.headers.customResponseHeaders.X-Robots-Tag=none
      - traefik.http.middlewares.nmui-security.headers.customResponseHeaders.Server= # Remove the server name
      - traefik.http.routers.netmaker-ui.entrypoints=websecure
      - traefik.http.routers.netmaker-ui.middlewares=nmui-security@docker
      - traefik.http.routers.netmaker-ui.rule=Host(`dashboard.netmaker.mypi.com`)
      - traefik.http.routers.netmaker-ui.service=netmaker-ui
      - traefik.http.services.netmaker-ui.loadbalancer.server.port=80
  coredns:
    container_name: coredns
    image: coredns/coredns
    command: -conf /root/dnsconfig/Corefile
    depends_on:
      - netmaker
    restart: always
    volumes:
      - dnsconfig:/root/dnsconfig
  traefik:
    image: traefik:v2.6
    container_name: traefik
    command:
      - "--certificatesresolvers.http.acme.email=my.email@myemail.com"
      - "--certificatesresolvers.http.acme.storage=/letsencrypt/acme.json"
      - "--certificatesresolvers.http.acme.tlschallenge=true"
      - "--entrypoints.websecure.address=:443"
      - "--entrypoints.websecure.http.tls=true"
      - "--entrypoints.websecure.http.tls.certResolver=http"
      - "--log.level=INFO"
      - "--providers.docker=true"
      - "--providers.docker.exposedByDefault=false"
      - "--serverstransport.insecureskipverify=true"
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - traefik_certs:/letsencrypt
    ports:
      - "443:443"
  mq:
    container_name: mq
    image: eclipse-mosquitto:2.0.11-openssl
    depends_on:
      - netmaker
    restart: unless-stopped
    command: ["/mosquitto/config/wait.sh"]
    environment:
      NETMAKER_SERVER_HOST: "https://api.netmaker.myapi.com"
    volumes:
      - /root/mosquitto.conf:/mosquitto/config/mosquitto.conf
      - /root/wait.sh:/mosquitto/config/wait.sh
      - mosquitto_data:/mosquitto/data
      - mosquitto_logs:/mosquitto/log
    expose:
      - "8883"
    labels:
      - traefik.enable=true
      - traefik.tcp.routers.mqtt.rule=HostSNI(`broker.netmaker.myapi.com`)
      - traefik.tcp.routers.mqtt.tls.certresolver=http
      - traefik.tcp.services.mqtt.loadbalancer.server.port=8883
      - traefik.tcp.routers.mqtt.entrypoints=websecure
volumes:
  traefik_certs: {}
  sqldata: {}
  dnsconfig: {}
  mosquitto_data: {}
  mosquitto_logs: {}

So I tried with PORT_FORWARD_SERVICES both commented out and empty. The result is the same in both cases: DNS forwarding doesn't work.

On the GCP node, the command dig @A.B.C.D something.myapi-internal.com works fine, but from an external client whose DNS line is set to the A.B.C.D DNS server, resolution doesn't work (the DNS server's CIDR range is included in the allowed IPs).
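The AllowedIPs condition mentioned above can be verified mechanically. Below is a minimal sketch using Python's standard `ipaddress` module; the function name `dns_covered` and all addresses are hypothetical placeholders, since the real DNS IP and ranges are redacted in this thread.

```python
import ipaddress


def dns_covered(dns_ip, allowed_ips):
    """Return True if dns_ip falls inside at least one AllowedIPs CIDR.

    If the DNS server is not covered, the client sends queries outside
    the tunnel and an internal-only resolver is never reached.
    """
    addr = ipaddress.ip_address(dns_ip)
    return any(
        addr in ipaddress.ip_network(cidr, strict=False)
        for cidr in allowed_ips
    )


# Placeholder values for illustration only.
allowed = ["100.99.98.0/24", "10.128.0.0/20"]
print(dns_covered("10.128.0.5", allowed))
print(dns_covered("10.200.0.5", allowed))
```

A passing check only rules out client-side routing as the cause. It says nothing about whether the server forwards or masquerades the query once it arrives.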

If you have any recommendations for things to check, I will be happy to try them. I think CoreDNS is still running in this case (at least from the Netmaker UI's perspective), and I wonder whether fully disabling CoreDNS would help. As I mentioned before, though, commenting it out of docker-compose causes an error on the netmaker container side.

axute commented 1 year ago

Same issue here. It looks like all DNS traffic is blocked completely when DNS_MODE=off. I also tried with an empty PORT_FORWARD_SERVICES="" and without PORT_FORWARD_SERVICES.

Gamechiefx commented 1 year ago

Is this something that is going to be fixed?? At this time I am unable to domain join machines because the client machines are unable to execute DNS queries to the domain controller

pnowy commented 1 year ago

@Gamechiefx I wasn't able to figure out why DNS is not working, and without knowing the tool well it is quite hard to debug the issue. In the end I switched to another project called NetBird (https://netbird.io/). DNS support was added there recently and it works quite nicely for me (it's also free for self-hosted deployments and has Keycloak/OpenID integration in the open source version).

Nathan13888 commented 1 year ago

Same problem for me right now...

Nathan13888 commented 1 year ago

Though I think I managed to fix it by adding a networks section to all the containers in docker compose and adding them all to the same Docker bridge network.

pnowy commented 1 year ago

@Nathan13888 does it work? I'm surprised, because bridge is Docker's default network type, and when no network is specified docker compose creates one. So when the docker compose file is in the /root directory, the created network is root_default, of type bridge.

So leaving the network unspecified in the above case is equivalent to something like this:

  netmaker:
    container_name: netmaker
    networks:
      - root_default

Nathan13888 commented 1 year ago

@pnowy It never worked for me, unfortunately...

I initially spent at least a dozen hours working on it but never got it working after reading basically the entire documentation and trying different setups.

pnowy commented 1 year ago

Same for me. I returned to this after some time with a fresh install of a newer version, but the issue is still the same.