qdm12 / dns

Docker DNS server on steroids to access DNS-over-TLS from Cloudflare, Google, Quad9, Quadrant or CleanBrowsing
https://hub.docker.com/r/qmcgaw/cloudflare-dns-server
MIT License

unbound fails to start due to not being able to talk to github. #31

Closed evanrich closed 4 years ago

evanrich commented 4 years ago

I just fired this up in Kubernetes and got the following error:

========= DNS over TLS container ========
=========================================
=========================================
=== Made with ❤  by github.com/qdm12 ====
=========================================

Running version latest built on 2020-03-20t01:31:43z (commit 4506edb)

📣  Supports IPv6 DNS resolution

🔧  Need help? https://github.com/qdm12/cloudflare-dns-server/issues/new
💻  Email? quentin.mcgaw@gmail.com
☕  Slack? Join from the Slack button on Github
💸  Help me? https://github.com/sponsors/qdm12
2020-03-21T19:43:41.032Z        INFO    Unbound version: 1.10.0
2020-03-21T19:43:41.033Z        INFO    Settings summary:
DNS over TLS provider:
|--cloudflare
Listening port: 53
Caching: enabled
Verbosity level: 3/5
Verbosity details level: 0/4
Validation log level: 0/2
Block malicious: enabled
Block surveillance: disabed
Block ads: disabed
Blocked hostnames:
Blocked IP addresses:
Allowed hostnames:
Private addresses:
 |--127.0.0.1/8
 |--10.0.0.0/8
 |--172.16.0.0/12
 |--192.168.0.0/16
 |--169.254.0.0/16
 |--::1/128
 |--fc00::/7
 |--fe80::/10
 |--::ffff:0:0/96
2020-03-21T19:43:41.033Z        INFO    using DNS address 1.1.1.1 internally
2020-03-21T19:43:41.033Z        INFO    downloading root hints from https://raw.githubusercontent.com/qdm12/files/master/named.root.updated
2020-03-21T19:43:41.243Z        INFO    downloading root key from https://raw.githubusercontent.com/qdm12/files/master/root.key.updated
2020-03-21T19:43:41.264Z        INFO    generating Unbound configuration
2020-03-21T19:43:42.195Z        INFO    116793 hostnames blocked overall
2020-03-21T19:43:42.195Z        INFO    182459 IP addresses blocked overall
2020-03-21T19:43:42.805Z        INFO    starting unbound
2020-03-21T19:43:42.806Z        INFO    using DNS address 127.0.0.1 internally
2020-03-21T19:43:42.807Z        WARN    could not resolve github.com (try 1 of 10): lookup github.com on 10.96.0.10:53: read udp 127.0.0.1:33612->127.0.0.1:53: read: connection refused
2020-03-21T19:43:43.309Z        WARN    could not resolve github.com (try 2 of 10): lookup github.com on 10.96.0.10:53: read udp 127.0.0.1:49110->127.0.0.1:53: read: connection refused
2020-03-21T19:43:43.507Z        INFO    unbound: [1584819823] unbound[23:0] debug: switching log to stderr
2020-03-21T19:43:44.514Z        INFO    unbound: [1584819824] unbound[23:0] debug: module config: "validator iterator"
2020-03-21T19:43:44.514Z        INFO    unbound: [1584819824] unbound[23:0] notice: init module 0: validator
2020-03-21T19:43:44.515Z        INFO    unbound: [1584819824] unbound[23:0] info: adding trusted key . DS IN
2020-03-21T19:43:44.515Z        INFO    unbound: [1584819824] unbound[23:0] info: adding trusted key . DS IN
2020-03-21T19:43:44.515Z        INFO    unbound: [1584819824] unbound[23:0] notice: init module 1: iterator
2020-03-21T19:43:44.515Z        INFO    unbound: [1584819824] unbound[23:0] debug: target fetch policy for level 0 is 3
2020-03-21T19:43:44.515Z        INFO    unbound: [1584819824] unbound[23:0] debug: target fetch policy for level 1 is 2
2020-03-21T19:43:44.515Z        INFO    unbound: [1584819824] unbound[23:0] debug: target fetch policy for level 2 is 1
2020-03-21T19:43:44.515Z        INFO    unbound: [1584819824] unbound[23:0] debug: target fetch policy for level 3 is 0
2020-03-21T19:43:44.515Z        INFO    unbound: [1584819824] unbound[23:0] debug: target fetch policy for level 4 is 0
2020-03-21T19:43:48.811Z        WARN    could not resolve github.com (try 3 of 10): lookup github.com on 10.96.0.10:53: read udp 127.0.0.1:57753->127.0.0.1:53: read: connection refused
2020-03-21T19:43:49.312Z        WARN    could not resolve github.com (try 4 of 10): lookup github.com on 10.96.0.10:53: read udp 127.0.0.1:42031->127.0.0.1:53: read: connection refused
2020-03-21T19:43:49.813Z        WARN    could not resolve github.com (try 5 of 10): lookup github.com on 10.96.0.10:53: read udp 127.0.0.1:42936->127.0.0.1:53: read: connection refused
2020-03-21T19:43:50.315Z        WARN    could not resolve github.com (try 6 of 10): lookup github.com on 10.96.0.10:53: read udp 127.0.0.1:60121->127.0.0.1:53: read: connection refused
2020-03-21T19:43:50.816Z        WARN    could not resolve github.com (try 7 of 10): lookup github.com on 10.96.0.10:53: read udp 127.0.0.1:38169->127.0.0.1:53: read: connection refused
2020-03-21T19:43:51.318Z        WARN    could not resolve github.com (try 8 of 10): lookup github.com on 10.96.0.10:53: read udp 127.0.0.1:50918->127.0.0.1:53: read: connection refused
2020-03-21T19:43:51.819Z        WARN    could not resolve github.com (try 9 of 10): lookup github.com on 10.96.0.10:53: read udp 127.0.0.1:58290->127.0.0.1:53: read: connection refused
2020-03-21T19:43:52.320Z        WARN    could not resolve github.com (try 10 of 10): lookup github.com on 10.96.0.10:53: read udp 127.0.0.1:43110->127.0.0.1:53: read: connection refused
2020-03-21T19:43:52.820Z        ERROR   Unbound does not seem to be working after 10 tries

My deployment:

apiVersion: v1
kind: Service
metadata:
  name: cloudflare-dns
spec:
  selector:
    app: cloudflare-dns
  type: LoadBalancer
  ports:
  - name: udpdns
    protocol: UDP
    port: 53
    targetPort: 53
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudflare-dns
  namespace: default
  annotations:
    flux.weave.works/automated: 'false'
spec:
  replicas: 2
  selector:
    matchLabels:
      app: cloudflare-dns
  template:
    metadata:
      labels:
        app: cloudflare-dns
    spec:
      nodeSelector:
        kubernetes.io/hostname: "homelab-a"
      containers:
      - name: cloudflare-dns
        image: qmcgaw/cloudflare-dns-server:latest
        env:
        - name: PUID
          value: '1001'
        - name: PGID
          value: '999'
        - name: TZ
          value: 'America/Los_Angeles'
        - name: PROVIDERS
          value: 'cloudflare'
        - name: VERBOSITY
          value: '3'
        resources:
          requests:
            memory: "16Mi"
            cpu: "128m"
          limits:
            memory: "128Mi"
            cpu: "500m"
        ports:
        - name: udpdns
          containerPort: 53
          protocol: UDP
        readinessProbe:
          tcpSocket:
            port: udpdns
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: udpdns
          initialDelaySeconds: 15
          periodSeconds: 20

Is there a way to change the container's DNS IP?

evanrich commented 4 years ago

FWIW, when I change to the ":shell" image, it works, so it seems there's a problem with the Go image.

qdm12 commented 4 years ago

It seems to be a firewall issue blocking access from 127.0.0.1:43110->127.0.0.1:53 (so probably loopback) in your k8s configuration.

The Go version fails because I implemented a final check to make sure Unbound is up and working. I can change it to a warning instead of a fatal error, but you might want to check your firewall configuration first.

Thanks

evanrich commented 4 years ago

127.0.0.1 should be within the container, same as on a physical host, so no firewall is involved. Also, why does it work with the "shell" image? Isn't it still talking from 43110->:53? 10.96.0.10 is the IP of the k8s cluster DNS, so I don't think that's it.

Basically, what's the difference between the :shell and :latest images? Do they not function the same way, except for "latest" being written entirely in Go? It's possible this is related to the NetworkPolicy in k8s, but I have 53/udp allowed in, and that's an ingress policy to the pod/namespace. I'll have to poke around a little, but it would be helpful to know what differs between the two from a configuration/launch point of view.

FWIW, here are some additional details:

NetworkPolicy:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: cloudflare-dns
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: cloudflare-dns
  ingress:
  - ports:
    - port: 53
      protocol: UDP
    from: []

This allows udp/53 in to the namespace for the container. In a normal container, its resolv.conf gets set to 10.96.0.10, which is the cluster-wide DNS IP (CoreDNS). It seems like your Unbound tries to talk to itself (via 127.0.0.1), but if it does the same thing in the shell version as in the Go version, then it should work the same. The only thing I can think of is maybe leaving out 127.0.0.1/8 and 10.0.0.0/8 to see if it's a rebinding thing? Again, if one container works and the other doesn't, I don't think that's it.

The pod's resolv.conf:

/unbound $ cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

Anyway, I'll keep trying to debug, but I'd be curious to know what's different between the two images.

Thanks!

evanrich commented 4 years ago

One interesting thing I just found: if I kick the Go container and then cat /unbound/unbound.conf, it looks way different. It's 11k and contains stuff like:


  private-address: 99.47.56.138
  private-address: 99.48.169.38
  private-address: 99.48.176.109
  private-address: 99.54.13.169
  private-address: 99.54.134.182
  private-address: 99.56.101.228
  private-address: 99.56.152.232
  private-address: 99.57.28.74
  private-address: 99.61.60.117
  private-address: 99.64.183.238
  private-address: 99.67.57.104
  private-address: 99.69.20.150
  private-address: 99.73.130.192
  private-address: 99.73.67.33
  private-address: 99.73.73.31
  private-address: 99.75.133.106
  private-address: 99.76.33.238
  private-address: 99.78.120.186
  private-address: 99.79.135.85
  private-address: 99.79.40.250
  private-address: 99.79.53.104
  private-address: 99.81.40.78
  private-address: 99.87.209.147
  private-address: 99.89.34.39
  private-address: 99.90.6.164
  private-address: 99.91.12.209
  private-address: 99.91.161.234
  private-address: 99.92.171.204
  private-address: 99.95.49.130
  private-address: 99.96.30.196
  private-address: 99.97.25.108
  private-address: 99.98.217.243
  private-address: 99.98.77.107
  private-address: 99.99.139.67
  private-address: ::1/128
  private-address: ::ffff:0:0/96
  private-address: fc00::/7
  private-address: fe80::/10
forward-zone:
  forward-no-cache: yes
  forward-tls-upstream: yes
  name: "."
  forward-addr: 1.1.1.1@853#cloudflare-dns.com
  forward-addr: 1.0.0.1@853#cloudflare-dns.com
  forward-addr: 2606:4700:4700::1111@853#cloudflare-dns.com
  forward-addr: 2606:4700:4700::1001@853#cloudflare-dns.com
  forward-addr: 8.8.8.8@853#dns.google
  forward-addr: 8.8.4.4@853#dns.google
  forward-addr: 2001:4860:4860::8888@853#dns.google
  forward-addr: 2001:4860:4860::8844@853#dns.google
  forward-addr: 9.9.9.9@853#dns.quad9.net
  forward-addr: 149.112.112.112@853#dns.quad9.net
  forward-addr: 2620:fe::fe@853#dns.quad9.net
/unbound $ ls -la
total 12832
drwx------    1 1000     root          4096 Mar 22 20:41 .
drwxr-xr-x    1 root     root          4096 Mar 22 20:41 ..
-r--------    1 1000     root        215579 Mar 20 01:32 ca-certificates.crt
-r--------    1 1000     root             0 Mar 20 01:32 include.conf
-rw-------    1 1000     root          3316 Mar 22 20:41 root.hints
-rw-------    1 1000     root           165 Mar 22 20:41 root.key
-r-x------    1 1000     root        890568 Feb 20 21:26 unbound
-rw-------    1 1000     root      12009642 Mar 22 20:41 unbound.conf

If I look at the :shell one:

/unbound $ ls -la
total 8880
drwx------    1 nonrootu root          4096 Mar 22 20:45 .
drwxr-xr-x    1 root     root          4096 Mar 22 20:45 ..
-rw-r--r--    1 nonrootu nonrootu    891880 Feb  8 23:39 blocks-malicious.bz2
-rw-r--r--    1 nonrootu nonrootu   7023874 Mar 22 20:45 blocks-malicious.conf
-rw-r--r--    1 nonrootu nonrootu     44899 Feb  8 23:39 blocks-nsa.bz2
-r--------    1 nonrootu root        232933 Feb  8 23:38 ca-certificates.crt
-r-x------    1 nonrootu nonrootu      6236 Feb  8 23:37 entrypoint.sh
-rw-r--r--    1 nonrootu nonrootu         0 Mar 22 20:45 include.conf
-rw-r--r--    1 nonrootu nonrootu      3316 Feb  8 23:38 root.hints
-rw-r--r--    1 nonrootu nonrootu       165 Feb  8 23:38 root.key
-r-x------    1 nonrootu root        865992 Dec 12 14:13 unbound
-rw-rw-r--    1 nonrootu nonrootu      1564 Mar 22 20:45 unbound.conf

unbound.conf...

  # security
  tls-cert-bundle: "ca-certificates.crt"
  root-hints: "root.hints"
  trust-anchor-file: "root.key"
  harden-below-nxdomain: yes
  harden-referral-path: yes
  harden-algo-downgrade: yes
  # set above to no if there is any problem
  # Prevent DNS rebinding
  private-address: ::ffff:0:0/96
  private-address: fe80::/10
  private-address: fc00::/7
  private-address: ::1/128
  private-address: 169.254.0.0/16
  private-address: 192.168.2.0/24
  private-address: 172.16.0.0/12
  private-address: 10.0.0.0/8
  private-address: 127.0.0.1/8

  # network
  do-ip4: yes
  do-ip6: no
  interface: 0.0.0.0
  port: 53
  access-control: 0.0.0.0/0 allow

  # system
  username: ""

  # other files
  include: "blocks-malicious.conf"
  include: "include.conf"
forward-zone:
  name: "."
  forward-tls-upstream: yes
  forward-no-cache: no
  forward-addr: 1.1.1.1@853#cloudflare-dns.com
  forward-addr: 1.0.0.1@853#cloudflare-dns.com
  forward-addr: 8.8.8.8@853#dns.google
  forward-addr: 8.8.4.4@853#dns.google

So there is something different. Regardless, let's close this. For some reason the :shell version works with 128MB of RAM as a resource limit, while the :latest one required me to bump it to 256MB, but now it works.

qdm12 commented 4 years ago

Hello, sorry for the late reply.

The only difference as mentioned before is the final check trying to resolve github.com when unbound has been launched.

The Go version also generates the configuration file from scratch, while the shell one extracts some files into another conf file that is included in the main Unbound conf file. That does not make a difference.

Maybe the latest one (Go) has blocking enabled by default, and that usually consumes 50-100MB of extra memory. You can however disable it with the environment variables.

I'll make the final check optional with an env variable so you can see if it works without that check. From the error message (connection refused) it seems like a firewall issue to me (as it works outside of k8s as well).

And I'm also pretty sure it will work fine as is; it's just the check causing an issue for whatever firewall reason (I don't blame you haha, firewalls can be a nightmare).

qdm12 commented 4 years ago

I've added a CHECK_UNBOUND variable that you can set to off to check if it works out.

stumpylog commented 4 years ago

Small problem with CHECK_UNBOUND: it appears to default to "yes", but the value is expected to be "on" or "off".
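
A minimal sketch of the mismatch, assuming the variable is validated as a strict on/off switch. `parseOnOff` is a hypothetical helper, not the project's code. It shows how a default of "yes" would be rejected by the same validation that accepts "on" and "off".

```go
package main

import (
	"fmt"
	"strings"
)

// parseOnOff interprets an on/off style environment value.
// An empty value falls back to defaultOn; anything else is an error.
func parseOnOff(value string, defaultOn bool) (bool, error) {
	switch strings.ToLower(strings.TrimSpace(value)) {
	case "":
		return defaultOn, nil
	case "on":
		return true, nil
	case "off":
		return false, nil
	default:
		return false, fmt.Errorf("value %q must be \"on\" or \"off\"", value)
	}
}

func main() {
	for _, v := range []string{"", "on", "off", "yes"} {
		enabled, err := parseOnOff(v, true)
		fmt.Printf("CHECK_UNBOUND=%q -> enabled=%v err=%v\n", v, enabled, err)
	}
}
```

Keeping the default as a typed boolean rather than a string such as "yes" avoids this class of bug, because the default never passes through the string validation.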

qdm12 commented 4 years ago

Does it work without the check now? Thanks!

stumpylog commented 4 years ago

Looks good with the latest image. Thanks!