ziollek / gathersrv

CoreDNS plugin to gather DNS responses with SRV records from several domains and hide them behind a single domain
Apache License 2.0

coredns OOMKilled when using this plugin #14

Open raffaelespazzoli opened 5 months ago

raffaelespazzoli commented 5 months ago

Hello, I am trying to use this plugin, but my CoreDNS pods get OOMKilled. I am probably misconfiguring it, possibly creating a loop... I'd like someone to review my config and help me troubleshoot.

I have three clusters, each with a modified CoreDNS config. Here is one of them as an example:

    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        rewrite name substring cluster.cluster1 cluster.local
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }

    cluster.cluster2:53 {
        rewrite name substring cluster.cluster2 cluster.local

        forward . ${cluster2_coredns_ip}:53 {
            expire 10s
            policy round_robin
        }
        cache 10
    }

    cluster.cluster3:53 {
        rewrite name substring cluster.cluster3 cluster.local

        forward . ${cluster3_coredns_ip}:53 {
            expire 10s
            policy round_robin
        }
        cache 10
    }

    cluster.all:53 {
      gathersrv cluster.all. {
          cluster.cluster1. c1-
          cluster.cluster2. c2-
          cluster.cluster3. c3-
      }
      forward . 127.0.0.1:53
    } 

So cluster.local is the local cluster, cluster.cluster[1..3] is rewritten to cluster.local and forwarded to the pertinent cluster's CoreDNS, and finally cluster.all should gather SRV records from all of the clusters.

Pointing at cluster1's CoreDNS IP, I can resolve _peers._tcp.etcd-headless.h2.svc.cluster.local:

 dig @10.89.0.225 -t SRV _peers._tcp.etcd-headless.h2.svc.cluster.local

; <<>> DiG 9.18.24 <<>> @10.89.0.225 -t SRV _peers._tcp.etcd-headless.h2.svc.cluster.local
; (1 server found)
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27603
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: f47ee6beb5803a4b (echoed)
;; QUESTION SECTION:
;_peers._tcp.etcd-headless.h2.svc.cluster.local.    IN SRV

;; ANSWER SECTION:
_peers._tcp.etcd-headless.h2.svc.cluster.local. 30 IN SRV 0 100 2379 etcd-headless.h2.svc.cluster.local.

;; ADDITIONAL SECTION:
etcd-headless.h2.svc.cluster.local. 30 IN A 10.96.0.42

;; Query time: 5 msec
;; SERVER: 10.89.0.225#53(10.89.0.225) (UDP)
;; WHEN: Tue Apr 02 12:48:12 EDT 2024
;; MSG SIZE  rcvd: 237

and I can resolve _peers._tcp.etcd-headless.h2.svc.cluster.cluster1:

dig @10.89.0.225 -t SRV _peers._tcp.etcd-headless.h2.svc.cluster.cluster1

; <<>> DiG 9.18.24 <<>> @10.89.0.225 -t SRV _peers._tcp.etcd-headless.h2.svc.cluster.cluster1
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3306
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: e8816b2d226c93c1 (echoed)
;; QUESTION SECTION:
;_peers._tcp.etcd-headless.h2.svc.cluster.cluster1. IN SRV

;; ANSWER SECTION:
_peers._tcp.etcd-headless.h2.svc.cluster.local. 30 IN SRV 0 100 2379 etcd-headless.h2.svc.cluster.local.

;; ADDITIONAL SECTION:
etcd-headless.h2.svc.cluster.local. 30 IN A 10.96.0.42

;; Query time: 2 msec
;; SERVER: 10.89.0.225#53(10.89.0.225) (UDP)
;; WHEN: Tue Apr 02 12:49:10 EDT 2024
;; MSG SIZE  rcvd: 240

which results in the same response, correctly so. I can also try cluster2:

dig @10.89.0.225 -t SRV _peers._tcp.etcd-headless.h2.svc.cluster.cluster2

; <<>> DiG 9.18.24 <<>> @10.89.0.225 -t SRV _peers._tcp.etcd-headless.h2.svc.cluster.cluster2
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42009
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: d5ce2f5ab20abd3b (echoed)
;; QUESTION SECTION:
;_peers._tcp.etcd-headless.h2.svc.cluster.cluster2. IN SRV

;; ANSWER SECTION:
_peers._tcp.etcd-headless.h2.svc.cluster.local. 10 IN SRV 0 100 2379 etcd-headless.h2.svc.cluster.local.

;; ADDITIONAL SECTION:
etcd-headless.h2.svc.cluster.local. 10 IN A 10.96.1.114

;; Query time: 9 msec
;; SERVER: 10.89.0.225#53(10.89.0.225) (UDP)
;; WHEN: Tue Apr 02 12:50:49 EDT 2024
;; MSG SIZE  rcvd: 240

which still works, but resolves to a different IP. However, if I try cluster.all:

dig @10.89.0.225 -t SRV _peers._tcp.etcd-headless.h2.svc.cluster.all
;; communications error to 10.89.0.225#53: timed out

I get a timeout and the CoreDNS pod gets OOMKilled.

ziollek commented 5 months ago

Thanks for reaching out, I will try to reproduce the root cause of this issue. If you could provide the CoreDNS logs from before the crash, that would be very valuable to me. In the meantime, I can give a piece of advice regarding your configuration. The rewrite configuration you made causes CoreDNS to return answers that are inconsistent with the original questions (and I suppose such behavior does not adhere to the RFC) - please take a look at rewriting the response.

To put it simply, instead of an ANSWER section like this in the response:

dig @10.89.0.225 -t SRV _peers._tcp.etcd-headless.h2.svc.cluster.cluster1

...

;; ANSWER SECTION:
_peers._tcp.etcd-headless.h2.svc.cluster.local. 30 IN SRV 0 100 2379 etcd-headless.h2.svc.cluster.local.

you should receive something like this:

dig @10.89.0.225 -t SRV _peers._tcp.etcd-headless.h2.svc.cluster.cluster1

...

;; ANSWER SECTION:
_peers._tcp.etcd-headless.h2.svc.cluster.cluster1.  30 IN SRV 0 100 2379 etcd-headless.h2.svc.cluster.cluster1.

My suggestion is to use the configuration snippet below for the rewrite plugin in each of the cluster[1-3] zones - the example below is for cluster1:

rewrite stop {
        name suffix .cluster.cluster1 .cluster.local answer auto
}
raffaelespazzoli commented 5 months ago

Good call. I made those changes but am still getting the same behavior. Now I'm getting something like this:

dig @10.89.0.225 -t SRV _peers._tcp.etcd-headless.h2.svc.cluster.cluster2

; <<>> DiG 9.18.24 <<>> @10.89.0.225 -t SRV _peers._tcp.etcd-headless.h2.svc.cluster.cluster2
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4600
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: d1c25b7278f911e0 (echoed)
;; QUESTION SECTION:
;_peers._tcp.etcd-headless.h2.svc.cluster.cluster2. IN SRV

;; ANSWER SECTION:
_peers._tcp.etcd-headless.h2.svc.cluster.cluster2. 10 IN SRV 0 100 2379 etcd-headless.h2.svc.cluster.cluster2.

;; ADDITIONAL SECTION:
etcd-headless.h2.svc.cluster.cluster2. 10 IN A  10.96.1.114

;; Query time: 3 msec
;; SERVER: 10.89.0.225#53(10.89.0.225) (UDP)
;; WHEN: Tue Apr 02 13:44:24 EDT 2024
;; MSG SIZE  rcvd: 249
ziollek commented 5 months ago

I have prepared a minimal standalone config to reproduce the problem in local Docker:

.:{$DNS_PORT} {
    log . "catch-all logger: {remote}:{port} - {>id} {type} {class} {name} {proto} {size} {>do} {>bufsize} {rcode} {>rflags} {rsize} {duration}"
    template IN SRV local {
        match (_[^.]+\.)*(?P<record>.*)$
        answer "{{ .Name }} 10 IN SRV 0 100 2379 {{ .Group.record }}"
        fallthrough
    }
}

cluster.cluster1:{$DNS_PORT} {
        log . "cluster1 logger: {remote}:{port} - {>id} {type} {class} {name} {proto} {size} {>do} {>bufsize} {rcode} {>rflags} {rsize} {duration}"
        rewrite stop {
                name suffix .cluster.cluster1 .cluster.local answer auto
        }
        forward . 127.0.0.1:{$DNS_PORT}
}

cluster.cluster2:{$DNS_PORT} {
    log . "cluster2 logger: {remote}:{port} - {>id} {type} {class} {name} {proto} {size} {>do} {>bufsize} {rcode} {>rflags} {rsize} {duration}"
    rewrite stop {
            name suffix .cluster.cluster2 .cluster.local answer auto
    }
    forward . 127.0.0.1:{$DNS_PORT}
}

cluster.cluster3:{$DNS_PORT} {
    log . "cluster3 logger: {remote}:{port} - {>id} {type} {class} {name} {proto} {size} {>do} {>bufsize} {rcode} {>rflags} {rsize} {duration}"
    rewrite stop {
            name suffix .cluster.cluster3 .cluster.local answer auto
    }
    forward . 127.0.0.1:{$DNS_PORT}
}

cluster.all:{$DNS_PORT} {
  gathersrv cluster.all. {
      cluster.cluster1. c1-
      cluster.cluster2. c2-
      cluster.cluster3. c3-
  }
  log . "sub-query logger: {remote}:{port} - {>id} {type} {class} {name} {proto} {size} {>do} {>bufsize} {rcode} {>rflags} {rsize} {duration}"
  forward . 127.0.0.1:{$DNS_PORT}
}

When I ask for an SRV record in the cluster.all zone, I get all three records as a result:

dig  -t SRV _peers._tcp.etcd-headless.h2.svc.cluster.all -p5300 @127.0.0.1

; <<>> DiG 9.18.20 <<>> -t SRV _peers._tcp.etcd-headless.h2.svc.cluster.all -p5300 @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26628
;; flags: qr aa rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 355c70260c63fcfe (echoed)
;; QUESTION SECTION:
;_peers._tcp.etcd-headless.h2.svc.cluster.all. IN SRV

;; ANSWER SECTION:
_peers._tcp.etcd-headless.h2.svc.cluster.all. 10 IN SRV 0 100 2379 c3-etcd-headless.h2.svc.cluster.all.
_peers._tcp.etcd-headless.h2.svc.cluster.all. 10 IN SRV 0 100 2379 c1-etcd-headless.h2.svc.cluster.all.
_peers._tcp.etcd-headless.h2.svc.cluster.all. 10 IN SRV 0 100 2379 c2-etcd-headless.h2.svc.cluster.all.

;; Query time: 2 msec
;; SERVER: 127.0.0.1#5300(127.0.0.1) (UDP)
;; WHEN: Tue Apr 02 20:37:03 CEST 2024
;; MSG SIZE  rcvd: 382

In my snippet there are additional log directives that show which requests were handled by each zone - the output from CoreDNS is as follows:

.:5300
cluster.all.:5300
cluster.cluster1.:5300
cluster.cluster2.:5300
cluster.cluster3.:5300
CoreDNS-1.11.1
linux/amd64, go1.21.3, v1.11.1
[INFO] catch-all logger: 127.0.0.1:33758 - 7774 SRV IN _peers._tcp.etcd-headless.h2.svc.cluster.local. udp 87 false 1232 NOERROR qr,aa,rd 164 0.000226256s
[INFO] catch-all logger: 127.0.0.1:58998 - 2692 SRV IN _peers._tcp.etcd-headless.h2.svc.cluster.local. udp 87 false 1232 NOERROR qr,aa,rd 164 0.000373521s
[INFO] cluster3 logger: 127.0.0.1:54775 - 35088 SRV IN _peers._tcp.etcd-headless.h2.svc.cluster.cluster3. udp 87 false 1232 NOERROR qr,aa,rd 196 0.000551974s
[INFO] cluster1 logger: 127.0.0.1:54817 - 53917 SRV IN _peers._tcp.etcd-headless.h2.svc.cluster.cluster1. udp 87 false 1232 NOERROR qr,aa,rd 196 0.00057642s
[INFO] catch-all logger: 127.0.0.1:43712 - 44374 SRV IN _peers._tcp.etcd-headless.h2.svc.cluster.local. udp 87 false 1232 NOERROR qr,aa,rd 164 0.000238325s
[INFO] sub-query logger: 192.168.65.1:42032 - 26628 SRV IN _peers._tcp.etcd-headless.h2.svc.cluster.cluster3. udp 90 false 1232 NOERROR qr,aa,rd 196 0.000767918s
[INFO] cluster2 logger: 127.0.0.1:44086 - 55398 SRV IN _peers._tcp.etcd-headless.h2.svc.cluster.cluster2. udp 87 false 1232 NOERROR qr,aa,rd 196 0.00054577s
[INFO] sub-query logger: 192.168.65.1:42032 - 26628 SRV IN _peers._tcp.etcd-headless.h2.svc.cluster.cluster1. udp 90 false 1232 NOERROR qr,aa,rd 196 0.000871329s
[INFO] sub-query logger: 192.168.65.1:42032 - 26628 SRV IN _peers._tcp.etcd-headless.h2.svc.cluster.cluster2. udp 90 false 1232 NOERROR qr,aa,rd 196 0.000836313s
[INFO] type=SRV, question=_peers._tcp.etcd-headless.h2.svc.cluster.all., response=;; opcode: QUERY, status: NOERROR, id: 26628, answer-records=3, extra-records=1, gathered=3, not-gatherer=0, duration=1.239167ms
  1. If it is not a problem, please add more verbose logging, especially for the cluster.all zone.
  2. How did you build CoreDNS? Plugins in CoreDNS are processed in the order defined at compile time, so it is crucial to set it properly (see the plugin.cfg sketch below).
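
For reference, a minimal sketch of the relevant part of plugin.cfg (the surrounding entries are elided and the exact neighbours are illustrative; the key point is that gathersrv is listed before forward, so it can intercept queries in the cluster.all zone before they get forwarded):

    # plugin.cfg (excerpt) - plugins run in the order they are listed here
    rewrite:rewrite
    gathersrv:github.com/ziollek/gathersrv
    forward:forward
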
raffaelespazzoli commented 5 months ago

I put the plugin at the end of the list. Retrying with the right order...

raffaelespazzoli commented 5 months ago

No changes, I am still getting the error. After enabling the logs, I can see that the pods enter an infinite loop:

[INFO] catch-all logger: 127.0.0.1:36210 - 63423 NS IN . udp 17 false 512 NXDOMAIN qr,rd,ra 17 0.00097998s
[INFO] catch-all logger: 127.0.0.1:41054 - 57808 NS IN . udp 17 false 512 NXDOMAIN qr,rd,ra 17 0.000907888s
[INFO] catch-all logger: 127.0.0.1:42035 - 41998 NS IN . udp 17 false 512 NXDOMAIN qr,rd,ra 17 0.00057028s
[INFO] catch-all logger: 127.0.0.1:46771 - 34200 NS IN . udp 17 false 512 NXDOMAIN qr,rd,ra 17 0.000373977s
[INFO] catch-all logger: 127.0.0.1:58456 - 25621 NS IN . udp 17 false 512 NXDOMAIN qr,rd,ra 17 0.000408552s
[INFO] catch-all logger: 127.0.0.1:39111 - 18325 NS IN . udp 17 false 512 NXDOMAIN qr,rd,ra 17 0.000520405s
[INFO] catch-all logger: 127.0.0.1:36067 - 37192 NS IN . udp 17 false 512 NXDOMAIN qr,rd,ra 17 0.000569562s
[INFO] catch-all logger: 127.0.0.1:57640 - 43318 NS IN . udp 17 false 512 NXDOMAIN qr,rd,ra 17 0.000514316s
[INFO] catch-all logger: 127.0.0.1:51773 - 7138 NS IN . udp 17 false 512 NXDOMAIN qr,rd,ra 17 0.000478651s

This is my config now:

    .:53 {
        log . "catch-all logger: {remote}:{port} - {>id} {type} {class} {name} {proto} {size} {>do} {>bufsize} {rcode} {>rflags} {rsize} {duration}"
        errors
        health {
           lameduck 5s
        }
        ready
        rewrite name suffix .cluster.cluster1 .cluster.local answer auto
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }

    cluster.cluster2:53 {
        log . "cluster2 logger: {remote}:{port} - {>id} {type} {class} {name} {proto} {size} {>do} {>bufsize} {rcode} {>rflags} {rsize} {duration}"
        rewrite name suffix .cluster.cluster2 .cluster.local answer auto

        forward . ${cluster2_coredns_ip}:53 {
            expire 10s
            policy round_robin
        }
        cache 10
    }

    cluster.cluster3:53 {
        log . "cluster3 logger: {remote}:{port} - {>id} {type} {class} {name} {proto} {size} {>do} {>bufsize} {rcode} {>rflags} {rsize} {duration}"
        rewrite name suffix .cluster.cluster3 .cluster.local answer auto

        forward . ${cluster3_coredns_ip}:53 {
            expire 10s
            policy round_robin
        }
        cache 10
    }

    cluster.all:53 {
      gathersrv cluster.all. {
          cluster.cluster1. c1-
          cluster.cluster2. c2-
          cluster.cluster3. c3-
      }
      log . "sub-query logger: {remote}:{port} - {>id} {type} {class} {name} {proto} {size} {>do} {>bufsize} {rcode} {>rflags} {rsize} {duration}"
      forward . 127.0.0.1:53
    }
ziollek commented 5 months ago

It seems that you still have the wrong order of plugins in your binary (if you use the CoreDNS Makefile to build the binary, make sure to clean the environment before rebuilding CoreDNS: make clean).

Please filter the logs generated by the cluster.all zone, i.e. the logs with the sub-query logger prefix. Such logs should contain queries to the sub-zones: SRV IN _peers._tcp.etcd-headless.h2.svc.cluster.cluster[1-3].. If you see a query to the cluster.all zone there, it indicates that the gathersrv plugin was triggered too late.

raffaelespazzoli commented 5 months ago

I recompiled from a clean workspace. Now I am not getting the infinite loop anymore; the pod just dies. This is what I see in the logs:

.:53
cluster.all.:53
cluster.cluster2.:53
cluster.cluster3.:53
[INFO] plugin/reload: Running configuration SHA512 = 7201ab82faa86d13333e01a35249e80eb96ffa6a71e98d0529443f52a02a584a215e95297d90d6c31983f0344b9c503b24de6222464b2fb1abe6104b3e5dac3c
CoreDNS-1.11.2
linux/arm64, go1.21.8, e3f83cb1f-dirty
[INFO] catch-all logger: 127.0.0.1:42128 - 19759 HINFO IN 6028198551586158430.7082859719790006129. udp 57 false 512 NXDOMAIN qr,rd,ra 132 0.047698373s

That catch-all logger line appears before I run the query.

I gave the pod a bit more memory and it started the loop again until it died... here is the log with more memory:

.:53
cluster.all.:53
cluster.cluster2.:53
cluster.cluster3.:53
[INFO] plugin/reload: Running configuration SHA512 = 7201ab82faa86d13333e01a35249e80eb96ffa6a71e98d0529443f52a02a584a215e95297d90d6c31983f0344b9c503b24de6222464b2fb1abe6104b3e5dac3c
CoreDNS-1.11.2
linux/arm64, go1.21.8, e3f83cb1f-dirty
[INFO] catch-all logger: 127.0.0.1:60357 - 35183 HINFO IN 723505535130710133.1386775834635033436. udp 56 false 512 NXDOMAIN qr,rd,ra 131 0.048782517s
[INFO] catch-all logger: 10.89.0.1:51558 - 47673 SRV IN _peers._tcp.etcd-headless.h2.svc.cluster.cluster1. udp 90 false 1232 NOERROR qr,aa,rd 226 0.006197935s
[INFO] cluster2 logger: 10.89.0.1:33265 - 34423 SRV IN _peers._tcp.etcd-headless.h2.svc.cluster.cluster2. udp 90 false 1232 NOERROR qr,aa,rd 226 0.008317279s
[INFO] catch-all logger: 127.0.0.1:40103 - 20964 NS IN . udp 17 false 512 NXDOMAIN qr,rd,ra 17 0.000856663s
[INFO] catch-all logger: 127.0.0.1:48758 - 12068 NS IN . udp 17 false 512 NXDOMAIN qr,rd,ra 17 0.000814903s
[INFO] catch-all logger: 127.0.0.1:39548 - 29014 NS IN . udp 17 false 512 NXDOMAIN qr,rd,ra 17 0.000367905s
[INFO] catch-all logger: 127.0.0.1:53146 - 31204 NS IN . udp 17 false 512 NXDOMAIN qr,rd,ra 17 0.000374262s
[INFO] catch-all logger: 127.0.0.1:52731 - 1049 NS IN . udp 17 false 512 NXDOMAIN qr,rd,ra 17 0.000407602s
[INFO] catch-all logger: 127.0.0.1:46820 - 37443 NS IN . udp 17 false 512 NXDOMAIN qr,rd,ra 17 0.000438854s
[INFO] catch-all logger: 127.0.0.1:48116 - 44858 NS IN . udp 17 false 512 NXDOMAIN qr,rd,ra 17 0.000390648s
[INFO] catch-all logger: 127.0.0.1:43784 - 23307 NS IN . udp 17 false 512 NXDOMAIN qr,rd,ra 17 0.00040763s

As you can see, I tried .cluster.local and .cluster.cluster2 before trying .cluster.all, which is where the loop started. Again, no sign of the sub-query logger.

That is probably because the log statement comes after the gathersrv statement - that is where it was in your original example. Is that correct?

ziollek commented 5 months ago

The catch-all logger line with HINFO is a query made by the loop plugin while spinning up the server. Were you able to grep the lines generated by the cluster.all zone before the crash? You can use kubectl logs with the --previous flag to examine the output of the container before the crash.

If there are no such lines, it means that the OOM killer killed the container earlier.
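
For example (the pod name is illustrative, assuming CoreDNS runs in kube-system):

    kubectl -n kube-system logs --previous coredns-<pod-id> | grep "sub-query logger"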

To eliminate the loop, you can also forward queries in the cluster.all zone to a wrong port, e.g. 5353:

cluster.all:53 {
      gathersrv cluster.all. {
          cluster.cluster1. c1-
          cluster.cluster2. c2-
          cluster.cluster3. c3-
      }
      log . "sub-query logger: {remote}:{port} - {>id} {type} {class} {name} {proto} {size} {>do} {>bufsize} {rcode} {>rflags} {rsize} {duration}"
      forward . 127.0.0.1:5353
    }

That makes it easier to check what exactly happens.

raffaelespazzoli commented 5 months ago

What I shared is all of the logs. It never seems to print anything with sub-query logger; perhaps I wasn't clear about that. I'll try with the wrong port.

raffaelespazzoli commented 5 months ago

With the wrong port, I got this:

[INFO] Reloading complete
[INFO] sub-query logger: 10.89.0.1:41081 - 3290 SRV IN _peers._tcp.etcd-headless.h2.svc.cluster.all. udp 85 false 1232 - - 0 1.00269468s

and this answer:

; <<>> DiG 9.18.24 <<>> @10.89.0.225 -t SRV _peers._tcp.etcd-headless.h2.svc.cluster.all
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 3290
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: fbb6a1a411f95bc1 (echoed)
;; QUESTION SECTION:
;_peers._tcp.etcd-headless.h2.svc.cluster.all. IN SRV

;; Query time: 1005 msec
;; SERVER: 10.89.0.225#53(10.89.0.225) (UDP)
;; WHEN: Wed Apr 03 10:02:13 EDT 2024
;; MSG SIZE  rcvd: 85
ziollek commented 5 months ago

So, as I supposed, the problem is caused by the wrong order of plugins. Please verify:

  1. that the built image contains a properly built CoreDNS (even in local Docker)
  2. that the pod is spawned with the proper Docker image (for example, as sketched below)
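
Something along these lines could help (the image and pod names are illustrative):

    # which image is the pod actually running?
    kubectl -n kube-system get pod coredns-<pod-id> \
      -o jsonpath='{.spec.containers[0].image}'

    # does a locally built image report the expected version/commit?
    docker run --rm <your-coredns-image> -version
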
raffaelespazzoli commented 5 months ago

I don't know how to verify that; is this what you are looking for?

kubectl --context kind-cluster1 exec -n kube-system coredns-598c664574-jpfct -- /coredns -plugins
Server types:
  dns

Caddyfile loaders:
  flag
  default

Other plugins:
  dns.acl
  dns.any
  dns.auto
  dns.autopath
  dns.azure
  dns.bind
  dns.bufsize
  dns.cache
  dns.cancel
  dns.chaos
  dns.clouddns
  dns.debug
  dns.dns64
  dns.dnssec
  dns.dnstap
  dns.erratic
  dns.errors
  dns.etcd
  dns.file
  dns.forward
  dns.gathersrv
  dns.geoip
  dns.grpc
  dns.header
  dns.health
  dns.hosts
  dns.k8s_external
  dns.kubernetes
  dns.loadbalance
  dns.local
  dns.log
  dns.loop
  dns.metadata
  dns.minimal
  dns.nsid
  dns.pprof
  dns.prometheus
  dns.ready
  dns.reload
  dns.rewrite
  dns.root
  dns.route53
  dns.secondary
  dns.sign
  dns.template
  dns.timeouts
  dns.tls
  dns.trace
  dns.transfer
  dns.tsig
  dns.view
  dns.whoami
  on
ziollek commented 5 months ago

Unfortunately, that prints the plugins in alphabetical order instead of processing order.

ziollek commented 5 months ago

You wrote: I recompiled from a clean workspace - what does that mean? Have you cloned the CoreDNS source to a new directory? As far as I can see, the problem is that the CoreDNS Makefile does not clean the generated files that contain the order of plugins. It means you have to either clean those generated files before rebuilding or build from a fresh clone.
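
Roughly, a clean rebuild could look like this (a sketch only; the paths and the plugin.cfg line are assumptions on my side - the plugin README has the exact instructions):

    # start from a fresh clone so no stale generated files are left around
    git clone https://github.com/coredns/coredns.git && cd coredns
    # add "gathersrv:github.com/ziollek/gathersrv" to plugin.cfg before the forward entry
    go get github.com/ziollek/gathersrv
    make gen    # regenerates the plugin wiring from plugin.cfg
    make        # builds the coredns binary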

I will prepare a repo with an automated building process.

ziollek commented 5 months ago

Here you can find a simple repo that allows building a Docker image with a properly configured order of plugins: https://github.com/ziollek/gathersrv-docker

You can also find a Corefile there that allows testing the behavior locally.

raffaelespazzoli commented 5 months ago

It worked - it was the make gen step. I have two questions:

  1. Here is an example of what is returned:
    
    ;; ANSWER SECTION:
    _peers._tcp.etcd-headless.h2.svc.cluster.all. 26 IN SRV 0 100 2379 c2-etcd-headless.h2.svc.cluster.all.
    _peers._tcp.etcd-headless.h2.svc.cluster.all. 10 IN SRV 0 100 2379 c1-etcd-headless.h2.svc.cluster.all.
    _peers._tcp.etcd-headless.h2.svc.cluster.all. 10 IN SRV 0 100 2379 c3-etcd-headless.h2.svc.cluster.all.

    ;; ADDITIONAL SECTION:
    c2-etcd-headless.h2.svc.cluster.all. 26 IN A 10.96.1.107
    c1-etcd-headless.h2.svc.cluster.all. 10 IN A 10.96.0.234
    c3-etcd-headless.h2.svc.cluster.all. 10 IN A 10.96.2.74


Now `c2-etcd-headless.h2.svc.cluster.all` cannot actually be resolved. Isn't that going to be a problem?

2. I find this aggregating-result plugin very useful; why stop at SRV records and not support any record type?
ziollek commented 5 months ago

Ad 1. The A and AAAA records, both for c2-etcd-headless.h2.svc.cluster.all and for etcd-headless.h2.svc.cluster.all, should be resolvable as well. Have you tried:

dig c2-etcd-headless.h2.svc.cluster.all @your-coredns-ip

Ad 2. As mentioned above, it supports SRV, A, and AAAA, in contrast to k8s multicluster DNS, which is much more complicated to set up and supports only A and AAAA. But indeed, I see that this information is missing from the README.

I have added an example of resolving the hostnames returned by an SRV query to the previously prepared demo - you need to rebuild the image because of the changes to the Corefile.

raffaelespazzoli commented 5 months ago

Ad 1. I tried it and it works, thanks. I hadn't understood this feature. So, when configuring these stateful workloads and referring to my example, should one use the c2-etcd-headless.h2.svc.cluster.all notation or the etcd-0.etcd-headless.h2.svc.cluster.cluster2 notation? If I had two instances of etcd per cluster, how would the first notation work?

Ad 2. Regarding in contrast to k8s multicluster DNS which is much more complicated to set up - what are you referring to here? The MCS specification?

ziollek commented 5 months ago

Ad 1. To be honest, I do not understand why your headless service does not point to a particular node. I am referring to your first comment, where you pasted:

 dig @10.89.0.225 -t SRV _peers._tcp.etcd-headless.h2.svc.cluster.local
...

;; ANSWER SECTION:
_peers._tcp.etcd-headless.h2.svc.cluster.local. 30 IN SRV 0 100 2379 etcd-headless.h2.svc.cluster.local.

It should be resolved to etcd-0.etcd-headless.h2.svc.cluster.local instead of etcd-headless.h2.svc.cluster.local.

Are you sure that you configured a headless service? It looks more like a ClusterIP service.
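
For comparison, a minimal sketch of what a headless Service would look like (the names mirror your example; the selector and port values are assumptions on my side):

    # clusterIP: None is what makes the Service headless, so DNS returns
    # per-pod records (e.g. etcd-0.etcd-headless...) instead of a single IP
    apiVersion: v1
    kind: Service
    metadata:
      name: etcd-headless
      namespace: h2
    spec:
      clusterIP: None
      selector:
        app: etcd
      ports:
        - name: peers
          port: 2379
          protocol: TCP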

Ad 2. I am referring to the current implementation in GKE:

MCS only supports ClusterSetIP and headless Services. Only DNS "A" records are available.