Closed cubic3d closed 3 years ago
Can you add a few examples of current vs expected behavior?
Can you add a few examples of current vs expected behavior?
Of course! In the following queries I assume an Ingress resource defining a host with domain.tld
and the Corefile from above:
.:1053 {
...
k8s_gateway domain.tld {
kubeconfig kubeconfig
fallthrough
}
forward . tls://1.1.1.1 tls://1.0.0.1 tls://2606:4700:4700::1111 tls://2606:4700:4700::1001 {
tls_servername cloudflare-dns.com
}
}
Current behaviour (not resolving and sending SOA):
; <<>> DiG 9.16.18 <<>> @localhost -p 1053 domain.tld
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41923
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: efe621840a495f3f (echoed)
;; QUESTION SECTION:
;domain.tld. IN A
;; AUTHORITY SECTION:
domain.tld. 60 IN SOA dns1.kube-system.domain.tld. hostmaster.dns1.kube-system.domain.tld. 12345 7200 1800 86400 60
;; Query time: 3 msec
;; SERVER: 127.0.0.1#1053(127.0.0.1)
;; WHEN: Mon Jul 05 22:19:19 CEST 2021
;; MSG SIZE rcvd: 162
Expected (resolving query):
; <<>> DiG 9.16.18 <<>> @localhost -p 1053 domain.tld
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45722
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 966178b144c51d99 (echoed)
;; QUESTION SECTION:
;domain.tld. IN A
;; ANSWER SECTION:
domain.tld. 60 IN A 192.168.99.4
;; Query time: 0 msec
;; SERVER: 127.0.0.1#1053(127.0.0.1)
;; WHEN: Mon Jul 05 22:23:17 CEST 2021
;; MSG SIZE rcvd: 87
In the following examples there is no resource containing a host for domain.tld
:
Current behaviour (not finding a resource and responding with SOA):
; <<>> DiG 9.16.18 <<>> @localhost -p 1053 domain.tld
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35567
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: d3c04a2768259411 (echoed)
;; QUESTION SECTION:
;domain.tld. IN A
;; AUTHORITY SECTION:
domain.tld. 60 IN SOA dns1.kube-system.domain.tld. hostmaster.dns1.kube-system.domain.tld. 12345 7200 1800 86400 60
;; Query time: 0 msec
;; SERVER: 127.0.0.1#1053(127.0.0.1)
;; WHEN: Mon Jul 05 22:28:39 CEST 2021
;; MSG SIZE rcvd: 162
Expected (not finding a resource and falling through):
; <<>> DiG 9.16.18 <<>> @localhost -p 1053 domain.tld
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 18948
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; PAD: (350 bytes)
;; QUESTION SECTION:
;domain.tld. IN A
;; AUTHORITY SECTION:
. 85891 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2021070501 1800 900 604800 86400
;; Query time: 206 msec
;; SERVER: 127.0.0.1#1053(127.0.0.1)
;; WHEN: Mon Jul 05 22:29:27 CEST 2021
;; MSG SIZE rcvd: 468
ok, so it only happens for queries matching the zone root and it is not a problem for foo.domain.tld
, correct?
and also wanted to confirm, when you said it's inconsistent with other plugins, which ones did you mean specifically?
I haven't tested it but looks like kubernetes
behaves in a similar way
https://github.com/coredns/coredns/blob/master/plugin/kubernetes/handler.go#L64
ok, so it only happens for queries matching the zone root and it is not a problem for
foo.domain.tld
, correct?
Yes, it's only about the zone's root that the plugin receives.
and also wanted to confirm, when you said it's inconsistent with other plugins, which ones did you mean specifically?
I didn't look specifically on kubernetes
(mostly local domains that are excluded in k8s from fallthrough) but at hosts
hosts /etc/hosts domain.tld {
fallthrough
}
which will fallthrough for a query for the root domain.tld
. I don't see the same behaviour in the referenced code.
To clarify: In the case of k8s_gateway there are two places where fallthrough is happening currently:
But not in https://github.com/ori-edge/k8s_gateway/blob/master/gateway.go#L122-L125 which always skips the other two if the query was made for the zone's root. The idea is to allow a query for the zone's root and not just to return SOA which automatically would lead to propper fallthrough or keep the current behaviour, but also allow fallthrough in this case if it's enabled on the plugin.
so k8s_gateway
should never return SOA in case fallthrough
is configured as an option, right?
so
k8s_gateway
should never return SOA in casefallthrough
is configured as an option, right?
Correct
ok, I think I got it. I'll try to cook something up in the next few days... unless you wanted to do a PR?
Sounds good! Up to you, I could but probably not in the next few days, maybe end of weekend or in the course of next week ;)
I'm very confused how to tackle that in the correct way. After testing on kubernetes
plugin on CoreDNS in-cluster it answers on every SOA query, regardless if it's a subdomain, existent or non-existent with SOA in answer section and never setting the authority bit.
; <<>> DiG 9.16.16 <<>> soa cluster.local
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21336
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 3dc581ebea467303 (echoed)
;; QUESTION SECTION:
;cluster.local. IN SOA
;; ANSWER SECTION:
cluster.local. 5 IN SOA ns.dns.cluster.local. hostmaster.cluster.local. 1626158767 7200 1800 86400 5
;; Query time: 0 msec
;; SERVER: 10.43.0.10#53(10.43.0.10)
;; WHEN: Wed Jul 14 14:54:26 UTC 2021
;; MSG SIZE rcvd: 147
; <<>> DiG 9.16.16 <<>> soa non-existent.cluster.local
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20813
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: d49502537dce6e18 (echoed)
;; QUESTION SECTION:
;non-existent.cluster.local. IN SOA
;; ANSWER SECTION:
cluster.local. 5 IN SOA ns.dns.cluster.local. hostmaster.cluster.local. 1626158767 7200 1800 86400 5
;; Query time: 0 msec
;; SERVER: 10.43.0.10#53(10.43.0.10)
;; WHEN: Wed Jul 14 14:55:46 UTC 2021
;; MSG SIZE rcvd: 160
There's an edge case: Fallthrough is on, query for SOA on root zone and root zone does match a resource (contradicts the above agreement on no SOA on fallthrough?)
I opened a PR with a proposal for the changes, tests are not adjusted yet.
kubernetes
responses above)Let me know if it does make any sense :wink:
There's an edge case: Fallthrough is on, query for SOA on root zone and root zone does match a resource (contradicts the above agreement on no SOA on fallthrough?)
I think if there's at least a single match, fallthrough should not be triggered
I'm very confused how to tackle that in the correct way. After testing on kubernetes plugin on CoreDNS in-cluster it answers on every SOA query, regardless if it's a subdomain, existent or non-existent with SOA in answer section and never setting the authority bit.
Authority bit aside, I think this is the expected behavior. Why do you find this strange?
I'm very confused how to tackle that in the correct way. After testing on kubernetes plugin on CoreDNS in-cluster it answers on every SOA query, regardless if it's a subdomain, existent or non-existent with SOA in answer section and never setting the authority bit.
Authority bit aside, I think this is the expected behavior. Why do you find this strange?
At this point I had probably too much playing with fall through and without - I have no idea what confused me there, yes this should be expected.
For the sake of search: This PR also unintentionally fixed a wrong RFC behaviour in case of fall through: If AAAA
was queried, the plugin would initiate a fall through regardless of existing A
records for the requested domain and possibly cause a mismatch cache if next plugin would return a negative response (NXDOMAIN
). As always: musl on Alpine manages to fail in this case, even if the resolver queries both AAAA
and A
.
Let's take this example
If querying for
domain.tld
the current implementation will callserveApex
in https://github.com/ori-edge/k8s_gateway/blob/master/gateway.go#L123 and returnSOA
orNS
entries. Iffallthrough
is declared it will won't handle it here and still return those two possible responses (inconsistent with other plugins).I built it skipping this case and handle the zone domain same as other subdomains, so having an Ingress defining
domain.tld
in its hosts works fine and also the defaultfallthrough
handler. I'm not sure why the response handler has an override for the zone domain. IfSOA
andNS
are required we could let pass any other request to the default handler to be able to resolve the zone domain or if it's not desired at least let the zone domain fall through.Any ideas? I'm happy to PR either way, but would prefer it could resolve the zone domain itself using k8s resources.