ori-edge / k8s_gateway

A CoreDNS plugin to resolve all types of external Kubernetes resources
Apache License 2.0

Bogus AAAA answer when there is no AAAA record, but there is an A record for the same name #124

Closed plevart closed 2 years ago

plevart commented 2 years ago

Hi,

I have set up a dedicated instance of CoreDNS with the k8s_gateway plugin to serve a special zone for the external IP addresses obtained from LoadBalancer Services and Ingresses. This is the Corefile:

.:1053 {
    errors

    log
    health {
        lameduck 5s
    }
    ready
    k8s_gateway "k8s-svc.marand.si" {
      apex extcoredns-k8s-gateway.extcoredns
      ttl 300
      resources Ingress Service
    }
    prometheus 0.0.0.0:9153
    loop
    reload
    loadbalance
}

I noticed strange behavior when resolving through a BIND 9 caching nameserver that has this CoreDNS server configured as a forwarding zone target. Sometimes resolving would yield "Could not resolve host" for a name that should be present, while at other times the same name would resolve fine. Digging deeper, I found that this CoreDNS/k8s_gateway combo does not return correct answers for AAAA queries when the requested name has no AAAA record but does have an A record. In that scenario a DNS server should return status: NOERROR with an empty answer section. For example, here is a query against a CoreDNS instance with the kubernetes plugin for internal k8s names:

[root@fedora36-7b6f6895fb-sknsp /]# dig @10.254.0.10 kubernetes.default.svc.cluster.local AAAA

; <<>> DiG 9.16.30-RH <<>> @10.254.0.10 kubernetes.default.svc.cluster.local AAAA
;; Got answer:
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33814   <-- SEE status NOERROR
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
....

And here is the response to a query against the CoreDNS instance with the k8s_gateway plugin, for the special zone and an existing name that has an A record but no AAAA record:

[peter@sun ~]$ dig @10.99.8.97 keycloak.k8s-svc.marand.si AAAA

; <<>> DiG 9.16.30-RH <<>> @10.99.8.97 keycloak.k8s-svc.marand.si AAAA
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 18719 <-- SEE status: NXDOMAIN
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
...
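The difference between the two responses above can also be checked mechanically. The following is a minimal Python sketch, not part of dig or k8s_gateway; the function name `classify_dig_header` is hypothetical. It only looks at the `status:` and `ANSWER:` fields of the dig header and labels the response.

```python
import re


def classify_dig_header(dig_output: str) -> str:
    """Label a dig response by its header fields.

    Returns 'NXDOMAIN', 'NODATA' (NOERROR with an empty answer section),
    'ANSWER' (NOERROR with at least one answer) or 'OTHER:<status>'.
    """
    status = re.search(r"status:\s*([A-Z]+)", dig_output)
    answers = re.search(r"ANSWER:\s*(\d+)", dig_output)
    if status is None or answers is None:
        raise ValueError("no dig header found in the given text")

    rcode = status.group(1)
    answer_count = int(answers.group(1))

    if rcode == "NXDOMAIN":
        return "NXDOMAIN"
    if rcode == "NOERROR":
        return "ANSWER" if answer_count > 0 else "NODATA"
    return "OTHER:" + rcode
```

Fed the header lines of the first response it should report NODATA, which is the expected result for a name with an A record and no AAAA record; fed the second it should report NXDOMAIN.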

I think this is exactly the problem that this document describes:

https://datatracker.ietf.org/doc/html/rfc4074

TL;DR: Returning status NXDOMAIN (3) for an AAAA query means there are no records (of any type) for the requested name. Returning NOERROR (0) with an empty answer section for an AAAA query merely means there are no AAAA records for the requested name, but there are records of other types (such as A).
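To make the rule concrete, here is a minimal Python sketch of how an authoritative server could pick the response code. It is an illustration only, not the k8s_gateway implementation: the `answer` function, the in-memory `zone` layout, and the 192.0.2.10 address (from the documentation range) are all assumptions, and it ignores wildcards, CNAMEs and empty non-terminals.

```python
NOERROR = 0   # name exists; the answer section may still be empty
NXDOMAIN = 3  # name does not exist for any record type


def answer(zone, name, qtype):
    """Return (rcode, records) for a query against an in-memory zone.

    zone maps a lower-case name (no trailing dot) to {rrtype: [values]}.
    """
    rrsets = zone.get(name.lower().rstrip("."))
    if rrsets is None:
        # Nothing of any type is known for this name.
        return NXDOMAIN, []
    # The name exists. If it has no records of the requested type,
    # the answer is NOERROR with an empty list.
    return NOERROR, list(rrsets.get(qtype, []))


# Hypothetical zone content: one name with an A record and no AAAA record.
zone = {"keycloak.k8s-svc.marand.si": {"A": ["192.0.2.10"]}}

a_reply = answer(zone, "keycloak.k8s-svc.marand.si", "A")
aaaa_reply = answer(zone, "keycloak.k8s-svc.marand.si", "AAAA")
missing_reply = answer(zone, "nosuchname.k8s-svc.marand.si", "AAAA")
```

With this logic the AAAA query for the existing name gets NOERROR and no records, and only the query for the unknown name gets NXDOMAIN.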

The Linux resolver typically asks for A and AAAA records in two concurrent queries when resolving a name. Depending on which answer comes first, it may cache the A answer or it may cache the negative AAAA answer, in which case it may return "Can't resolve name". When a request for AAAA returns status NOERROR with an empty answer section, that answer does not "override" the positive answer to the A query, and resolving always works correctly.
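The caching effect can be sketched with a toy model. This is a simplified illustration under stated assumptions, not how BIND, unbound or glibc are implemented: the `ToyCache` class is hypothetical, it ignores TTLs, and it lets a cached NXDOMAIN win over everything else for that name. It follows the general idea that NXDOMAIN is cached per name, while an empty NOERROR answer is cached per name and record type.

```python
class ToyCache:
    """Toy resolver cache: NXDOMAIN is keyed by name, NODATA by (name, type)."""

    def __init__(self):
        self.nxdomain = set()   # names believed not to exist at all
        self.nodata = set()     # (name, qtype) pairs known to be empty
        self.positive = {}      # (name, qtype) -> list of records

    def store(self, name, qtype, rcode, records):
        if rcode == "NXDOMAIN":
            # Applies to the whole name, whatever type was asked for.
            self.nxdomain.add(name)
        elif records:
            self.positive[(name, qtype)] = list(records)
        else:
            # NOERROR with an empty answer only covers this one type.
            self.nodata.add((name, qtype))

    def lookup(self, name, qtype):
        """Return (rcode, records), or None on a cache miss."""
        if name in self.nxdomain:
            return "NXDOMAIN", []
        if (name, qtype) in self.positive:
            return "NOERROR", self.positive[(name, qtype)]
        if (name, qtype) in self.nodata:
            return "NOERROR", []
        return None


name = "keycloak.k8s-svc.marand.si"

# Buggy upstream: the AAAA query is answered with NXDOMAIN.
buggy = ToyCache()
buggy.store(name, "A", "NOERROR", ["192.0.2.10"])  # address is hypothetical
buggy.store(name, "AAAA", "NXDOMAIN", [])

# Correct upstream: the AAAA query is answered with NOERROR and no records.
fixed = ToyCache()
fixed.store(name, "A", "NOERROR", ["192.0.2.10"])
fixed.store(name, "AAAA", "NOERROR", [])
```

In this model `buggy.lookup(name, "A")` reports NXDOMAIN even though an A record was stored, while `fixed.lookup(name, "A")` still returns the address. Real resolvers differ in the details, which would fit the intermittent failures described above.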

networkop commented 2 years ago

Great catch, thanks @plevart . Can you check if #125 behaves correctly with bind?

plevart commented 2 years ago

I can confirm that it now works correctly with BIND 9 in front of CoreDNS/k8s_gateway. Thanks for the quick fix.

Hell-Fire commented 2 years ago

I was hitting this same issue, and the issue-124 branch looks to fix it (unbound in front)! Thanks as well for the quick fix!

networkop commented 2 years ago

fix is in the latest v.0.3.1 release