hashicorp / consul-k8s

First-class support for Consul Service Mesh on Kubernetes
https://www.consul.io/docs/k8s
Mozilla Public License 2.0

sync-catalog: Consul to K8s: cannot reach k8s FQDN's #10

Closed GregHanson closed 5 years ago

GregHanson commented 5 years ago

I deployed version 0.2.0 of the consul-helm chart with service sync enabled. I see the Consul services in the kubectl get svc output, but I can only curl them via their Consul FQDNs (e.g. consul.service.consul) and not via the default k8s FQDN (consul.default.svc.cluster.local). The Consul to Kubernetes section from the official Consul blog here uses

dig consul.default.svc.cluster.local

Does dig behave any differently from curl for Consul DNS resolution? I have tried accessing the k8s FQDNs for Consul services from inside other pods on IKS and locally with minikube, and have not been able to get it to work.
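For readers following along, the two names being compared can be built with a pair of small helpers. This is a minimal sketch: the function names consul_fqdn and k8s_fqdn are hypothetical, and cluster.local is assumed to be the cluster domain, as in the names quoted above.

```shell
# Hypothetical helpers: build the two DNS names discussed in this issue.

consul_fqdn() {
  # $1: service name. Consul DNS answers for <service>.service.consul.
  printf '%s.service.consul\n' "$1"
}

k8s_fqdn() {
  # $1: service name, $2: namespace (defaults to "default").
  # Assumes the cluster domain is cluster.local.
  printf '%s.%s.svc.cluster.local\n' "$1" "${2:-default}"
}
```

From inside a pod, both names can then be probed the same way, for example with dig +short "$(consul_fqdn consul)" and dig +short "$(k8s_fqdn consul)", which makes it easier to see which of the two fails to resolve.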

mitchellh commented 5 years ago

Are you using CoreDNS instead of kube-dns? And did you set up the Consul DNS? This is currently a requirement, as noted here: https://www.consul.io/docs/platform/k8s/service-sync.html#consul-to-kubernetes. CoreDNS is on its way to becoming the default for K8S, hence they aren't fixing the bug that forces this in kube-dns.

GregHanson commented 5 years ago

@mitchellh I was following the DNS steps listed here: https://www.consul.io/docs/platform/k8s/dns.html

I created the following configmap:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"consul": ["$(kubectl get svc consul-dns -o jsonpath='{.spec.clusterIP}')"]}
EOF

Was another step needed?
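The stubDomains value in the ConfigMap above is produced by a command substitution, so it can help to render it on its own before applying anything. A minimal sketch, assuming a hypothetical helper named stub_domains_json; it refuses an empty address, which is what the substitution would yield if kubectl found no consul-dns service in the current namespace.

```shell
# Hypothetical helper: render the stubDomains JSON for a given Consul DNS IP.
stub_domains_json() {
  # $1: cluster IP of the Consul DNS service
  if [ -z "$1" ]; then
    echo "stub_domains_json: empty Consul DNS address" >&2
    return 1
  fi
  printf '{"consul": ["%s"]}\n' "$1"
}
```

Usage with the same lookup as in the ConfigMap: stub_domains_json "$(kubectl get svc consul-dns -o jsonpath='{.spec.clusterIP}')". If this prints an error instead of JSON, the ConfigMap would have been applied with an empty resolver address.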

mitchellh commented 5 years ago

That's right, you also need to make sure you're using CoreDNS and not kube-dns. The major issue is documented here: https://github.com/kubernetes/dns/issues/131. It's a "wontfix" because CoreDNS fixes it and will be the default in Kubernetes soon.
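One way to check which DNS server a cluster runs is to look at the deployments in kube-system. The sketch below is an assumption-laden illustration: classify_dns_provider is a hypothetical helper, and it assumes the deployments are named coredns and kube-dns, which is common but not guaranteed on managed clusters.

```shell
# Hypothetical helper: classify the cluster DNS provider from deployment names.
# Reads lines such as "deployment.apps/coredns" on stdin.
classify_dns_provider() {
  provider=unknown
  while IFS= read -r line; do
    case "$line" in
      */coredns)  provider=coredns ;;
      # Only report kube-dns if no coredns deployment was seen.
      */kube-dns) [ "$provider" = coredns ] || provider=kube-dns ;;
    esac
  done
  printf '%s\n' "$provider"
}
```

Usage: kubectl -n kube-system get deployments -o name | classify_dns_provider. A result of unknown means the names did not match the assumption and the cluster needs to be inspected by hand.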

ervikrant06 commented 5 years ago

@GregHanson If you want to run dig against consul.service.consul, you need to query the Consul DNS instead of the Kubernetes DNS.

/ # dig @172.17.0.11 -p 8600 consul.service.consul

; <<>> DiG 9.11.2-P1 <<>> @172.17.0.11 -p 8600 consul.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9358
;; flags: qr aa rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 4
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;consul.service.consul.     IN  A

;; ANSWER SECTION:
consul.service.consul.  0   IN  A   172.17.0.13
consul.service.consul.  0   IN  A   172.17.0.11
consul.service.consul.  0   IN  A   172.17.0.12

;; ADDITIONAL SECTION:
consul.service.consul.  0   IN  TXT "consul-network-segment="
consul.service.consul.  0   IN  TXT "consul-network-segment="
consul.service.consul.  0   IN  TXT "consul-network-segment="

;; Query time: 15 msec
;; SERVER: 172.17.0.11#8600(172.17.0.11)
;; WHEN: Fri Oct 05 17:12:39 UTC 2018
;; MSG SIZE  rcvd: 206

A dig against the Kubernetes DNS depends on the name of the Kubernetes service. In my case the Consul service is named consul-server, hence I used the following query.

/ # dig consul-server.default.svc.cluster.local

; <<>> DiG 9.11.2-P1 <<>> consul-server.default.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3978
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: cc91be17762fb1b8 (echoed)
;; QUESTION SECTION:
;consul-server.default.svc.cluster.local. IN A

;; ANSWER SECTION:
consul-server.default.svc.cluster.local. 5 IN A 172.17.0.11
consul-server.default.svc.cluster.local. 5 IN A 172.17.0.12
consul-server.default.svc.cluster.local. 5 IN A 172.17.0.13

;; Query time: 0 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Fri Oct 05 17:13:27 UTC 2018
;; MSG SIZE  rcvd: 245

A better way to run dig against the Consul DNS is the following: instead of the Consul container IP address, it uses the service FQDN as the server.

/ # dig @consul-server.default.svc.cluster.local -p 8600 consul.service.consul

; <<>> DiG 9.11.2-P1 <<>> @consul-server.default.svc.cluster.local -p 8600 consul.service.consul
; (3 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19190
;; flags: qr aa rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 4
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;consul.service.consul.     IN  A

;; ANSWER SECTION:
consul.service.consul.  0   IN  A   172.17.0.13
consul.service.consul.  0   IN  A   172.17.0.12
consul.service.consul.  0   IN  A   172.17.0.11

;; ADDITIONAL SECTION:
consul.service.consul.  0   IN  TXT "consul-network-segment="
consul.service.consul.  0   IN  TXT "consul-network-segment="
consul.service.consul.  0   IN  TXT "consul-network-segment="

;; Query time: 7 msec
;; SERVER: 172.17.0.11#8600(172.17.0.11)
;; WHEN: Fri Oct 05 17:16:50 UTC 2018
;; MSG SIZE  rcvd: 206
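When comparing several dig runs like the ones above, only the ANSWER SECTION usually matters. A minimal sketch of a filter that keeps just the record type and value; dig_answers is a hypothetical name, and the parsing assumes the default dig output layout shown in this thread (name, TTL, class, type, value).

```shell
# Hypothetical helper: print "<type> <value>" for each ANSWER SECTION record
# found in dig output supplied on stdin.
dig_answers() {
  awk '
    /^;; ANSWER SECTION:/ { in_answer = 1; next }   # start of the answers
    in_answer && NF == 0  { in_answer = 0 }         # blank line ends the section
    in_answer             { print $4, $5 }          # fields: type and value
  '
}
```

Usage: dig @consul-server.default.svc.cluster.local -p 8600 consul.service.consul | dig_answers. For a quick check, dig +short gives a similar summary without the record type.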

adilyse commented 5 years ago

I've added documentation for setting up the Consul DNS in the case where the cluster is using CoreDNS (see Consul PR). Hopefully that will help address this situation.

I'm going to close this, but please feel free to open another issue if there's something else that needs to be addressed.

GregHanson commented 5 years ago

Thanks for updating the docs @alisdair, but I cannot get the modified configmap provided in the docs working.

With the following in the coredns configmap:

    consul {
      errors
      cache 30
      proxy . <consul-dns service cluster ip>
    }

Dig produces:

; <<>> DiG 9.11.2-P1 <<>> httpbin.service.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 8872
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1280
;; QUESTION SECTION:
;httpbin.service.consul.        IN  A

;; AUTHORITY SECTION:
.           30  IN  SOA a.root-servers.net. nstld.verisign-grs.com. 2019032100 1800 900 604800 86400

;; Query time: 2 msec
;; SERVER: 172.21.0.10#53(172.21.0.10)
;; WHEN: Thu Mar 21 13:46:10 UTC 2019
;; MSG SIZE  rcvd: 126

Whereas with the following coredns field:

  service.consul:53 {
      errors
      cache 30
      proxy . <consul_dns_ip>
  }

I get some different output from the dig command:

; <<>> DiG 9.11.2-P1 <<>> httpbin.service.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17747
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;httpbin.service.consul.        IN  A

;; ANSWER SECTION:
httpbin.service.consul. 5   IN  CNAME   httpbin.org.

;; ADDITIONAL SECTION:
httpbin.service.consul. 5   IN  TXT "consul-network-segment="

;; Query time: 45 msec
;; SERVER: 172.21.0.10#53(172.21.0.10)
;; WHEN: Thu Mar 21 13:48:56 UTC 2019
;; MSG SIZE  rcvd: 156
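The two CoreDNS blocks tried above differ in the zone they match and in whether a port is given. A small generator makes it easier to try variants consistently. This is a sketch under stated assumptions: corefile_consul_block is a hypothetical helper, and it swaps in the forward plugin where the thread uses proxy, because later CoreDNS releases (1.5.0 onward, to my knowledge) removed proxy in favour of forward. On a CoreDNS version that still ships proxy, the directive from the thread applies instead.

```shell
# Hypothetical helper: print a CoreDNS server block that forwards a zone
# to the Consul DNS service.
corefile_consul_block() {
  # $1: zone to match (for example consul or service.consul)
  # $2: cluster IP of the Consul DNS service
  cat <<EOF
$1:53 {
    errors
    cache 30
    forward . $2
}
EOF
}
```

Usage: corefile_consul_block consul "$(kubectl get svc consul-dns -o jsonpath='{.spec.clusterIP}')". The printed block is meant to sit at the top level of the Corefile, next to the existing default server block rather than inside it.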

I registered httpbin with the following command, so maybe there is a problem with how I am registering the service?

curl --request PUT \
    --data '{"Name":"httpbin","Address":"httpbin.org","Port":80}' \
    $INGRESS_HOST:8500/v1/agent/service/register
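The CNAME answer in the second dig output is consistent with the registration above: the service Address is a hostname (httpbin.org), and Consul DNS answers with a CNAME for a service whose address is not an IP. A minimal sketch for predicting the record type from the Address value; is_ipv4 and expected_dns_record_type are hypothetical helpers, and only IPv4 is considered.

```shell
# Hypothetical helper: succeed only for dotted-quad IPv4 addresses.
is_ipv4() {
  case "$1" in
    "" | *[!0-9.]* | .* | *. | *..*) return 1 ;;   # reject non-numeric or malformed input
  esac
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address into octets
  IFS=$old_ifs
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
  done
  return 0
}

# Hypothetical helper: record type to expect for a given service Address.
expected_dns_record_type() {
  if is_ipv4 "$1"; then
    echo A
  else
    echo CNAME
  fi
}
```

Usage: expected_dns_record_type httpbin.org prints CNAME, while an address such as 172.17.0.11 yields A. A client that follows the CNAME then needs to be able to resolve httpbin.org through its normal upstream resolver.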