kumahq / kuma

🐻 The multi-zone service mesh for containers, Kubernetes and VMs. Built with Envoy. CNCF Sandbox Project.
https://kuma.io/install
Apache License 2.0

take dns server into account in /ready endpoint of kuma-dp #2571

Open bartsmykla opened 3 years ago

bartsmykla commented 3 years ago

Summary

While confirming that Kuma works with WebSockets, I hit an issue: my simple WebSocket client tried to connect to the server using our DNS names (server.default.svc.80.mesh) and kept failing until I added some retries. I then realized that kuma-dp's DNS server (CoreDNS) needs some time to start, so its startup should be taken into account when reporting readiness.

On Kubernetes, it looks like the readiness probe of the injected kuma-sidecar container points to Envoy's /ready endpoint, so I assume we should introduce our own /ready endpoint as part of kuma-dp, which would ask both Envoy and CoreDNS for readiness.

For coredns we could use this plugin: https://coredns.io/plugins/ready/

Steps To Reproduce

  1. Deploy kuma

    kumactl install control-plane | kubectl apply -f -
  2. Wait for kuma to be deployed

  3. Annotate default namespace to automatically inject kuma-dps

    kubectl annotate namespace default kuma.io/sidecar-injection=enabled
  4. Deploy my simple websocket server

    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
     name: server
    spec:
     replicas: 1
     selector:
       matchLabels:
         app: server
     template:
       metadata:
         labels:
           app: server
         name: server
       spec:
         containers:
           - name: server
             image: bartsmykla/websockets-test-server:v2-0.1.1
             ports:
               - containerPort: 8080
    ---
    apiVersion: v1
    kind: Service
    metadata:
     name: server
    spec:
     ports:
       - port: 80
         protocol: TCP
         targetPort: 8080
     selector:
       app: server
  5. Deploy my simple websocket client

    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
     name: client
    spec:
     replicas: 1
     selector:
       matchLabels:
         app: client
     template:
       metadata:
         labels:
           app: client
         name: client
       spec:
         containers:
           - name: client
             image: bartsmykla/websockets-test-client:v2-0.1.3
             ports:
               - containerPort: 8081
             command:
             - /client
             args:
             - --ws-server-addr
             - ws://server.default.svc.80.mesh/ws
             - --addr
             - localhost:8081
    ---
    apiVersion: v1
    kind: Service
    metadata:
     name: client
    spec:
     ports:
       - port: 80
         protocol: TCP
         targetPort: 8081
     selector:
       app: client
  6. That's it; you can now look at the logs of the client

Additional Details & Logs

any version with kuma-dp dns server

bartsmykla commented 3 years ago

Ok, so I started dividing this task into smaller ones:

What we have to do

  1. Expose /ready endpoint when kuma-dp starts
    • 1.1. Call envoy's ready endpoint (localhost:{{ adminPort }}/ready)
    • 1.2. If kuma-dp DNS server is enabled (it is by default) call CoreDNS' ready endpoint
      • 1.2.1. We have to adjust CoreDNS' configuration to include ready plugin, which will expose this endpoint
  2. Modify k8s injector and change readiness probe to point to our new /ready endpoint

Steps:

Modify our current CoreDNS configuration to include ready plugin

  1. Add ready plugin to this file, which is being used when compiling our CoreDNS binary. This file could look like:

    prometheus:metrics
    ready:ready
    errors:errors
    log:log
    template:template
    alternate:github.com/coredns/alternate
    forward:forward
  2. Modify Corefile template to include configuration for ready plugin:

    const DefaultCoreFileTemplate = `.:{{ .CoreDNSPort }} {
        forward . 127.0.0.1:{{ .EnvoyDNSPort }}
        alternate NXDOMAIN,SERVFAIL,REFUSED . /etc/resolv.conf
        prometheus localhost:{{ .PrometheusPort }}
        errors
        ready localhost:15399
    }

    .:{{ .CoreDNSEmptyPort }} {
        template ANY ANY . {
          rcode NXDOMAIN
        }
    }`

Expose /ready endpoint when kuma-dp starts

  1. When kuma-dp starts, an additional HTTP server should start with a dummy /ready endpoint that always responds with status 200
    • The hostname, port and path should be hard-coded: localhost:5699/ready

      I came up with the port number by combining 56, the first two digits of most of our ports, with 99, the first two digits of the default Envoy admin port (9901). That admin port currently exposes the /ready endpoint we use in k8s as the readiness probe for the whole kuma-sidecar container. We can of course pick a different port instead.

  2. Call envoy's /ready endpoint and return its response as ours
    • endpoint: http://localhost:{{ adminPort }}/ready
  3. Call CoreDNS' ready endpoint if kuma-dp dns server mode is enabled (it is by default)
    • endpoint: http://localhost:15399/ready

      The port and host are hard-coded in the Corefile above; the path is hard-coded in the plugin itself and cannot be changed.

  4. Make both calls from the previous two steps asynchronous; if either returns a status code other than 200, our endpoint should return status code 503

Make hosts and ports configurable

  1. Add kuma-dp run flags:
    • --readiness-disabled - when set, the /ready endpoint won't be exposed (default: false)
    • --readiness-host - the host on which the /ready endpoint will be exposed (default: localhost)
    • --readiness-port - the port on which the /ready endpoint will listen (default: 5699)
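
The three flags could be wired up along these lines. This sketch uses Go's standard flag package purely to stay self-contained, which is an assumption and not necessarily how kuma-dp registers its flags; `readinessConfig` and `parseReadinessFlags` are hypothetical names. The flag names and defaults are the ones proposed in the list above.

```go
package main

import (
	"flag"
	"fmt"
	"net"
	"strconv"
)

// readinessConfig (hypothetical) holds the values of the proposed flags.
type readinessConfig struct {
	Disabled bool
	Host     string
	Port     int
}

// Addr returns the host:port the /ready endpoint would listen on.
func (c readinessConfig) Addr() string {
	return net.JoinHostPort(c.Host, strconv.Itoa(c.Port))
}

// parseReadinessFlags parses the proposed flags with their proposed defaults.
func parseReadinessFlags(args []string) (readinessConfig, error) {
	var cfg readinessConfig
	fs := flag.NewFlagSet("kuma-dp run", flag.ContinueOnError)
	fs.BoolVar(&cfg.Disabled, "readiness-disabled", false, "when set, the /ready endpoint won't be exposed")
	fs.StringVar(&cfg.Host, "readiness-host", "localhost", "host on which the /ready endpoint will be exposed")
	fs.IntVar(&cfg.Port, "readiness-port", 5699, "port on which the /ready endpoint will listen")
	if err := fs.Parse(args); err != nil {
		return readinessConfig{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := parseReadinessFlags([]string{"--readiness-port", "5700"})
	if err != nil {
		fmt.Println("invalid flags:", err)
		return
	}
	fmt.Println(cfg.Disabled, cfg.Addr()) // false localhost:5700
}
```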

Stretch goals

  1. As we are working on /ready endpoint, we could also add /health one

tbc.

github-actions[bot] commented 2 years ago

This issue was inactive for 30 days. It will be reviewed in the next triage meeting and might be closed. If you think this issue is still relevant, please comment on it promptly or attend the next triage meeting.

github-actions[bot] commented 1 year ago

This issue was inactive for 90 days. It will be reviewed in the next triage meeting and might be closed. If you think this issue is still relevant, please comment on it or attend the next triage meeting.

jijiechen commented 3 months ago

Adding the kuma.io/wait-for-dataplane-ready annotation to the pod ensures that the app is started only after kuma-sidecar is ready, so the issue will be mitigated.

More in the docs: https://kuma.io/docs/2.8.x/reference/kubernetes-annotations/#kumaiowait-for-dataplane-ready