kubernetes / dns

Kubernetes DNS service
Apache License 2.0
922 stars 462 forks source link

Latency/timeout from Kube DNS #640

Open lbernick opened 2 months ago

lbernick commented 2 months ago

Likely a duplicate of https://github.com/kubernetes/dns/issues/96, but I'm opening a new issue as requested here since that one is old/possibly no longer relevant.

On 6/6/24 I observed DNS resolution for a storage service in our cluster taking multiple seconds to complete and am now observing it again; here are my notes from that issue.

First, I made a request from within our app container to the public hostname of the service:

root@app-55f755fc8c-88gvj:/app# time curl --location 'https://<hostname>/api/v1/ping'
pong

real    0m5.196s
user    0m0.020s
sys 0m0.006s

I then tried the cluster internal hostname:

/app # time curl --location '<service>.<namespace>.svc.cluster.local:443/api/v1/ping'
pong
real    0m 7.52s
user    0m 0.00s
sys 0m 0.00s

Making a request to the IP address was very fast:

/app # time curl --location '<IP>:443/api/v1/ping'
pong
real    0m 0.00s
user    0m 0.00s
sys 0m 0.00s

I believe this was an issue with kube DNS timing out for the following reasons:

Unfortunately, I'm not sure how to reproduce this issue. This similar issue suggests this error may occur when a deployment disconnects from the kube API server: https://github.com/cert-manager/cert-manager/issues/4685#issuecomment-1762594269

I didn't observe any restarts for the kube-dns pods:

➜  ~ k get po -n kube-system
NAME                                                       READY   STATUS    RESTARTS       AGE
...
kube-dns-f65b59b6b-bkqw9                                   4/4     Running   0              31h
kube-dns-f65b59b6b-v72bv                                   4/4     Running   0              2d10h

k8s version:

➜  ~ k version
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.10-gke.1075001