kubernetes / dns

Kubernetes DNS service

NodeLocal DNS container hung on SIGTERM #453

Open rtheis opened 3 years ago

rtheis commented 3 years ago

We are still hitting the same problem reported by https://github.com/kubernetes/dns/issues/394. The test failure occurred on Kubernetes version 1.21 with NodeLocal DNS cache version 1.17.3.

To recap, the NodeLocal DNS container occasionally hangs on termination, causing Kubernetes to kill the container with SIGKILL after the grace period has expired. This leaves leftover iptables rules on the node, which breaks DNS resolution. Our theory is that there is iptables lock contention between NodeLocal DNS, Calico, and/or Kubernetes.
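
One way to confirm the leftover-rules symptom on an affected node is to scan `iptables-save` output for rules that still reference the NodeLocal DNS listen addresses. The following is a minimal Python sketch, not part of the project. The helper name `leftover_rules` is hypothetical, and it assumes that every rule installed by NodeLocal DNS names a listen IP as a separate token (for example `169.254.20.10/32`, the link-local address that appears in the metrics posted later in this thread).

```python
from typing import List


def leftover_rules(iptables_save_output: str, listen_ips: List[str]) -> List[str]:
    """Return the rule lines in `iptables-save` output that mention a listen IP."""
    hits = []
    for line in iptables_save_output.splitlines():
        line = line.strip()
        # Only "-A <chain> ..." lines are rules; skip table names,
        # chain policies, comments, and COMMIT markers.
        if not line.startswith("-A "):
            continue
        tokens = line.split()
        # Compare whole tokens so 169.254.20.10 does not match 169.254.20.100.
        if any(ip in tokens or f"{ip}/32" in tokens for ip in listen_ips):
            hits.append(line)
    return hits
```

Feeding it the saved output after NodeLocal DNS has been disabled should return an empty list on a healthy node; any returned lines are candidates for the stale rules described above.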

prameshj commented 3 years ago

Doesn't nodelocaldns start up right away, though? The DNS downtime should be O(seconds). Is that what you observe?

It is possible for nodelocaldns to run into lock contention, but that usually has a log message. Anything in the logs?

rtheis commented 3 years ago

@prameshj I was unable to collect any valuable logs at the time of the latest failure. I assume there is some type of lock contention that causes the pod to hang on termination.

NodeLocal DNS does start up right away, but our test failure comes when we verify that DNS works after disabling NodeLocal DNS. If we restart NodeLocal DNS and then stop it again, that usually fixes the node.

prameshj commented 3 years ago

> NodeLocal DNS does start up right away, but our test failure comes when we verify that DNS works after disabling NodeLocal DNS.

Ah, I see. Just to confirm: 1) the test disables nodelocaldns, 2) the nodelocaldns pod gets stuck handling SIGTERM and is killed by the kubelet, leaving iptables rules behind, and 3) the test times out with a DNS failure?

Do you see a log line of the sigterm being handled - https://github.com/kubernetes/dns/blob/3b17e06879a46b2ec5d97105c611a315897fdb48/vendor/github.com/coredns/caddy/sigtrap_posix.go#L42 ?

It should call teardown in that case - https://github.com/kubernetes/dns/blob/3b17e06879a46b2ec5d97105c611a315897fdb48/cmd/node-cache/main.go#L53

It is possible that the pod handles SIGTERM and tries to clean up iptables, but cannot get the lock. We do expose a metric on port 9353 for nodelocaldns lock errors, but we do not increment it for delete errors. We should check errors in https://github.com/kubernetes/dns/blob/3b17e06879a46b2ec5d97105c611a315897fdb48/cmd/node-cache/app/cache_app.go#L157 and update the metric.

rtheis commented 3 years ago

@prameshj That is correct. We've raised the termination grace period to 900 seconds and still see this problem, although the failure rate is much lower than it was in the past. Given the long termination window, it seems like there is a hang somewhere. We'll keep trying to collect more data when this problem occurs.
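
The termination sequence behind this failure (SIGTERM first, then a forced kill once the grace period runs out) can be reproduced locally when experimenting with a process that hangs in its signal handler. This is a minimal Python sketch, assuming a POSIX system. `stop_with_grace` is a hypothetical helper, not part of Kubernetes or NodeLocal DNS.

```python
import subprocess


def stop_with_grace(proc: subprocess.Popen, grace_seconds: float) -> str:
    """Send SIGTERM, wait up to grace_seconds, then force-kill the process."""
    proc.terminate()  # SIGTERM on POSIX: the process may run cleanup code
    try:
        proc.wait(timeout=grace_seconds)
        return "exited"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: cannot be caught, so no cleanup runs
        proc.wait()
        return "killed"
```

A `"killed"` result corresponds to the case in this issue: the process never finished its own cleanup, so whatever it had installed (here, iptables rules) stays behind.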

rtheis commented 2 years ago

We hit the problem again on Kubernetes version 1.20 with NodeLocal DNS version 1.17.3. Unfortunately, I don't have any additional debug data to provide.

rtheis commented 2 years ago

We hit this problem again on Kubernetes version 1.22 with NodeLocal DNS version 1.21.1. We are collecting debug data now to determine if we can find the root cause.

prameshj commented 2 years ago

Thanks. I have also opened https://github.com/kubernetes/dns/pull/488 to count errors from rule deletions at teardown, in case that provides some hints.

rtheis commented 2 years ago

We hit the problem again on Kubernetes version 1.22 with NodeLocal DNS version 1.21.1. Here is the end of the log captured during pod termination:

[INFO] SIGTERM: Shutting down servers then terminating

prameshj commented 2 years ago

> We hit the problem again on Kubernetes version 1.22 with NodeLocal DNS version 1.21.1. Here is the end of the log captured during pod termination:
>
> [INFO] SIGTERM: Shutting down servers then terminating

Any metrics from node-cache?

rtheis commented 2 years ago

@prameshj unfortunately, I don't have any metrics captured when the failure occurred. What would you like us to collect?

prameshj commented 2 years ago

Please collect the "setup_errors_total" metric, which was modified in https://github.com/kubernetes/dns/pull/488 to also increment during deletions. That PR is included in 1.21.2 and later tags.
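
For a test harness that needs to capture this metric automatically, one option is to filter the Prometheus text output for the relevant series. This is a minimal Python sketch under stated assumptions. The helper `metric_samples` is hypothetical. It matches by name suffix because only the short name "setup_errors_total" is given here, not the full exported name. It also assumes that sample lines carry no trailing timestamp, as in the dump posted in this thread.

```python
from typing import Dict


def metric_samples(metrics_text: str, name_suffix: str) -> Dict[str, float]:
    """Map each matching series (metric name plus labels) to its value."""
    samples = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field; label values may
        # themselves contain spaces, so split from the right.
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        if name.endswith(name_suffix):
            samples[series] = float(value)
    return samples
```

The input text would come from the node-cache metrics endpoint on port 9353 mentioned earlier in this thread, for example fetched with `urllib.request.urlopen`; the exact URL path is not stated here and would need to be confirmed.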

rtheis commented 2 years ago

Thanks, we'll update our test to collect metrics once we pull in the latest NodeLocal DNS cache version.

rtheis commented 2 years ago

We were able to recreate the problem on NodeLocal DNS version 1.21.3. Here are the logs and metrics.

[INFO] SIGTERM: Shutting down servers then terminating

# HELP coredns_build_info A metric with a constant '1' value labeled by version, revision, and goversion from which CoreDNS was built.
# TYPE coredns_build_info gauge
coredns_build_info{goversion="go1.16.10",revision="",version="1.7.0"} 1
# HELP coredns_cache_entries The number of elements in the cache.
# TYPE coredns_cache_entries gauge
coredns_cache_entries{server="dns://169.254.20.10:53",type="denial"} 2
coredns_cache_entries{server="dns://169.254.20.10:53",type="success"} 0
coredns_cache_entries{server="dns://172.21.0.10:53",type="denial"} 8
coredns_cache_entries{server="dns://172.21.0.10:53",type="success"} 1
# HELP coredns_cache_hits_total The count of cache hits.
# TYPE coredns_cache_hits_total counter
coredns_cache_hits_total{server="dns://172.21.0.10:53",type="denial"} 882
coredns_cache_hits_total{server="dns://172.21.0.10:53",type="success"} 131
# HELP coredns_cache_misses_total The count of cache misses.
# TYPE coredns_cache_misses_total counter
coredns_cache_misses_total{server="dns://169.254.20.10:53"} 8
coredns_cache_misses_total{server="dns://172.21.0.10:53"} 51
# HELP coredns_dns_request_duration_seconds Histogram of the time (in seconds) each request took.
# TYPE coredns_dns_request_duration_seconds histogram
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.00025"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.0005"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.001"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.002"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.004"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.008"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.016"} 1
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.032"} 1
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.064"} 1
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.128"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.256"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.512"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="1.024"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="2.048"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="4.096"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="8.192"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="+Inf"} 2
coredns_dns_request_duration_seconds_sum{server="dns://169.254.20.10:53",type="other",zone="."} 0.12080307600000001
coredns_dns_request_duration_seconds_count{server="dns://169.254.20.10:53",type="other",zone="."} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.00025"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.0005"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.001"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.002"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.004"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.008"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.016"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.032"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.064"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.128"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.256"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.512"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="1.024"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="2.048"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="4.096"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="8.192"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="+Inf"} 2
coredns_dns_request_duration_seconds_sum{server="dns://169.254.20.10:53",type="other",zone="cluster.local."} 0.002413771
coredns_dns_request_duration_seconds_count{server="dns://169.254.20.10:53",type="other",zone="cluster.local."} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.00025"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.0005"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.001"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.002"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.004"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.008"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.016"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.032"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.064"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.128"} 1
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.256"} 1
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.512"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="1.024"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="2.048"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="4.096"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="8.192"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="+Inf"} 2
coredns_dns_request_duration_seconds_sum{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa."} 0.391487433
coredns_dns_request_duration_seconds_count{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa."} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.00025"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.0005"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.001"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.002"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.004"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.008"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.016"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.032"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.064"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.128"} 1
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.256"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.512"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="1.024"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="2.048"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="4.096"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="8.192"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="+Inf"} 2
coredns_dns_request_duration_seconds_sum{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa."} 0.256938208
coredns_dns_request_duration_seconds_count{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa."} 2
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.00025"} 508
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.0005"} 518
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.001"} 530
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.002"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.004"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.008"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.016"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.032"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.064"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.128"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.256"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.512"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="1.024"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="2.048"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="4.096"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="8.192"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="+Inf"} 532
coredns_dns_request_duration_seconds_sum{server="dns://172.21.0.10:53",type="A",zone="cluster.local."} 0.047663295000000064
coredns_dns_request_duration_seconds_count{server="dns://172.21.0.10:53",type="A",zone="cluster.local."} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.00025"} 500
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.0005"} 519
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.001"} 531
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.002"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.004"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.008"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.016"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.032"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.064"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.128"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.256"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.512"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="1.024"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="2.048"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="4.096"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="8.192"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="+Inf"} 532
coredns_dns_request_duration_seconds_sum{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local."} 0.046383935999999994
coredns_dns_request_duration_seconds_count{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local."} 532
# HELP coredns_dns_request_size_bytes Size of the EDNS0 UDP buffer in bytes (64K for TCP).
# TYPE coredns_dns_request_size_bytes histogram
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="0"} 0
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="100"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="200"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="300"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="400"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="511"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="1023"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="2047"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="4095"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="8291"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="16000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="32000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="48000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="64000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="+Inf"} 2
coredns_dns_request_size_bytes_sum{proto="udp",server="dns://169.254.20.10:53",zone="."} 114
coredns_dns_request_size_bytes_count{proto="udp",server="dns://169.254.20.10:53",zone="."} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="0"} 0
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="100"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="200"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="300"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="400"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="511"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="1023"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="2047"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="4095"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="8291"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="16000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="32000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="48000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="64000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="+Inf"} 2
coredns_dns_request_size_bytes_sum{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local."} 142
coredns_dns_request_size_bytes_count{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local."} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="0"} 0
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="100"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="200"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="300"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="400"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="511"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="1023"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="2047"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="4095"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="8291"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="16000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="32000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="48000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="64000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="+Inf"} 2
coredns_dns_request_size_bytes_sum{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa."} 140
coredns_dns_request_size_bytes_count{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa."} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="0"} 0
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="100"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="200"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="300"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="400"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="511"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="1023"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="2047"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="4095"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="8291"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="16000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="32000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="48000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="64000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="+Inf"} 2
coredns_dns_request_size_bytes_sum{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa."} 132
coredns_dns_request_size_bytes_count{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa."} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="0"} 0
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="100"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="200"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="300"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="400"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="511"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="1023"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="2047"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="4095"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="8291"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="16000"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="32000"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="48000"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="64000"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="+Inf"} 1064
coredns_dns_request_size_bytes_sum{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local."} 73948
coredns_dns_request_size_bytes_count{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local."} 1064
# HELP coredns_dns_requests_total Counter of DNS requests made per zone, protocol and family.
# TYPE coredns_dns_requests_total counter
coredns_dns_requests_total{family="1",proto="udp",server="dns://169.254.20.10:53",type="other",zone="."} 2
coredns_dns_requests_total{family="1",proto="udp",server="dns://169.254.20.10:53",type="other",zone="cluster.local."} 2
coredns_dns_requests_total{family="1",proto="udp",server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa."} 2
coredns_dns_requests_total{family="1",proto="udp",server="dns://169.254.20.10:53",type="other",zone="ip6.arpa."} 2
coredns_dns_requests_total{family="1",proto="udp",server="dns://172.21.0.10:53",type="A",zone="cluster.local."} 532
coredns_dns_requests_total{family="1",proto="udp",server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local."} 532
# HELP coredns_dns_response_size_bytes Size of the returned response in bytes.
# TYPE coredns_dns_response_size_bytes histogram
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="0"} 0
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="100"} 0
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="200"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="300"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="400"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="511"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="1023"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="2047"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="4095"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="8291"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="16000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="32000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="48000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="64000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="+Inf"} 2
coredns_dns_response_size_bytes_sum{proto="udp",server="dns://169.254.20.10:53",zone="."} 264
coredns_dns_response_size_bytes_count{proto="udp",server="dns://169.254.20.10:53",zone="."} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="0"} 0
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="100"} 0
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="200"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="300"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="400"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="511"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="1023"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="2047"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="4095"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="8291"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="16000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="32000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="48000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="64000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="+Inf"} 2
coredns_dns_response_size_bytes_sum{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local."} 328
coredns_dns_response_size_bytes_count{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local."} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="0"} 0
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="100"} 0
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="200"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="300"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="400"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="511"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="1023"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="2047"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="4095"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="8291"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="16000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="32000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="48000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="64000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="+Inf"} 2
coredns_dns_response_size_bytes_sum{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa."} 308
coredns_dns_response_size_bytes_count{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa."} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="0"} 0
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="100"} 0
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="200"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="300"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="400"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="511"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="1023"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="2047"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="4095"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="8291"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="16000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="32000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="48000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="64000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="+Inf"} 2
coredns_dns_response_size_bytes_sum{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa."} 284
coredns_dns_response_size_bytes_count{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa."} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="0"} 0
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="100"} 0
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="200"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="300"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="400"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="511"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="1023"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="2047"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="4095"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="8291"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="16000"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="32000"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="48000"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="64000"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="+Inf"} 1064
coredns_dns_response_size_bytes_sum{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local."} 167447
coredns_dns_response_size_bytes_count{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local."} 1064
# HELP coredns_dns_responses_total Counter of response status codes.
# TYPE coredns_dns_responses_total counter
coredns_dns_responses_total{rcode="NOERROR",server="dns://172.21.0.10:53",zone="cluster.local."} 266
coredns_dns_responses_total{rcode="NXDOMAIN",server="dns://169.254.20.10:53",zone="."} 2
coredns_dns_responses_total{rcode="NXDOMAIN",server="dns://169.254.20.10:53",zone="cluster.local."} 2
coredns_dns_responses_total{rcode="NXDOMAIN",server="dns://169.254.20.10:53",zone="in-addr.arpa."} 2
coredns_dns_responses_total{rcode="NXDOMAIN",server="dns://169.254.20.10:53",zone="ip6.arpa."} 2
coredns_dns_responses_total{rcode="NXDOMAIN",server="dns://172.21.0.10:53",zone="cluster.local."} 798
# HELP coredns_forward_max_concurrent_rejects_total Counter of the number of queries rejected because the concurrent queries were at maximum.
# TYPE coredns_forward_max_concurrent_rejects_total counter
coredns_forward_max_concurrent_rejects_total 0
# HELP coredns_forward_request_duration_seconds Histogram of the time each request took.
# TYPE coredns_forward_request_duration_seconds histogram
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.00025"} 0
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.0005"} 0
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.001"} 0
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.002"} 0
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.004"} 0
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.008"} 0
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.016"} 0
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.032"} 0
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.064"} 0
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.128"} 1
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.256"} 1
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.512"} 1
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="1.024"} 1
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="2.048"} 1
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="4.096"} 1
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="8.192"} 1
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="+Inf"} 1
coredns_forward_request_duration_seconds_sum{to="10.0.80.11:53"} 0.112503827
coredns_forward_request_duration_seconds_count{to="10.0.80.11:53"} 1
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.00025"} 1
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.0005"} 34
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.001"} 49
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.002"} 53
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.004"} 53
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.008"} 53
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.016"} 53
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.032"} 53
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.064"} 53
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.128"} 55
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.256"} 56
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.512"} 57
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="1.024"} 57
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="2.048"} 57
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="4.096"} 57
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="8.192"} 57
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="+Inf"} 57
coredns_forward_request_duration_seconds_sum{to="172.21.156.115:53"} 0.6737926949999999
coredns_forward_request_duration_seconds_count{to="172.21.156.115:53"} 57
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.00025"} 0
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.0005"} 0
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.001"} 0
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.002"} 0
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.004"} 0
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.008"} 0
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.016"} 1
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.032"} 1
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.064"} 1
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.128"} 1
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.256"} 1
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.512"} 1
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="1.024"} 1
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="2.048"} 1
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="4.096"} 1
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="8.192"} 1
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="+Inf"} 1
coredns_forward_request_duration_seconds_sum{to="8.8.8.8:53"} 0.008086901
coredns_forward_request_duration_seconds_count{to="8.8.8.8:53"} 1
# HELP coredns_forward_requests_total Counter of requests made per upstream.
# TYPE coredns_forward_requests_total counter
coredns_forward_requests_total{to="10.0.80.11:53"} 1
coredns_forward_requests_total{to="172.21.156.115:53"} 57
coredns_forward_requests_total{to="8.8.8.8:53"} 1
# HELP coredns_forward_responses_total Counter of requests made per upstream.
# TYPE coredns_forward_responses_total counter
coredns_forward_responses_total{rcode="NOERROR",to="172.21.156.115:53"} 9
coredns_forward_responses_total{rcode="NXDOMAIN",to="10.0.80.11:53"} 1
coredns_forward_responses_total{rcode="NXDOMAIN",to="172.21.156.115:53"} 48
coredns_forward_responses_total{rcode="NXDOMAIN",to="8.8.8.8:53"} 1
# HELP coredns_health_request_duration_seconds Histogram of the time (in seconds) each request took.
# TYPE coredns_health_request_duration_seconds histogram
coredns_health_request_duration_seconds_bucket{le="0.00025"} 0
coredns_health_request_duration_seconds_bucket{le="0.0005"} 20
coredns_health_request_duration_seconds_bucket{le="0.001"} 1074
coredns_health_request_duration_seconds_bucket{le="0.002"} 1107
coredns_health_request_duration_seconds_bucket{le="0.004"} 1117
coredns_health_request_duration_seconds_bucket{le="0.008"} 1119
coredns_health_request_duration_seconds_bucket{le="0.016"} 1119
coredns_health_request_duration_seconds_bucket{le="0.032"} 1119
coredns_health_request_duration_seconds_bucket{le="0.064"} 1119
coredns_health_request_duration_seconds_bucket{le="0.128"} 1119
coredns_health_request_duration_seconds_bucket{le="0.256"} 1119
coredns_health_request_duration_seconds_bucket{le="0.512"} 1119
coredns_health_request_duration_seconds_bucket{le="1.024"} 1119
coredns_health_request_duration_seconds_bucket{le="2.048"} 1119
coredns_health_request_duration_seconds_bucket{le="4.096"} 1119
coredns_health_request_duration_seconds_bucket{le="8.192"} 1119
coredns_health_request_duration_seconds_bucket{le="+Inf"} 1119
coredns_health_request_duration_seconds_sum 0.793689844
coredns_health_request_duration_seconds_count 1119
# HELP coredns_panics_total A metrics that counts the number of panics.
# TYPE coredns_panics_total counter
coredns_panics_total 0
# HELP coredns_plugin_enabled A metric that indicates whether a plugin is enabled on per server and zone basis.
# TYPE coredns_plugin_enabled gauge
coredns_plugin_enabled{name="cache",server="dns://169.254.20.10:53",zone="."} 1
coredns_plugin_enabled{name="cache",server="dns://169.254.20.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="cache",server="dns://169.254.20.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="cache",server="dns://169.254.20.10:53",zone="ip6.arpa."} 1
coredns_plugin_enabled{name="cache",server="dns://172.21.0.10:53",zone="."} 1
coredns_plugin_enabled{name="cache",server="dns://172.21.0.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="cache",server="dns://172.21.0.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="cache",server="dns://172.21.0.10:53",zone="ip6.arpa."} 1
coredns_plugin_enabled{name="errors",server="dns://169.254.20.10:53",zone="."} 1
coredns_plugin_enabled{name="errors",server="dns://169.254.20.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="errors",server="dns://169.254.20.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="errors",server="dns://169.254.20.10:53",zone="ip6.arpa."} 1
coredns_plugin_enabled{name="errors",server="dns://172.21.0.10:53",zone="."} 1
coredns_plugin_enabled{name="errors",server="dns://172.21.0.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="errors",server="dns://172.21.0.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="errors",server="dns://172.21.0.10:53",zone="ip6.arpa."} 1
coredns_plugin_enabled{name="forward",server="dns://169.254.20.10:53",zone="."} 1
coredns_plugin_enabled{name="forward",server="dns://169.254.20.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="forward",server="dns://169.254.20.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="forward",server="dns://169.254.20.10:53",zone="ip6.arpa."} 1
coredns_plugin_enabled{name="forward",server="dns://172.21.0.10:53",zone="."} 1
coredns_plugin_enabled{name="forward",server="dns://172.21.0.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="forward",server="dns://172.21.0.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="forward",server="dns://172.21.0.10:53",zone="ip6.arpa."} 1
coredns_plugin_enabled{name="log",server="dns://169.254.20.10:53",zone="."} 1
coredns_plugin_enabled{name="log",server="dns://169.254.20.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="log",server="dns://169.254.20.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="log",server="dns://169.254.20.10:53",zone="ip6.arpa."} 1
coredns_plugin_enabled{name="log",server="dns://172.21.0.10:53",zone="."} 1
coredns_plugin_enabled{name="log",server="dns://172.21.0.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="log",server="dns://172.21.0.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="log",server="dns://172.21.0.10:53",zone="ip6.arpa."} 1
coredns_plugin_enabled{name="loop",server="dns://169.254.20.10:53",zone="."} 1
coredns_plugin_enabled{name="loop",server="dns://169.254.20.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="loop",server="dns://169.254.20.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="loop",server="dns://169.254.20.10:53",zone="ip6.arpa."} 1
coredns_plugin_enabled{name="loop",server="dns://172.21.0.10:53",zone="."} 1
coredns_plugin_enabled{name="loop",server="dns://172.21.0.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="loop",server="dns://172.21.0.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="loop",server="dns://172.21.0.10:53",zone="ip6.arpa."} 1
coredns_plugin_enabled{name="prometheus",server="dns://169.254.20.10:53",zone="."} 1
coredns_plugin_enabled{name="prometheus",server="dns://169.254.20.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="prometheus",server="dns://169.254.20.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="prometheus",server="dns://169.254.20.10:53",zone="ip6.arpa."} 1
coredns_plugin_enabled{name="prometheus",server="dns://172.21.0.10:53",zone="."} 1
coredns_plugin_enabled{name="prometheus",server="dns://172.21.0.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="prometheus",server="dns://172.21.0.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="prometheus",server="dns://172.21.0.10:53",zone="ip6.arpa."} 1
# HELP coredns_reload_failed_total Counter of the number of failed reload attempts.
# TYPE coredns_reload_failed_total counter
coredns_reload_failed_total 0
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 2.5841e-05
go_gc_duration_seconds{quantile="0.25"} 8.2532e-05
go_gc_duration_seconds{quantile="0.5"} 0.000124131
go_gc_duration_seconds{quantile="0.75"} 0.000154354
go_gc_duration_seconds{quantile="1"} 0.000278483
go_gc_duration_seconds_sum 0.003190547
go_gc_duration_seconds_count 25
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 45
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.16.10"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 8.073456e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 5.320392e+07
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.457767e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 302576
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 1.504496677254594e-05
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 5.355248e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 8.073456e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 5.6008704e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 1.0051584e+07
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 36502
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 5.246976e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 6.6060288e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.6427311328710146e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 339078
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 4800
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 123216
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 147456
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 1.2238992e+07
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 708273
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 1.048576e+06
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 1.048576e+06
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 7.4793992e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 11
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 2.5
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 20
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 3.8047744e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.64273003965e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 7.54192384e+08
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes 1.8446744073709552e+19

prameshj commented 2 years ago

Thanks for sharing this. However, this does not include the "setup_errors_total" metric. That metric is exposed on port 9353; the other CoreDNS metrics from the prometheus plugin are exposed on port 9253.

By any chance, would you be able to export these metrics to a dashboard, so we can see the values as a function of time?

However, the logs don't have an entry like "Failed deleting iptables rule" - so it does not look like an iptables lock error :(
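
To watch these counters as a function of time without a full dashboard, the port 9353 endpoint can be scraped periodically and diffed. The following is a minimal Python sketch, not part of node-cache: `parse_setup_errors`, `deltas` and `scrape` are hypothetical helper names, and the `/metrics` path and the node address are assumptions based on the usual Prometheus convention.

```python
import re
import urllib.request
from typing import Dict

# Matches lines such as:
#   coredns_nodecache_setup_errors_total{errortype="iptables_lock"} 0
_LINE = re.compile(
    r'^coredns_nodecache_setup_errors_total\{errortype="([^"]+)"\}\s+(\S+)$'
)


def parse_setup_errors(text: str) -> Dict[str, float]:
    """Return {errortype: value} from Prometheus text-format output."""
    counts: Dict[str, float] = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            counts[match.group(1)] = float(match.group(2))
    return counts


def deltas(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Per-errortype increase between two scrapes (unseen series start at 0)."""
    return {key: value - before.get(key, 0.0) for key, value in after.items()}


def scrape(url: str) -> Dict[str, float]:
    """Fetch and parse one scrape. The URL is an assumption, for example
    http://<node-ip>:9353/metrics."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_setup_errors(response.read().decode("utf-8"))


# Offline usage with two hand-written scrapes:
first = parse_setup_errors(
    'coredns_nodecache_setup_errors_total{errortype="iptables_lock"} 0\n'
)
second = parse_setup_errors(
    'coredns_nodecache_setup_errors_total{errortype="iptables_lock"} 3\n'
)
print(deltas(first, second))
```

A non-zero delta for `iptables_lock` between two scrapes would point toward lock contention; all-zero deltas would suggest looking elsewhere.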

rtheis commented 2 years ago

@prameshj I'll fix our error collection to get metrics on port 9353.

rtheis commented 2 years ago

Here is the data from a recreate with NodeLocal DNS version 1.21.3 on Kubernetes version 1.22:

Logs:

[INFO] SIGTERM: Shutting down servers then terminating

Metrics:

# HELP coredns_nodecache_setup_errors_total The number of errors during periodic network setup for node-cache
# TYPE coredns_nodecache_setup_errors_total counter
coredns_nodecache_setup_errors_total{errortype="configmap"} 0
coredns_nodecache_setup_errors_total{errortype="interface_add"} 0
coredns_nodecache_setup_errors_total{errortype="interface_check"} 0
coredns_nodecache_setup_errors_total{errortype="iptables"} 0
coredns_nodecache_setup_errors_total{errortype="iptables_lock"} 0

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rtheis commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 2 years ago

/lifecycle stale

rtheis commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 1 year ago

/lifecycle stale

rtheis commented 1 year ago

/remove-lifecycle stale

k8s-triage-robot commented 1 year ago

/lifecycle stale

rtheis commented 1 year ago

/remove-lifecycle stale

dpasiukevich commented 1 year ago

Apologies for not taking a look, I will try to see what's going on within the week.

rtheis commented 1 year ago

Apologies for not taking a look, I will try to see what's going on within the week.

Thank you. The problem continues but is hard to recreate. If there is any debug data that you'd like me to collect when we have a recreate, please let me know.

willzhang commented 1 year ago

What happened

Kubernetes v1.25.6 with nodelocaldns 1.21.1 has the same problem:

root@node1:~# kubectl get pods -A
NAMESPACE          NAME                                       READY   STATUS    RESTARTS      AGE
calico-apiserver   calico-apiserver-854448c89-brj8f           1/1     Running   0             6m46s
calico-apiserver   calico-apiserver-854448c89-j4vpv           1/1     Running   0             6m46s
calico-system      calico-kube-controllers-7bb667cfc6-vzbp4   1/1     Running   0             7m5s
calico-system      calico-node-q47hb                          1/1     Running   0             7m5s
calico-system      calico-typha-7d6d59f8f-55p5c               1/1     Running   0             7m5s
kube-system        coredns-5d5d4f8c5b-tzxbr                   1/1     Running   0             6m6s
kube-system        dns-autoscaler-7cfb7f9f95-7qk5f            1/1     Running   0             6m2s
kube-system        etcd-node1                                 1/1     Running   0             7m55s
kube-system        kube-apiserver-node1                       1/1     Running   0             8m
kube-system        kube-controller-manager-node1              1/1     Running   1             7m54s
kube-system        kube-scheduler-node1                       1/1     Running   1             7m54s
kube-system        nodelocaldns-g8r8r                         0/1     Error     2 (32s ago)   35s
tigera-operator    tigera-operator-6bb5669f85-665wb           1/1     Running   0             7m9s
root@node1:~# kubectl -n kube-system logs -f nodelocaldns-g8r8r 
2023/02/11 13:42:00 [INFO] Starting node-cache image: 1.21.1
2023/02/11 13:42:00 [INFO] Using Corefile /etc/coredns/Corefile
2023/02/11 13:42:00 [INFO] Using Pidfile 
2023/02/11 13:42:00 [ERROR] Failed to read node-cache coreFile /etc/coredns/Corefile.base - open /etc/coredns/Corefile.base: no such file or directory
2023/02/11 13:42:00 [INFO] Skipping kube-dns configmap sync as no directory was specified
.:53 on 169.254.25.10
cluster.local.:53 on 169.254.25.10
in-addr.arpa.:53 on 169.254.25.10
ip6.arpa.:53 on 169.254.25.10
[INFO] plugin/reload: Running configuration MD5 = adf97d6b4504ff12113ebb35f0c6413e
CoreDNS-1.7.0
linux/amd64, go1.16.8, 
[ERROR] plugin/errors: 2 4862263584182278023.3826447364325757841. HINFO: read udp 169.254.25.10:47232->169.254.25.10:53: i/o timeout
[ERROR] plugin/errors: 2 2408815618409886486.7089417576541627048.in-addr.arpa. HINFO: read tcp 192.168.72.31:46502->10.233.102.132:53: i/o timeout
[ERROR] plugin/errors: 2 2785916534782284493.260523573863106281.ip6.arpa. HINFO: read tcp 192.168.72.31:46500->10.233.102.132:53: i/o timeout
[FATAL] plugin/loop: Loop (169.254.25.10:47502 -> 169.254.25.10:53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 4862263584182278023.3826447364325757841."
root@node1:~# 

What I did

I used kubespray v2.21.0 to install Kubernetes v1.25.6.

1. On the first kubespray run I configured the wrong kubelet flag, so kubeadm init failed and I interrupted the init.

2. I fixed the kubelet flag problem. Because Ansible is idempotent, I then continued the cluster deployment. It eventually succeeded, but the nodelocaldns pod failed to start.

meetmatt commented 1 year ago

Hello, I'm experiencing a somewhat related issue with dns-node-cache, leading to an endless CrashLoopBackOff.

[INFO] Using Corefile /etc/coredns/Corefile
[ERROR] Failed to read node-cache coreFile /etc/coredns/Corefile.base - open /etc/coredns/Corefile.base: no such file or directory
[ERROR] Failed to sync kube-dns config directory /etc/kube-dns, err: lstat /etc/kube-dns: no such file or directory
[ERROR] Failed to add non-existent interface nodelocaldns: operation not supported
[INFO] Added interface - nodelocaldns
[ERROR] Error checking dummy device nodelocaldns - operation not supported
listen tcp 169.254.25.10:9254: bind: cannot assign requested address

I'm installing k8s via KubeKey.

isugimpy commented 1 year ago

Just popping in here to say that I'm experiencing the same behavior as OP on Kubernetes v1.24.9 with node-local-dns v1.17.4. What I discovered earlier is that the SIGTERM at pod termination hangs and a SIGKILL follows. When the new pod starts up, all DNS traffic to it seems to fail. I haven't been able to validate this on a live node; this is based on forensics via logs and metrics, so I don't know whether connections are simply timing out, being refused, or reaching the node-local-dns service, which is then unable to make outbound calls to resolve things. I've been seeing this happen periodically but didn't pin this down as the issue until today. If I get a repro I can try to provide more data.

Logs just before the old pod dies:

May 2 01:42:13 node-local-dns-7drqn node-cache INFO [INFO] SIGTERM: Shutting down servers then terminating
May 2 01:42:13 node-local-dns-7drqn node-cache INFO [INFO] Tearing down
May 2 01:42:13 node-local-dns-7drqn node-cache WARNING [WARNING] Exiting iptables/interface check goroutine
May 2 01:42:21 node-local-dns-7drqn node-cache ERROR [ERROR] Untrapped signal, tearing down
May 2 01:42:21 node-local-dns-7drqn node-cache INFO [INFO] Tearing down
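
For reference, the time between the first teardown attempt and the "Untrapped signal" entry can be read off the timestamps above. A minimal Python sketch of that arithmetic follows; the year is an assumption (the log lines carry none, and only the difference matters), and `parse_stamp` and `seconds_between` are hypothetical helper names, not part of node-cache.

```python
from datetime import datetime

# The log lines quoted above, copied verbatim.
LOG_LINES = [
    "May 2 01:42:13 node-local-dns-7drqn node-cache INFO [INFO] SIGTERM: Shutting down servers then terminating",
    "May 2 01:42:13 node-local-dns-7drqn node-cache INFO [INFO] Tearing down",
    "May 2 01:42:13 node-local-dns-7drqn node-cache WARNING [WARNING] Exiting iptables/interface check goroutine",
    "May 2 01:42:21 node-local-dns-7drqn node-cache ERROR [ERROR] Untrapped signal, tearing down",
    "May 2 01:42:21 node-local-dns-7drqn node-cache INFO [INFO] Tearing down",
]

# Assumption: the logs omit the year, so any fixed year works for a delta.
ASSUMED_YEAR = 2023


def parse_stamp(line):
    """Parse the leading 'Mon D HH:MM:SS' timestamp of a log line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(
        f"{ASSUMED_YEAR} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
    )


def seconds_between(lines, first_marker, second_marker):
    """Seconds from the first line containing first_marker to the first
    line containing second_marker."""
    first = next(line for line in lines if first_marker in line)
    second = next(line for line in lines if second_marker in line)
    return (parse_stamp(second) - parse_stamp(first)).total_seconds()


gap = seconds_between(LOG_LINES, "SIGTERM", "Untrapped signal")
print(f"{gap:.0f}s between SIGTERM and the untrapped signal")
```

For the lines above this works out to 8 seconds, during which nothing further is logged after the first "Tearing down".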

At startup of the new pod, I do see the log entries for adding the nodelocaldns interface and the iptables rules, but nothing further happens from that point. Traffic to the pod's metrics port does work, and I was able to get metrics from it just fine; they simply showed that it never received another DNS request.

isugimpy commented 1 year ago

An update here:

I managed to repro this today. There's definitely something unusual going on. What's happening is that at startup of the replacement pod, the iptables rules never get added. I see in the logs where it claims to add them via the "Added back nodelocaldns rule" entries. All the messages that I'd expect to be present there are in the logs. But if I do an iptables-save, those rules aren't present in the output. Deleting the pod and letting it be recreated never fixes this either. Something about this is permanently poisoning the machine such that the node-cache binary thinks it's adding rules but they never make it in. I went to a healthy, working node and grabbed the relevant rules from it to make sure it's not something on the iptables side failing, and when I insert them into the chain by hand they do insert successfully, so there has to be some kind of bug in the insertion process in the node-cache binary.
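
The manual comparison described above (rules the logs claim were added versus what iptables-save prints) could be scripted roughly as follows. This is only a sketch: the rule specs are illustrative placeholders rather than the exact rules node-cache writes, `missing_rules` is a hypothetical helper, and for simplicity it ignores which table a rule belongs to.

```python
import shlex


def normalize(rule):
    """Tokenize a rule so spacing and quoting differences do not matter."""
    return tuple(shlex.split(rule))


def missing_rules(iptables_save_output, expected_rules):
    """Return the expected rules that do not appear in iptables-save output.

    Simplification: rules are matched by their '-A ...' line only, so the
    same rule text in two different tables is not distinguished.
    """
    present = set()
    for line in iptables_save_output.splitlines():
        line = line.strip()
        if line.startswith("-A "):
            present.add(normalize(line))
    return [rule for rule in expected_rules if normalize(rule) not in present]


# Made-up iptables-save output and expected rules, for illustration only.
SAMPLE_OUTPUT = """\
*raw
:PREROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A OUTPUT -d 169.254.25.10/32 -p udp -m udp --dport 53 -j NOTRACK
COMMIT
"""

EXPECTED = [
    "-A OUTPUT -d 169.254.25.10/32 -p udp -m udp --dport 53 -j NOTRACK",
    "-A OUTPUT -d 169.254.25.10/32 -p tcp -m tcp --dport 53 -j NOTRACK",
]

for rule in missing_rules(SAMPLE_OUTPUT, EXPECTED):
    print("missing:", rule)
```

On a real node the first argument would be the captured output of `iptables-save`, and the expected list would be copied from a healthy node.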

dpasiukevich commented 1 year ago

Nice find!

The nodelocaldns uses "k8s.io/kubernetes/pkg/util/iptables" to manage iptables rules.

node-local-dns v1.17.4 is somewhat old and uses k8s.io/kubernetes v0.0.0-00010101000000-000000000000. The latest node-local-dns images (e.g. >= 1.22.19) use k8s.io/kubernetes v1.24.10.

@isugimpy could you try nodelocaldns 1.22.20 to see whether the newer iptables client works correctly?

As for the nodelocaldns iptables usage, it's trivial and seems correct to me (source):

  1. try to insert rule
  2. log if it exists already
  3. log info if no error
  4. log error if error.
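
The four steps above can be sketched as follows. This is a Python stand-in for illustration, not the node-cache source: `FakeIptables` is a hypothetical in-memory client, and `add_rule` only mirrors the insert-then-log flow described in the list.

```python
import logging

log = logging.getLogger("node-cache-sketch")


class FakeIptables:
    """Hypothetical in-memory stand-in for an iptables client."""

    def __init__(self, fail=False):
        self.rules = []
        self.fail = fail  # when True, every call simulates a failure

    def ensure_rule(self, table, chain, args):
        """Insert the rule if absent; return True if it already existed."""
        if self.fail:
            raise RuntimeError("simulated iptables failure")
        key = (table, chain, tuple(args))
        if key in self.rules:
            return True
        self.rules.insert(0, key)
        return False


def add_rule(client, table, chain, args):
    """Try to insert a rule and log the outcome.

    Returns 'exists', 'added', or 'error' so callers can check the result.
    """
    try:
        existed = client.ensure_rule(table, chain, args)  # step 1
    except RuntimeError as err:
        log.error("Error adding rule %s: %s", args, err)  # step 4
        return "error"
    if existed:
        log.info("Rule %s already exists", args)  # step 2
        return "exists"
    log.info("Added rule %s", args)  # step 3
    return "added"


# Illustrative rule arguments only.
RULE = ["-d", "169.254.25.10/32", "-p", "udp", "--dport", "53", "-j", "ACCEPT"]
client = FakeIptables()
print(add_rule(client, "filter", "INPUT", RULE))  # added
print(add_rule(client, "filter", "INPUT", RULE))  # exists
```

The sketch trusts the client's return value and does not re-read the table afterwards to confirm the rule is present.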
agilanbtdw commented 1 year ago

I fixed this issue with these steps:

  1. Change the resolvConf value from /run/systemd/resolve/resolv.conf to /etc/resolv.conf in the kubelet-config.yaml file on every node that throws this nodelocaldns pod error. Before this change, CoreDNS picked up local/loopback addresses as upstream servers from the /run/systemd/resolve/resolv.conf file. I pointed it at my /etc/resolv.conf file, which contains 8.8.8.8 and 8.8.4.4 as nameservers. More on this here.
  2. sudo systemctl restart kubelet
  3. kubectl delete pod <nodelocaldns pod name that is not running> -n kube-system
  4. Upon pod recreation, it was running. Some stubborn nodes ran after a reboot.
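
To check whether a node is likely affected before editing kubelet-config.yaml, one option is to look for loopback nameservers in the file kubelet's resolvConf points at. The following is a minimal sketch; `nameservers` and `loopback_nameservers` are hypothetical helpers, and the sample file contents are made up.

```python
import ipaddress


def nameservers(resolv_conf_text):
    """Extract nameserver entries from resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def loopback_nameservers(resolv_conf_text):
    """Return nameservers that are loopback addresses.

    A loopback upstream is the kind of entry that can make a DNS cache
    forward queries back to itself.
    """
    flagged = []
    for server in nameservers(resolv_conf_text):
        try:
            addr = ipaddress.ip_address(server)
        except ValueError:
            continue  # not a plain IP address; ignore in this sketch
        if addr.is_loopback:
            flagged.append(server)
    return flagged


# Made-up file contents for illustration.
LOCAL_STUB = "# local stub resolver\nnameserver 127.0.0.53\noptions edns0\n"
PUBLIC = "nameserver 8.8.8.8\nnameserver 8.8.4.4\n"

print(loopback_nameservers(LOCAL_STUB))  # ['127.0.0.53']
print(loopback_nameservers(PUBLIC))      # []
```

In practice the text would come from reading the path configured as resolvConf on the node; an empty result suggests the upstream list is not pointing back at the node itself.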
k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rtheis commented 7 months ago

/remove-lifecycle stale

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rtheis commented 4 months ago

/remove-lifecycle stale

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rtheis commented 1 month ago

/remove-lifecycle stale

yahalomimaor commented 3 weeks ago

Hi, I'm facing the same issue as well. I'm getting an "ip table lock issue" and then the pod stops responding. I'm using the latest version, 1.22.28. Can you please advise why it happens and how it can be mitigated?

rtheis commented 2 weeks ago

@yahalomimaor in the past, I was able to fix the problem by recreating the NodeLocal DNS pod on the node encountering the problem.