prometheus-operator / kube-prometheus

Use Prometheus to monitor Kubernetes and applications running on Kubernetes
https://prometheus-operator.dev/
Apache License 2.0

Servicemonitor/monitor/coredns context deadline exceeded #1762

Closed danielSundsvallSCIT closed 1 year ago

danielSundsvallSCIT commented 2 years ago

I have been struggling with this issue for a while now, and I need some guidance or tips on how to proceed. I have managed to scrape everything except the coredns pods.

All the pods are running:

NAME                                  READY   STATUS    RESTARTS   AGE
alertmanager-main-0                   2/2     Running   0          29h
alertmanager-main-1                   2/2     Running   0          29h
alertmanager-main-2                   2/2     Running   0          29h
blackbox-exporter-5cb5d7479d-4z68g    3/3     Running   0          29h
grafana-d595885ff-ggbph               1/1     Running   0          29h
kube-state-metrics-79f478884f-v2b8w   3/3     Running   0          29h
node-exporter-cddwl                   2/2     Running   0          29h
node-exporter-flkfv                   2/2     Running   0          29h
node-exporter-kl5kc                   2/2     Running   0          29h
node-exporter-n8nmk                   2/2     Running   0          29h
node-exporter-v76n7                   2/2     Running   0          29h
node-exporter-wvfg2                   2/2     Running   0          29h
prometheus-adapter-7bf7ff5b67-bbnkj   1/1     Running   0          29h
prometheus-adapter-7bf7ff5b67-k7hlm   1/1     Running   0          29h
prometheus-k8s-0                      2/2     Running   0          29h
prometheus-k8s-1                      2/2     Running   0          29h
prometheus-operator-7684989c7-pzkc6   2/2     Running   0          29h

My coredns ServiceMonitor and the kube-dns svc:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  creationTimestamp: "2022-05-18T13:46:27Z"
  generation: 3
  labels:
    app.kubernetes.io/name: coredns
    app.kubernetes.io/part-of: kube-prometheus
  name: coredns
  namespace: monitoring
  resourceVersion: "2876655"
  uid: c1fb8571-c30f-47ce-b749-6d0cecf91f09
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 15s
    metricRelabelings:
    - action: drop
      regex: coredns_cache_misses_total
      sourceLabels:
      - __name__
    port: metrics
  jobLabel: app.kubernetes.io/name
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-dns

[k8admin@masternode manifests]$ k -n kube-system get svc kube-dns -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/port: "9153"
    prometheus.io/scrape: "true"
  creationTimestamp: "2022-05-04T08:08:48Z"
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: CoreDNS
  name: kube-dns
  namespace: kube-system
  resourceVersion: "2872194"
  uid: 4b887e28-74e8-44fc-a0f0-4def19a3a1b6
spec:
  clusterIP: 10.197.132.10
  clusterIPs:
  - 10.197.132.10
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: dns
    port: 53
    protocol: UDP
    targetPort: 53
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 53
  - name: metrics
    port: 9153
    protocol: TCP
    targetPort: 9153
  selector:
    k8s-app: kube-dns
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
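For the ServiceMonitor to produce any targets, its label selector, namespace selector, and port name all have to line up with the Service above. A quick way to confirm that (a suggested sanity check, assuming the resource names shown above) is to look at the Endpoints object behind the Service:

```shell
# The ServiceMonitor matches Services labelled k8s-app=kube-dns in
# kube-system and scrapes the port named "metrics"; confirm the
# Endpoints object actually carries a port with that name.
kubectl -n kube-system get endpoints kube-dns -o yaml | grep -B1 -A2 'name: metrics'
```

If nothing comes back, the Service (or the pods behind it) is not exposing the metrics port, and Prometheus has nothing to scrape.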

I don't see anything in the pod logs referring to this issue, and I even tried to set up a new ServiceMonitor with the same result. (screenshot attached)

I think it may be a network issue inside the cluster, but I want to double-check whether anyone can see something that I don't, and maybe point me in the right direction.

We are using Cisco ACI as CNI plugin.

slashpai commented 2 years ago

Have you checked prometheus and prometheus-operator logs?

afirth commented 2 years ago

Possible that the DNS servers are not serving metrics? Either because the `prometheus` plugin is not enabled, or because the pods do not have the port open. Check with `kubectl port-forward` against the service and `curl localhost:9153/metrics` (not tested).
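The check suggested above might look like this (a sketch, not tested against this cluster; it assumes the standard `kube-dns` Service name in `kube-system` and the default CoreDNS metrics port 9153):

```shell
# Forward the Service's metrics port to localhost in the background.
kubectl -n kube-system port-forward svc/kube-dns 9153:9153 &
sleep 2
# If CoreDNS's prometheus plugin is enabled and the port is reachable,
# this prints Prometheus-format text metrics.
curl -s localhost:9153/metrics | head
# Clean up the background port-forward.
kill %1
```

Note that port-forwarding goes through the API server, so it can succeed even when direct pod-to-pod traffic from Prometheus is blocked.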

rgarcia89 commented 2 years ago

I see the plugin enabled, and the pods are serving metrics. However, the default Kubernetes coredns Service is not exposing the metrics port:

.:53 {
    errors
    ready
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
      pods insecure
      fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
    import custom/*.override
}

$ kdesc svc -n kube-system kube-dns
Name:              kube-dns
Namespace:         kube-system
Labels:            addonmanager.kubernetes.io/mode=Reconcile
                   k8s-app=kube-dns
                   kubernetes.io/cluster-service=true
                   kubernetes.io/name=CoreDNS
Annotations:       <none>
Selector:          k8s-app=kube-dns
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.0.0.10
IPs:               10.0.0.10
Port:              dns  53/UDP
TargetPort:        53/UDP
Endpoints:         10.244.6.5:53,10.244.7.37:53
Port:              dns-tcp  53/TCP
TargetPort:        53/TCP
Endpoints:         10.244.6.5:53,10.244.7.37:53
Session Affinity:  None
Events:            <none>

Curl against a CoreDNS pod:

/ # curl 10.244.6.5:9153/metrics
# HELP coredns_build_info A metric with a constant '1' value labeled by version, revision, and goversion from which CoreDNS was built.
# TYPE coredns_build_info gauge
coredns_build_info{goversion="go1.17",revision="a9adfd56",version="1.8.7"} 1
# HELP coredns_cache_entries The number of elements in the cache.
# TYPE coredns_cache_entries gauge
coredns_cache_entries{server="dns://:53",type="denial"} 339
coredns_cache_entries{server="dns://:53",type="success"} 63
...
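Since the `kdesc` output above shows no `metrics` port on the Service, the ServiceMonitor's `port: metrics` endpoint has nothing to match, so Prometheus never discovers a target. One possible fix (a sketch using the core/v1 Service port fields; note the `addonmanager.kubernetes.io/mode=Reconcile` label means an addon manager may revert manual edits, in which case the change has to go into the managed manifest instead) is to patch the port onto the Service:

```shell
# Append a named "metrics" port (9153/TCP) to the kube-dns Service so
# the ServiceMonitor's `port: metrics` endpoint can match it.
kubectl -n kube-system patch svc kube-dns --type=json -p '[
  {"op": "add", "path": "/spec/ports/-",
   "value": {"name": "metrics", "port": 9153, "protocol": "TCP", "targetPort": 9153}}
]'
```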
github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had any activity in the last 60 days. Thank you for your contributions.

github-actions[bot] commented 1 year ago

This issue was closed because it has not had any activity in the last 120 days. Please reopen if you feel this is still valid.

ffzzhong commented 6 months ago

Hi @danielSundsvallSCIT, sorry, it has been quite a while, but did you ever figure out what the issue was? I'm seeing the same context deadline exceeded, but all the ping/curl/port-forwarding actually works.
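When curl and port-forwarding work from a workstation but Prometheus still reports context deadline exceeded, it is worth testing the scrape from inside the Prometheus pod itself, since that is where the connection actually originates; a NetworkPolicy or CNI rule (Cisco ACI in the original report) can block pod-to-pod traffic without affecting API-server-mediated port-forwards. A sketch, using the pod names from earlier in this thread, the example pod IP 10.244.6.5, and assuming the Prometheus container image ships BusyBox `wget` (the upstream image does):

```shell
# Find a CoreDNS pod IP to test against.
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide

# Fetch metrics from inside the Prometheus pod, where scrapes originate.
# Replace 10.244.6.5 with an IP from the command above.
kubectl -n monitoring exec prometheus-k8s-0 -c prometheus -- \
  wget -qO- -T 5 http://10.244.6.5:9153/metrics | head
```

If this hangs or times out while the same curl from a node succeeds, the problem is network policy or CNI routing between the monitoring namespace and kube-system, not the ServiceMonitor configuration.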