Closed littlechicks closed 3 years ago
I've had quite a similar problem - from what I can tell, it has to do with the Kubernetes version, as the insecure (HTTP) ports are deprecated/removed.
After some fiddling, I got to the point where kube-proxy is empty (replaced by Cilium, so that's fine) and etcd doesn't work (Prometheus tries to reach it on the internal IP, but it is configured with the public IP).
For kube-prometheus-operator, I used the following:

```yaml
kubeControllerManager:
  service:
    port: 10257              # https port
    targetPort: 10257
  serviceMonitor:
    https: true              # use https
    insecureSkipVerify: true # accept self-signed certificate
    serverName: 127.0.0.1    # expect 127.0.0.1 as CN in the certificate, not the called IP

kubeScheduler:
  service:
    port: 10259
    targetPort: 10259
  serviceMonitor:
    https: true
    insecureSkipVerify: true
    serverName: 127.0.0.1

prometheusOperator:
  hostNetwork: true
```
For Cilium:

```yaml
global:
  devices:
    - eth1                   # internal interface
  kubeProxyReplacement: strict
  k8sServiceHost: k8s.domain.tld
  k8sServicePort: 6443
  ipMasqAgent:
    enabled: true
  etcd:
    enabled: false
    managed: false
  ipam:
    operator:
      clusterPoolIPv4PodCIDR: "{{ pod_cidr }}"
```
Additionally, I changed the bind address for kube-controller-manager and kube-scheduler to 0.0.0.0 (works for me because of a firewall) via `kubeadm init --config`:
```yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.19.0
controllerManager:
  extraArgs:
    bind-address: 0.0.0.0
scheduler:
  extraArgs:
    address: 0.0.0.0
```
Even when disabling control plane metrics the prometheus pod is still not being created.
I'm facing similar issues when deployed in Amazon EKS and managed to resolve the kube-proxy issue by editing the configmap as described above. Not ideal but I think for now I'm going to disable these monitors.
A bit of context on the secure and insecure ports:
- Kubeadm: enable the usage of the secure kube-scheduler and kube-controller-manager ports for health checks. For kube-scheduler was 10251, becomes 10259. For kube-controller-manager was 10252, becomes 10257. ([#85043](https://github.com/kubernetes/kubernetes/pull/85043), [@neolit123](https://github.com/neolit123))
Also, here is the config again, a bit better formatted:
```yaml
kubeControllerManager:
  service:
    port: 10257
    targetPort: 10257
  serviceMonitor:
    https: true
    insecureSkipVerify: true

kubeScheduler:
  service:
    port: 10259
    targetPort: 10259
  serviceMonitor:
    https: true
    insecureSkipVerify: true
```
Thanks for the advice.
And what about my problem with `kubectl top pods`, where only node metrics show up? And what about the kubelet problem, with 0/0 targets up?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
Exactly the same problem here, on a vanilla cluster launched with kubeadm (1.20.1).
From what I can see, my static pods (etcd, kube-scheduler, kube-controller-manager) hold the public (node) IP, so that's the endpoint the corresponding services try to scrape. However, the component itself is bound to 127.0.0.1 and thus not accessible from outside; one would have to change the bind-address, but that looks insecure to me.
For etcd (and possibly others too), the regular client listen address, which is bound to the node IP, does expose the `/metrics` endpoint but requires authentication. I was able to successfully scrape metrics from etcd by creating a secret with the certs:
```shell
kubectl -n monitoring create secret generic etcd-client-cert \
  --from-file=/etc/kubernetes/pki/etcd/ca.crt \
  --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key
```
And configuring the helm chart to mount and use them:
```yaml
prometheus:
  prometheusSpec:
    secrets: ['etcd-client-cert']
kubeEtcd:
  serviceMonitor:
    scheme: https
    insecureSkipVerify: false
    serverName: localhost
    caFile: /etc/prometheus/secrets/etcd-client-cert/ca.crt
    certFile: /etc/prometheus/secrets/etcd-client-cert/healthcheck-client.crt
    keyFile: /etc/prometheus/secrets/etcd-client-cert/healthcheck-client.key
```
Thanks a lot, scraping etcd metrics works fine now; however, I'm still chasing the issue with the other metrics.
Have you been able to do it with the scheduler, controller, and proxy?
For kube-controller-manager and kube-scheduler, follow https://stackoverflow.com/questions/65901186/kube-prometheus-stack-issue-scraping-metrics/66276144#66276144
For kube-proxy https://stackoverflow.com/questions/60734799/all-kubernetes-proxy-targets-down-prometheus-operator
For etcd, see this comment by Foltik: https://github.com/prometheus-community/helm-charts/issues/204#issuecomment-765155883
For the kubelet, it depends on whether it is running in a container or as a process. In my case it was a process (k8s on-prem using kubeadm), hence it was picked up automatically.
I also used the workarounds, and proxy / etcd / controller / scheduler are working as expected; on the other hand, it still seems less secure in some cases. ClusterIPs would seem a good solution, unless I am missing something. +1 for this to be investigated further.
This issue is being automatically closed due to inactivity.
> For etcd (and possibly others too), the regular client listen address which is bound to the node IP does expose the `/metrics` endpoint, but requires authentication. I was able to successfully scrape metrics from etcd by creating a secret with the certs and configuring the helm chart to mount and use them. [...]
Hello, my cluster was deployed with kubeadm. I followed this configuration, but it has no effect.
> For etcd (and possibly others too), the regular client listen address which is bound to the node IP does expose the `/metrics` endpoint, but requires authentication. I was able to successfully scrape metrics from etcd by creating a secret with the certs and configuring the helm chart to mount and use them. [...]
If you're using kubeadm, it has already configured etcd with `--listen-metrics-urls`, which does not require certificates and is just plain HTTP. ... Unfortunately, by default it's probably listening on `127.0.0.1:2381`. To remedy that, you need to ensure your `ClusterConfiguration` includes something like this:
```yaml
kind: ClusterConfiguration
etcd:
  local:
    extraArgs:
      listen-metrics-urls: http://0.0.0.0:2381
```
If you've already provisioned your cluster, you'll need to monkey-patch that in with `kubectl edit -n kube-system cm/kubeadm-config`, and then run `kubeadm upgrade node` on each control-plane node that is hosting etcd.
Then you need to update your kube-prometheus-stack Helm values to include this:
```yaml
kubeEtcd:
  service:
    port: 2381
    targetPort: 2381
```
Thank you, @samcday
I am curious to know whether it is safe in production to have my metrics endpoint listen on 0.0.0.0. Or is there a production-grade approach?
@MohammedNoureldin that's a tricky question to answer without more understanding of your production environment.
If you open up the metrics endpoint on machines that are directly connected to the public internet, then this is certainly a potential security risk. Metrics shouldn't be exposing anything particularly sensitive, nor should it be vulnerable to any exploits, but of course every additional system/service/code-path you expose to the cold, brutal wasteland that we call "the Internet" is inherently dangerous.
If your machines are connected together on a private network, you can firewall access to the metrics endpoint such that ingress is only permitted from `192.168.0.0/24` (for example). In this case you could also configure each node with a listen address of that node's private IP, but then you'd also need to configure kubeadm uniquely on each machine.
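The subnet restriction described above boils down to a membership check on the source address of each incoming connection; a minimal sketch of that logic (the addresses are illustrative, not from this thread):

```python
from ipaddress import ip_address, ip_network

# The decision a firewall rule like "allow ingress from 192.168.0.0/24"
# makes for each connection attempt to the metrics port.
allowed = ip_network("192.168.0.0/24")

for src in ("192.168.0.42", "10.1.2.3"):
    verdict = "allowed" if ip_address(src) in allowed else "denied"
    print(src, verdict)
```

Running it prints `192.168.0.42 allowed` and `10.1.2.3 denied` - the same verdicts an iptables/nftables source-address match would produce.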
@samcday the approach you described is what I have been using.
Another approach I tried:
I didn't succeed in configuring the metrics endpoints to bind to the interfaces of a specific network. Is that possible at all? I mean binding to something like `10.0.0.0/16`, which would bind to all interfaces whose IP address falls within that CIDR. Should such syntax work for binding, as far as you know?
Mmmh, no, I don't think you can bind to a subnet like that. Looking through the etcd code, the metrics listen config option ends up as a regular `net.Listen` call: https://github.com/etcd-io/etcd/blob/2e7ed80be246380fc50ca46aa193348a28935380/client/pkg/transport/listener.go#L109C17-L109C23
You can specify an IP address, or a hostname (which will be resolved to an IP, and is not recommended).
I'm not 100% sure about this but ultimately, at least on Linux, a socket binding can only be performed on a specific address, and that address must be known to the kernel ahead of time.
Think about it this way: if you have two machines available on `10.0.0.1` and `10.0.0.2`, the kernel needs to know where to send packets. If you were somehow able to bind to `10.0.0.0/16`, then how would the kernel know that packets destined for `10.0.0.2` from `10.0.0.1` should leave the machine on an interface, rather than be delivered to the local process? :)
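The point that the kernel only accepts a concrete, known local address can be seen with a plain socket; a small illustration, not specific to etcd:

```python
import socket

# Binding to a specific address the kernel knows about works fine.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("127.0.0.1", 0))      # port 0 = let the kernel pick a free port
print(s.getsockname()[0])     # 127.0.0.1
s.close()

# A CIDR is not a bindable address; the resolver rejects it outright.
t = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    t.bind(("10.0.0.0/16", 0))
except OSError as e:          # socket.gaierror on Linux
    print("cannot bind to a subnet:", e.__class__.__name__)
finally:
    t.close()
```

This is why etcd (and every other daemon) asks for a single listen address such as `0.0.0.0` or one node-specific IP, never a range.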
> the approach you described is what I have been using.
If you're saying that you have a firewall in place that does not permit ingress to the metrics port except from internal networks, then that should already be sufficient and you can continue binding to `0.0.0.0` reasonably safely. If you want to be maximally paranoid, you can consider signing up for a service like Shodan and providing it with your public IP addresses. It will continually port-scan those addresses and send you security notifications if new ports become publicly reachable.
@samcday thanks a lot.
> If you're saying that you have a firewall in place that does not permit ingress to the metrics port except from internal networks, then that should already be sufficient and you can continue binding to 0.0.0.0 reasonably safely
Yes, I meant binding to 0.0.0.0 but preventing everything except a specific internal network using a firewall.
This issue is closed, but the problem remains: kube-prometheus-stack is not able to monitor the main components of the cluster, and the only workaround is exposing your main cluster components on 0.0.0.0 like a savage. Say what you want: if you don't want problems and/or accidents, you don't expose services on 0.0.0.0, even less so when we're talking about the core components of a Kubernetes cluster.
So my question is: is this being fixed? Can it be fixed? Because I see there were commits blaming this on the users, as in "Certain cluster configurations can cause...", while this is the symptom you get if you install a vanilla kubeadm cluster and a vanilla kube-prometheus-stack (absolutely default behavior).
> So my question is, is this being fixed? Is it possible to be fixed?
How would you propose something like this be fixed? As you've pointed out, the vanilla/default kubeadm behaviour is to configure control-plane components as hostNetwork services that opt for a secure default of binding to `127.0.0.1` only. This is very reasonable/desirable, because kubeadm cannot (or well, not without a lot of effort for little gain) know the network conditions the control-plane is to be installed into.
The issue doesn't really lie with Prometheus/k-p-s, and not even really with kubeadm or the k8s control-plane components. Rather, this is just one of those unfortunate collisions of different competing concerns. There isn't really a "fix" beyond improving documentation.
Well, as you already said, the issue was caused by kubeadm implementing a security feature. Now, I know this puts Prometheus in a difficult position, but there are elegant fixes/workarounds. For example, this one. Notice his last remark:
> It may be possible to deploy this proxy server as an option for kube-prometheus-stack.
This is not an issue to ignore. Right now we have people globally setting the bind addresses of their main Kubernetes components to 0.0.0.0, so for many, kubeadm's security feature has in practice become a decrease in security because of the workaround they are implementing. This means it's already too late; the wrong fix is being propagated, as we speak, to thousands of clusters or more. I would go so far as to say this is a security vulnerability of the kube-prometheus-stack, because keep in mind most of them are combining insecureSkipVerify TLS with the bind address change, which makes everything so much worse.
Whatever the workaround/fix, it should be implemented fast, or the affected features should be removed, before even more clusters get "infected" with the workaround.
> For example, https://github.com/prometheus-community/helm-charts/issues/1704#issuecomment-1100607982. Notice his last remark:
Yep, so this would still fall into my "improving documentation" bucket.
(Oh and I should take this moment to note that I'm not a contributor to any of the projects we've discussed thus far)
To expand and put it a bit differently: the defaults being the way they are, and the state of things being what it is, means that in some cases, if you want to scrape metrics for your cluster's control-plane from inside said cluster, you must deploy additional systems and services (which require additional decision-making around trade-offs in maintainability, complexity, regulatory compliance concerns, etc.).
Since this is a very common occurrence, the pathways forward ought to be documented, but only as a "helpful local": you're standing in a particular place and want to know where to go from here. k-p-s/Prometheus can't travel the journey with you, but k-p-s could/should at least point the way.
And again, to re-state: in many cases you don't need any extra complexity like a proxy. Example: your cluster nodes are running in an AWS VPC without a public NAT egress, and you know that the ingress VPCs/security-groups and the workloads in your cluster are "trusted". Your kube-control-plane components can listen on `0.0.0.0`.
Hello,
I have deployed kube-prometheus stack over Kubernetes with helm (https://github.com/helm/charts/blob/master/stable/prometheus-operator/values.yaml)
Almost everything is working, except:

- kubelet is not configured: it shows 0/0 up
- kube-prometheus-stack-kube-controller-manager is down
- kube-prometheus-stack-kube-etcd is down
- kube-prometheus-stack-kube-proxy is down
- kube-prometheus-stack-kube-scheduler is down
For the last four errors, it seems like Prometheus is using the node IP instead of the ClusterIP. The services for these four entries are correctly created, but there is no ClusterIP assigned (see below).
This is although the values.yml is correctly set up for using Kubernetes Services, as below. The pod labels are matching correctly:
I have no idea why Prometheus wants to scrape the node IP instead of the ClusterIP services... And for the kubelet targets, it looks like there is no service for the kubelet? Am I wrong?
Thanks...
Workarounds
1/ For the proxy down status: I've solved it by updating the kube-proxy ConfigMap and restarting the pods:
```shell
kubectl edit cm/kube-proxy -n kube-system
kubectl delete pod -l k8s-app=kube-proxy -n kube-system
```
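For reference, the edit in the kube-proxy ConfigMap typically concerns the metrics bind address; a sketch of the relevant fragment (field names from the upstream `KubeProxyConfiguration` type, kubeadm defaults assumed):

```yaml
# Inside the kube-proxy ConfigMap (config.conf), change the metrics bind address
# so the /metrics endpoint is reachable from the Prometheus pod:
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
metricsBindAddress: 0.0.0.0:10249   # default 127.0.0.1:10249 rejects external scrapes
```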
2/ For the scheduler status
Moreover, to make the metrics available to Prometheus, I had to edit the files under /etc/kubernetes/manifests/, changing the bind address to 0.0.0.0 and commenting out --port=0. But that's not a good thing, because the scheduler is now exposed outside the cluster on a non-secure port.
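For illustration, the kind of change described above in /etc/kubernetes/manifests/kube-scheduler.yaml might look like this (a sketch assuming a kubeadm-generated static-pod manifest; only the relevant flags are shown):

```yaml
spec:
  containers:
  - command:
    - kube-scheduler
    - --bind-address=0.0.0.0   # was 127.0.0.1
    # - --port=0               # commented out, so the metrics port is not disabled
```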
So again, how can I make the ServiceMonitor work on the ClusterIP, like the other working targets?
NB: kubectl top pods is not working; only kubectl top nodes is working...
Configuration: