kubernetes-sigs / metrics-server

Scalable and efficient source of container resource metrics for Kubernetes built-in autoscaling pipelines.
https://kubernetes.io/docs/tasks/debug-application-cluster/resource-metrics-pipeline/
Apache License 2.0
5.72k stars · 1.86k forks

403 Forbidden error faced by Prometheus due to the Helm chart's ServiceMonitor manifest #1449

Closed yashvardhan-kukreja closed 5 months ago

yashvardhan-kukreja commented 5 months ago

What happened:

What you expected to happen:

Anything else we need to know?:

The reason for this issue is that the ServiceMonitor manifest shipped in the current Helm chart doesn't configure the Prometheus pods' ServiceAccount token as the bearer token to use when scraping.

Because of this, the Prometheus pods, despite having the token, don't present it when scraping metrics-server and end up receiving a 403 Forbidden.

This should be specified and fixed in the ServiceMonitor manifest.

Details about the ServiceMonitor I faced this issue with

ServiceMonitor which I took from the Helm chart definition and applied

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: metrics-server-sm
  namespace: kube-system
  labels:
    release: kube-prometheus-stack-1711317473
spec:
  namespaceSelector:
    matchNames:
      - kube-system
  selector:
    matchLabels:
      k8s-app: metrics-server
  endpoints:
    - port: https
      scheme: https
      path: /metrics
      tlsConfig:
        insecureSkipVerify: true
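
The fix this issue proposes would amount to pointing the scrape at the in-pod ServiceAccount token. A minimal sketch of the corrected endpoint, assuming the `bearerTokenFile` field of the Prometheus Operator's ServiceMonitor CRD (newer operator versions prefer the `authorization` field instead):

```yaml
  endpoints:
    - port: https
      scheme: https
      path: /metrics
      # Token automounted into every Prometheus pod; sent as
      # "Authorization: Bearer <token>" on each scrape.
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        insecureSkipVerify: true
```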

Screenshot of the Prometheus pods' Targets page showing that metrics-server is inaccessible when scraping


Environment:

spoiler for Metrics Server manifest:
spoiler for Kubelet config:
spoiler for Metrics Server logs:
spoiler for Status of Metrics API:

```sh
kubectl describe apiservice v1beta1.metrics.k8s.io
```

/kind bug

k8s-ci-robot commented 5 months ago

This issue is currently awaiting triage.

If metrics-server contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
yashvardhan-kukreja commented 5 months ago

/assign

yashvardhan-kukreja commented 5 months ago

Note: Apologies for creating the PR before triage, but I proceeded anyway to keep the proposed resolution of this issue easy to view alongside it.

stevehipwell commented 5 months ago

@yashvardhan-kukreja I'm not seeing this issue on any of my clusters. Could you please fill out the issue template to provide your context?

stevehipwell commented 5 months ago

@yashvardhan-kukreja I think you will also need to provide your Prometheus and Prometheus Operator versions.

serathius commented 5 months ago

(screenshot of the metrics-server documentation's caution against scraping it with Prometheus)

Prometheus should not scrape MS.

stevehipwell commented 5 months ago

@serathius I think in this context we're looking at collecting the MS metrics for itself (see https://github.com/kubernetes-sigs/metrics-server/issues/988 for context).

yashvardhan-kukreja commented 5 months ago

Thanks folks for your quick replies.

Actually @serathius @stevehipwell, I wanted to ship the metrics exposed by metrics-server to the monitoring stack spun up by kube-state-metrics (because I wanted the resource usage metrics of pods and nodes to be visible to the corresponding Prometheus instance).

Thanks for pointing me to that "Caution" in the docs, but if that is the case, may I ask why the Helm chart exposes a ServiceMonitor at all if metrics-server isn't meant to be scraped?

The whole point of the issue I raised was that the ServiceMonitor didn't seem to work at all: the absence of the `bearerTokenFile` attribute caused a 403 every time the ServiceMonitor was used.

yashvardhan-kukreja commented 5 months ago

@stevehipwell Sure, let me send you the exact steps I took that led me to this issue.

yashvardhan-kukreja commented 5 months ago

Folks, I think I found the reason why I faced this issue while @stevehipwell didn't.

(screenshot of the chart's ServiceMonitor template, gated behind a `metrics.enabled` check)

I didn't notice the `metrics.enabled` check and went ahead and manually applied the ServiceMonitor from the Helm chart by referring to the upstream repo, without realising this line existed.

So, TL;DR: I tried to install the ServiceMonitor without setting `metrics.enabled=true`.
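
Under that reading, the supported path is flipping the chart flag rather than applying the template by hand; a hypothetical values snippet (key names assumed from the chart template shown above):

```yaml
# values for the metrics-server Helm chart (key names assumed)
metrics:
  enabled: true    # exposes /metrics and un-gates the ServiceMonitor template
serviceMonitor:
  enabled: true    # renders the ServiceMonitor with its token wiring intact
```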

I apologize for this silly mistake of mine and for taking your time.

Meanwhile, would it be sensible to allow setting up the ServiceMonitor regardless of whether metrics.enabled is true or false? That is, even when metrics.enabled is set to false, the ServiceMonitor would still get created, with the bearer token file specified.

Considering that ServiceMonitors are only consumed by the pods of a Prometheus instance spun up by the Prometheus Operator, and that those pods always carry an identity in the form of the ServiceAccount they use, this could be a worthwhile feature to have.

If you think so, we can shift this issue from "bug" to "feature".

If you don't, feel absolutely free to close this issue.

Thanks again for your patience and quick replies, folks. Cheers!

stevehipwell commented 5 months ago

> I wanted to ship the metrics exposed by metrics-server to the monitoring stack spun up by kube-state-metrics

@yashvardhan-kukreja do you not mean kube-prometheus-stack?

You should be able to scrape the kubelet metrics, and if you're using KPS this is trivial.
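
For context, kube-prometheus-stack can collect per-pod and per-node resource usage directly from the kubelet, so scraping metrics-server isn't needed for that; a hypothetical values snippet (key names assumed from the kube-prometheus-stack chart):

```yaml
# kube-prometheus-stack values (key names assumed)
kubelet:
  enabled: true
  serviceMonitor:
    cAdvisor: true   # container-level CPU/memory usage
    resource: true   # kubelet /metrics/resource endpoint
```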

> Meanwhile, would it be sensible to allow setting up the ServiceMonitor regardless of whether metrics.enabled is true or false? That is, even when metrics.enabled is set to false, the ServiceMonitor would still get created, with the bearer token file specified.
>
> Considering that ServiceMonitors are only consumed by the pods of a Prometheus instance spun up by the Prometheus Operator, and that those pods always carry an identity in the form of the ServiceAccount they use, this could be a worthwhile feature to have.

I don't think there is any benefit to this change; it's documented and pretty idiomatic for a Helm chart. I also wouldn't agree that your assumption about the identity is (or will be) "always" correct.

yashvardhan-kukreja commented 5 months ago

> do you not mean kube-prometheus-stack?

That's exactly what I meant, and thanks for pointing out that I can use the kubelet metrics for my purposes. A good TIL for me.

> I don't think there is any benefit to this change; it's documented and pretty idiomatic for a Helm chart. I also wouldn't agree that your assumption about the identity is (or will be) "always" correct.

Sure, understandable and fair enough. Closing this issue and the corresponding pull request accordingly. Appreciate your time, folks.

Have a great day and week ahead!