kubernetes-sigs / prometheus-adapter

An implementation of the custom.metrics.k8s.io API using Prometheus
Apache License 2.0

Certificate Validation Error in Prometheus Adapter Despite Valid Certificate #591

Closed yashwanth-mannem closed 1 week ago

yashwanth-mannem commented 1 year ago

What happened?:

During communication between the API server and the Prometheus adapter, a certificate validation error was observed. The error indicated that the certificate was expired. However, upon checking, the certificate in use was found to be valid and not expired.

What did you expect to happen?:

The Prometheus adapter should have successfully authenticated with the Kubernetes API server using the valid certificate, and metrics retrieval should have been successful.

Please provide the prometheus-adapter config:

prometheus-adapter config

apiVersion: v1
data:
  config.yaml: |
    resourceRules:
      cpu:
        containerLabel: container
        containerQuery: |
          sum by (<<.GroupBy>>) (
            irate (
              container_cpu_usage_seconds_total{<<.LabelMatchers>>,container!="",pod!=""}[120s]
            )
          )
        nodeQuery: |
          sum by (<<.GroupBy>>) (
            1 - irate(
              node_cpu_seconds_total{mode="idle"}[60s]
            )
          )
          or sum by (<<.GroupBy>>) (
            node:windows_node_cpu_utilisation:avg5m{mode="idle",job="wmi-exporter",<<.LabelMatchers>>}
          )
        resources:
          overrides:
            instance:
              resource: node
            namespace:
              resource: namespace
            pod:
              resource: pod
      memory:
        containerLabel: container
        containerQuery: |
          sum by (<<.GroupBy>>) (
            container_memory_working_set_bytes{<<.LabelMatchers>>,container!="",pod!=""}
          )
        nodeQuery: |
          sum by (<<.GroupBy>>) (
            node_memory_MemTotal_bytes{job="node-exporter",<<.LabelMatchers>>}
            -
            node_memory_MemAvailable_bytes{job="node-exporter",<<.LabelMatchers>>}
          )
          or sum by (<<.GroupBy>>) (
            node:windows_node_memory_utilization{job="wmi-exporter",<<.LabelMatchers>>}
          )
        resources:
          overrides:
            instance:
              resource: node
            namespace:
              resource: namespace
            pod:
              resource: pod
      window: 5m
kind: ConfigMap
metadata:
  annotations:
    helm.fluxcd.io/antecedent: prometheus:helmrelease/prometheus-adapter
    meta.helm.sh/release-name: prometheus-adapter
    meta.helm.sh/release-namespace: prometheus
  creationTimestamp: "2023-06-21T23:04:00Z"
  labels:
    app: prometheus-adapter
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-adapter-2.6.2
    heritage: Helm
    release: prometheus-adapter
  name: prometheus-adapter
  namespace: prometheus

Please provide the HPA resource used for autoscaling:

HPA yaml

Not set up. We notice the issue when running k top nodes:

Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

Please provide the HPA status:

NA

Please provide the prometheus-adapter logs with -v=6 around the time the issue happened:

prometheus-adapter logs

E0614 20:26:50.160879 1 authentication.go:53] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid: current time 2023-06-14T20:26:50Z is after 2022-09-20T21:38:27Z, verifying certificate SN=591688138426063623, SKID=, AKID= failed: x509: certificate has expired or is not yet valid: current time 2023-06-14T20:26:50Z is after 2022-09-20T21:38:27Z]
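The log line carries enough detail to narrow down which certificate was rejected: the verifier's clock, the certificate's expiry time, and its serial number. The following is a minimal sketch, using only the Python standard library, that extracts those fields. The helper name `parse_x509_expiry` and the returned dictionary keys are hypothetical and chosen for illustration; the regular expressions assume the message format shown in this one log line.

```python
import re
from datetime import datetime, timezone

# Log line copied verbatim from the adapter output in this report.
LOG_LINE = (
    "E0614 20:26:50.160879 1 authentication.go:53] Unable to authenticate the "
    "request due to an error: [x509: certificate has expired or is not yet valid: "
    "current time 2023-06-14T20:26:50Z is after 2022-09-20T21:38:27Z, "
    "verifying certificate SN=591688138426063623, SKID=, AKID= failed: "
    "x509: certificate has expired or is not yet valid: "
    "current time 2023-06-14T20:26:50Z is after 2022-09-20T21:38:27Z]"
)

# Patterns are assumptions based on the single message format seen above.
_TIME_RE = re.compile(
    r"current time (\S+) is after (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)"
)
_SN_RE = re.compile(r"SN=(\d+)")


def _parse_utc(ts):
    """Parse a timestamp such as 2023-06-14T20:26:50Z as an aware UTC datetime."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def parse_x509_expiry(line):
    """Return serial number and expiry details from an x509 expiry log line.

    Returns None if the line does not match the expected format.
    """
    times = _TIME_RE.search(line)
    serial = _SN_RE.search(line)
    if times is None or serial is None:
        return None
    now = _parse_utc(times.group(1))
    not_after = _parse_utc(times.group(2))
    sn = int(serial.group(1))
    return {
        "serial_decimal": sn,
        # Hex form, since certificate tools commonly display serials in hex.
        "serial_hex": format(sn, "X"),
        "not_after": not_after,
        "expired_for": now - not_after,
    }


info = parse_x509_expiry(LOG_LINE)
print("serial (hex):", info["serial_hex"])
print("not after:   ", info["not_after"].isoformat())
print("expired for: ", info["expired_for"].days, "days")
```

Comparing the serial number and expiry date against each certificate involved (for example with `openssl x509 -noout -serial -enddate -in <file>`) may show whether the rejected certificate is the one that was checked by hand or a different one presented with the request.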

Anything else we need to know?:

We have ensured that the certificate is not expired and is located at the correct path as configured in the adapter. We have also verified the synchronization of the Secret/ConfigMap, certificate rotation process, time synchronization across nodes, and the validity of the certificate chain. The problem persists.

Environment:

- prometheus-adapter version: 2.6.2
- prometheus version: v0.38.1
- Kubernetes version (use kubectl version): v1.20.5
- Cloud provider or hardware configuration: vSphere
- Other info: Verified all the dependent manifests and resources; everything looks fine, but we do not see the metrics API service in the API registrations.
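Since `k top nodes` goes through the aggregated `metrics.k8s.io` API, the state of the corresponding APIService object is a useful data point. The following is a rough sketch, assuming the object has been exported as JSON (for instance with `kubectl get apiservice v1beta1.metrics.k8s.io -o json`, where the object name is the conventional one and may differ in this cluster). The function name `apiservice_availability` and the sample object are hypothetical; the sample is illustrative and not taken from the reporter's cluster.

```python
import json


def apiservice_availability(apiservice):
    """Return (available, reason, message) from an APIService object (a dict)."""
    conditions = apiservice.get("status", {}).get("conditions", [])
    for cond in conditions:
        if cond.get("type") == "Available":
            return (
                cond.get("status") == "True",
                cond.get("reason", ""),
                cond.get("message", ""),
            )
    # No Available condition reported: treat as unavailable.
    return (False, "NoAvailableCondition", "APIService has no Available condition")


# Illustrative sample only (an assumption, not output from the affected cluster).
SAMPLE = json.loads("""
{
  "apiVersion": "apiregistration.k8s.io/v1",
  "kind": "APIService",
  "metadata": {"name": "v1beta1.metrics.k8s.io"},
  "spec": {
    "group": "metrics.k8s.io",
    "version": "v1beta1",
    "service": {"name": "prometheus-adapter", "namespace": "prometheus"}
  },
  "status": {
    "conditions": [
      {
        "type": "Available",
        "status": "False",
        "reason": "FailedDiscoveryCheck",
        "message": "illustrative message"
      }
    ]
  }
}
""")

available, reason, message = apiservice_availability(SAMPLE)
print("available:", available)
print("reason:   ", reason)
print("message:  ", message)
```

If the object does not exist at all, as the report suggests, the export step itself fails and there is nothing to inspect; in that case the registration, rather than the certificate, would be the first thing to look at.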

dgrisonnet commented 1 year ago

/kind support
/remove-kind bug
/triage accepted
/assign

k8s-triage-robot commented 2 months ago

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

dashpole commented 1 week ago

@yashwanth-mannem sorry no one was able to get to this issue. If you are still experiencing the problem, feel free to re-open.

/close

k8s-ci-robot commented 1 week ago

@dashpole: Closing this issue.
