kubernetes-sigs / prometheus-adapter

An implementation of the custom.metrics.k8s.io API using Prometheus
Apache License 2.0

GKE Private Cluster "v1beta1.custom.metrics.k8s.io" APIService showing (FailedDiscoveryCheck) "net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" #534

Closed: searce-aditya closed this issue 1 year ago

searce-aditya commented 2 years ago

I am working on a GKE private cluster that has a private endpoint for the control plane. I want to configure an HPA using prometheus-adapter. I deployed Prometheus and prometheus-adapter with Helm, following this doc: https://www.private-ai.com/2022/05/31/how-to-autoscale-kubernetes-pods-based-on-gpu/ . After deploying prometheus-adapter, running "kubectl get apiservice" gives the output below:

v1beta1.custom.metrics.k8s.io li-ns/prometheus-adapter False (FailedDiscoveryCheck) 47m

I tried describing the Apiservice : "kubectl describe apiservice v1beta1.custom.metrics.k8s.io"

Output:

Status:
  Conditions:
    Last Transition Time:  2022-10-17T17:37:58Z
    Message:               failing or missing response from https://172.24.4.22:6443/apis/custom.metrics.k8s.io/v1beta1: Get "https://172.24.4.22:6443/apis/custom.metrics.k8s.io/v1beta1": context deadline exceeded
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:                  Available
Events:                    <none>
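For readers who want to check this condition from a script rather than by reading the describe output, here is a minimal sketch. It assumes the APIService has been fetched as JSON (for example with "kubectl get apiservice v1beta1.custom.metrics.k8s.io -o json"). The SAMPLE document is a hand-written stand-in that mirrors the values quoted above, not real cluster output, and the helper name `availability` is hypothetical.

```python
import json

# Hand-written stand-in for the JSON form of the APIService quoted above.
# Only the fields the helper reads are included.
SAMPLE = """
{
  "metadata": {"name": "v1beta1.custom.metrics.k8s.io"},
  "spec": {"service": {"namespace": "li-ns", "name": "prometheus-adapter"}},
  "status": {"conditions": [{
    "type": "Available",
    "status": "False",
    "reason": "FailedDiscoveryCheck",
    "message": "failing or missing response from https://172.24.4.22:6443/apis/custom.metrics.k8s.io/v1beta1"
  }]}
}
"""


def availability(apiservice: dict) -> tuple:
    """Return (available, reason) taken from the APIService's Available condition."""
    for cond in apiservice.get("status", {}).get("conditions", []):
        if cond.get("type") == "Available":
            return cond.get("status") == "True", cond.get("reason", "")
    # No Available condition was reported at all.
    return False, "NoAvailableCondition"


apiservice = json.loads(SAMPLE)
ok, reason = availability(apiservice)
name = apiservice["metadata"]["name"]
print(name, "available" if ok else "unavailable ({})".format(reason))
```

A reason of FailedDiscoveryCheck, as here, indicates that the kube-apiserver could not complete its discovery request against the backing service.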

I also tried running: "kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1"

Output: Error from server (ServiceUnavailable): the server is currently unable to handle the request

I referred to a few similar cases: https://github.com/kubernetes-sigs/metrics-server/issues/131

I then tried adding the following parameters to the metrics-server's deployment YAML file: command:

However, on a GKE cluster the metrics-server comes pre-configured, and the metrics-server pods crash after the above command parameters are added.

Could you please suggest how we can resolve this issue?

dante-saggin commented 2 years ago

I had the same issue, but I changed prometheus-adapter to use port 10250 on GKE (mostly because I didn't have permission to create any firewall rules). I did it based on the GCP documentation: https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#add_firewall_rules
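As a rough illustration of why switching ports helps: according to the GKE documentation linked above, the default firewall rules of a private cluster only let the control plane open TCP connections to nodes and Pods on ports 443 and 10250. The sketch below simply writes that rule out. The function name and the `extra_allowed` parameter are hypothetical, and 6443 is the port from the failing discovery URL earlier in this thread.

```python
# TCP ports the GKE control plane may reach on nodes and Pods of a private
# cluster by default, as described in the GKE firewall documentation.
DEFAULT_CONTROL_PLANE_PORTS = frozenset({443, 10250})


def control_plane_can_reach(port, extra_allowed=()):
    """Return True if the control plane is allowed to connect to `port`.

    `extra_allowed` stands for ports opened by additional firewall rules
    that you created yourself (an assumption for illustration).
    """
    return port in DEFAULT_CONTROL_PLANE_PORTS or port in set(extra_allowed)


# Port from the failing URL in the original report: blocked by default.
print(control_plane_can_reach(6443))
# Port used in the workaround: allowed by default.
print(control_plane_can_reach(10250))
# Alternative fix: keep 6443 but add a firewall rule that opens it.
print(control_plane_can_reach(6443, extra_allowed=[6443]))
```

Either route should work: move the adapter to an already-allowed port, or add a firewall rule for the port it currently listens on if you have permission to do so.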

umutsahin commented 1 year ago

I have the same issue, again on GKE (v1.22.16-gke.2000).

When I increase the log level to 6, I can see the error below:

I0111 15:52:36.566618       1 round_trippers.go:553] GET https://10.207.254.1:443/apis/custom.metrics.k8s.io/v1beta1?timeout=32s 503 Service Unavailable in 7 milliseconds
I0111 15:52:36.566715       1 request.go:1264] body was not decodable (unable to check for Status): couldn't get version/kind; json parse error: json: cannot unmarshal string into Go value of type struct { APIVersion string "json:\"apiVersion,omitempty\""; Kind string "json:\"kind,omitempty\"" }

When I deploy the same charts with the same values to OpenShift, it works fine.

Any help is appreciated @dgrisonnet @olivierlemasle Thanks!

dgrisonnet commented 1 year ago

The log line means that the kube-apiserver couldn't reach prometheus-adapter. Have you perhaps checked if the solution mentioned above works for you?

umutsahin commented 1 year ago

On my first try, port 10250 clashed with another pod. I redeployed and now it works. Thanks @dgrisonnet!

dgrisonnet commented 1 year ago

You are welcome.

For anyone stumbling upon this issue in the future, the solution has been provided in https://github.com/kubernetes-sigs/prometheus-adapter/issues/534#issuecomment-1290762676.