kubernetes-sigs / metrics-server

Scalable and efficient source of container resource metrics for Kubernetes built-in autoscaling pipelines.
https://kubernetes.io/docs/tasks/debug-application-cluster/resource-metrics-pipeline/
Apache License 2.0

Failed to scrape node: remote error: tls: internal error #1480

Open rarecrumb opened 2 months ago

rarecrumb commented 2 months ago

What happened: Metrics server failed to scrape a node

What you expected to happen: Successfully scrape the node

Anything else we need to know?: Deployed with the Helm chart
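
For reference, a deployment along these lines could look like the sketch below; the chart repository URL and release name are assumptions (not taken from this report), and the values mirror the manifest shown under Environment:

```sh
# Sketch only: chart repo URL and release name are assumed, not from this report.
# The values mirror the "Metrics Server manifest" spoiler under Environment below;
# the "base" namespace comes from the APIService output further down.
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm upgrade --install metrics-server metrics-server/metrics-server \
  --namespace base \
  --set "args={--kubelet-insecure-tls}" \
  --set containerPort=4443 \
  --set hostNetwork.enabled=true
```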

Environment:

spoiler for Metrics Server manifest:

```yaml
args:
  - --kubelet-insecure-tls
containerPort: 4443
hostNetwork:
  enabled: true
```
spoiler for Kubelet config:
spoiler for Metrics Server logs:

```
E0501 16:40:35.362224 1 scraper.go:149] "Failed to scrape node" err="Get \"https://10.3.10.48:10250/metrics/resource\": remote error: tls: internal error" node="ip-10-3-10-48.ec2.internal"
```
spoiler for Status of Metrics API:

```sh
kubectl describe apiservice v1beta1.metrics.k8s.io
```

```
Name:         v1beta1.metrics.k8s.io
Namespace:
Labels:       app.kubernetes.io/instance=metrics-server
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=metrics-server
              app.kubernetes.io/version=0.7.1
              argocd.argoproj.io/instance=metrics-server
              helm.sh/chart=metrics-server-3.12.1
Annotations:
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2023-07-13T20:41:49Z
  Resource Version:    266080474
  UID:                 59bdff53-5db0-4819-a27e-6aff8526d41e
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:       metrics-server
    Namespace:  base
    Port:       443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2024-04-30T18:18:43Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:
```
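
For anyone triaging this: a common cause of `remote error: tls: internal error` from the kubelet is a serving certificate that was never issued, for example a pending `kubernetes.io/kubelet-serving` CSR when `serverTLSBootstrap` is enabled. A minimal check, assuming `kubectl` access to the cluster (the CSR name below is a placeholder):

```sh
# List CSRs; a Pending kubelet-serving CSR for the affected node would explain
# the failed TLS handshake on port 10250.
kubectl get csr

# If the node's kubelet-serving CSR is Pending, approving it lets the kubelet
# obtain a serving certificate (replace <csr-name> with the actual name).
kubectl certificate approve <csr-name>
```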

/kind bug

logicalhan commented 2 months ago

/kind support
/triage accepted

kanhayaKy commented 2 weeks ago

Any update on this? I'm having similar issues.

The logs from the metrics-server:

E0704 07:13:21.054122       1 scraper.go:140] "Failed to scrape node" err="Get \"https://172.31.68.188:10250/metrics/resource\": remote error: tls: internal error" node="ip-172-31-68-188.ec2.internal"
E0704 07:13:36.062399       1 scraper.go:140] "Failed to scrape node" err="Get \"https://172.31.68.188:10250/metrics/resource\": remote error: tls: internal error" node="ip-172-31-68-188.ec2.internal"
E0704 07:13:36.120301       1 scraper.go:140] "Failed to scrape node" err="Get \"https://172.31.94.156:10250/metrics/resource\": remote error: tls: internal error" node="ip-172-31-94-156.ec2.internal"
E0704 07:13:36.128872       1 scraper.go:140] "Failed to scrape node" err="Get \"https://172.31.66.224:10250/metrics/resource\": remote error: tls: internal error" node="ip-172-31-66-224.ec2.internal"
E0704 07:20:51.104101       1 scraper.go:140] "Failed to scrape node" err="Get \"https://172.31.33.165:10250/metrics/resource\": remote error: tls: internal error" node="ip-172-31-33-165.ec2.internal"

On the node we can see that the kubelet is listening on port 10250, and it has established connections from the Prometheus Operator pods:

sh-4.2$ netstat -a | grep 10250
tcp6       0      0 [::]:10250              [::]:*                  LISTEN
tcp6       0      0 ip-172-31-34-243.:10250 ip-172-31-84-140.:59798 ESTABLISHED
tcp6       0      0 ip-172-31-34-243.:10250 ip-172-31-46-100.:39802 ESTABLISHED
tcp6       0      0 ip-172-31-34-243.:10250 ip-172-31-46-100.:44384 ESTABLISHED
tcp6       0      0 ip-172-31-34-243.:10250 ip-172-31-46-100.:39806 ESTABLISHED

This is very strange behavior, as we have not changed any configuration and started getting this issue out of nowhere.
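
One more check worth trying (a sketch only; the IP below is one of the affected nodes from the logs above) is to see whether the kubelet presents a serving certificate at all on 10250:

```sh
# If the handshake aborts before any certificate is returned, the kubelet most
# likely has no serving certificate, which matches "tls: internal error".
openssl s_client -connect 172.31.68.188:10250 </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates
```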