Nevermind, this was an issue with my VPC DNS resolution
Same here. I manually set the image to metrics-server-amd64:v0.3.0 in metrics-server-deployment.yaml and deployed it. But `kubectl logs metrics-server-754478c688-j5ckq -n kube-system` shows:
I0901 03:49:30.403514 1 serving.go:273] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
W0901 03:49:30.723508 1 authentication.go:166] cluster doesn't provide client-ca-file in configmap/extension-apiserver-authentication in kube-system, so client certificate authentication to extension api-server won't work.
W0901 03:49:30.732733 1 authentication.go:210] cluster doesn't provide client-ca-file in configmap/extension-apiserver-authentication in kube-system, so client certificate authentication to extension api-server won't work.
[restful] 2018/09/01 03:49:30 log.go:33: [restful/swagger] listing is available at https://:443/swaggerapi
[restful] 2018/09/01 03:49:30 log.go:33: [restful/swagger] https://:443/swaggerui/ is mapped to folder /swagger-ui/
I0901 03:49:30.778391 1 serve.go:96] Serving securely on [::]:443
And the HPA is still showing:
Warning FailedGetResourceMetric 4m (x191 over 1h) horizontal-pod-autoscaler unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
I am also still unable to get HPA working. I ran `kubectl describe apiservice v1beta1.metrics.k8s.io` and am seeing the same errors as in https://github.com/kubernetes-incubator/metrics-server/issues/45.
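In case it helps others debugging the same thing, the APIService status is usually the quickest place to see why the aggregator can't reach metrics-server (a rough sketch; the exact condition messages vary by cluster):

```
# Is the aggregated API registered and Available?
kubectl get apiservice v1beta1.metrics.k8s.io

# Look at .status.conditions for the failure reason
kubectl get apiservice v1beta1.metrics.k8s.io -o yaml

# Does the metrics API answer at all?
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
```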
Figured out my issue -- my worker node security group was misconfigured. I had to add an inbound rule to allow HTTPS (port 443) traffic from the control plane security group.
I just added incoming 443 from the control plane security group and it looks like it's working now. Thanks @sc-rz
The solution proposed by @MIBc works. Change the metrics-server-deployment.yaml file and add a command: section to the container spec.
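The snippet is truncated here. Based on the flags discussed further down in this thread (--kubelet-preferred-address-types=InternalIP and --kubelet-insecure-tls), the container spec presumably ends up looking roughly like the following sketch; it is not necessarily the exact @MIBc snippet, and the image tag is just the one mentioned earlier:

```yaml
containers:
  - name: metrics-server
    image: metrics-server-amd64:v0.3.0
    command:
      - /metrics-server
      # Prefer node InternalIPs over hostnames that cluster DNS may not resolve
      - --kubelet-preferred-address-types=InternalIP
      # Skip kubelet serving-cert verification (see the security discussion below)
      - --kubelet-insecure-tls
```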
Nevermind, this was an issue with my VPC DNS resolution
Hi boss! My metrics-server pod has the same error:
E1026 07:37:04.007899 1 reststorage.go:144] unable to fetch pod metrics for pod dev-java/csg-application-68584c6b66-c65k9: no metrics known for pod
E1026 07:37:34.022311 1 reststorage.go:144] unable to fetch pod metrics for pod dev-java/csg-application-68584c6b66-c65k9: no metrics known for pod
E1026 07:37:38.242410 1 manager.go:102] unable to fully collect metrics: [
unable to fully scrape metrics from source kubelet_summary:idc-k8snode-javaphp-001: unable to fetch metrics from Kubelet idc-k8snode-javaphp-001 (idc-k8snode-javaphp-001): Get https://idc-k8snode-javaphp-001:10250/stats/summary/: dial tcp: lookup idc-k8snode-javaphp-001 on 10.96.0.10:53: no such host,
unable to fully scrape metrics from source kubelet_summary:idc-k8smaster-javaphp-001: unable to fetch metrics from Kubelet idc-k8smaster-javaphp-001 (idc-k8smaster-javaphp-001): Get https://idc-k8smaster-javaphp-001:10250/stats/summary/: dial tcp: lookup idc-k8smaster-javaphp-001 on 10.96.0.10:53: no such host,
unable to fully scrape metrics from source kubelet_summary:idc-k8snode-javaphp-002: unable to fetch metrics from Kubelet idc-k8snode-javaphp-002 (idc-k8snode-javaphp-002): Get https://idc-k8snode-javaphp-002:10250/stats/summary/: dial tcp: lookup idc-k8snode-javaphp-002 on 10.96.0.10:53: no such host,
unable to fully scrape metrics from source kubelet_summary:idc-k8snode-javaphp-003: unable to fetch metrics from Kubelet idc-k8snode-javaphp-003 (idc-k8snode-javaphp-003): Get https://idc-k8snode-javaphp-003:10250/stats/summary/: dial tcp: lookup idc-k8snode-javaphp-003 on 10.96.0.10:53: no such host,
unable to fully scrape metrics from source kubelet_summary:idc-k8smaster-javaphp-002: unable to fetch metrics from Kubelet idc-k8smaster-javaphp-002 (idc-k8smaster-javaphp-002): Get https://idc-k8smaster-javaphp-002:10250/stats/summary/: dial tcp: lookup idc-k8smaster-javaphp-002 on 10.96.0.10:53: no such host,
unable to fully scrape metrics from source kubelet_summary:idc-k8snode-javaphp-004: unable to fetch metrics from Kubelet idc-k8snode-javaphp-004 (idc-k8snode-javaphp-004): Get https://idc-k8snode-javaphp-004:10250/stats/summary/: dial tcp: lookup idc-k8snode-javaphp-004 on 10.96.0.10:53: no such host,
unable to fully scrape metrics from source kubelet_summary:idc-k8smaster-javaphp-003: unable to fetch metrics from Kubelet idc-k8smaster-javaphp-003 (idc-k8smaster-javaphp-003): Get https://idc-k8smaster-javaphp-003:10250/stats/summary/: dial tcp: lookup idc-k8smaster-javaphp-003 on 10.96.0.10:53: no such host]
How did you solve it?!
Thanks @LucasSales, this ended up fixing the issue for me as well. It looks like port 443 has since been added to the needed SGs, but I was still getting the following error in my metrics-server:
E1026 14:41:58.325491 1 manager.go:102] unable to fully collect metrics: [
unable to fully scrape metrics from source kubelet_summary:ip-10-0-166-28.ec2.internal: unable to fetch metrics from Kubelet ip-10-0-166-28.ec2.internal (ip-10-0-166-28.ec2.internal): Get https://ip-10-0-166-28.ec2.internal:10250/stats/summary/: dial tcp: lookup ip-10-0-166-28.ec2.internal on 172.20.0.10:53: no such host,
unable to fully scrape metrics from source kubelet_summary:ip-10-0-135-135.ec2.internal: unable to fetch metrics from Kubelet ip-10-0-135-135.ec2.internal (ip-10-0-135-135.ec2.internal): Get https://ip-10-0-135-135.ec2.internal:10250/stats/summary/: dial tcp: lookup ip-10-0-135-135.ec2.internal on 172.20.0.10:53: no such host,
unable to fully scrape metrics from source kubelet_summary:ip-10-0-146-30.ec2.internal: unable to fetch metrics from Kubelet ip-10-0-146-30.ec2.internal (ip-10-0-146-30.ec2.internal): Get https://ip-10-0-146-30.ec2.internal:10250/stats/summary/: dial tcp: lookup ip-10-0-146-30.ec2.internal on 172.20.0.10:53: no such host]
Adding the command above works. Not sure if the root issue is related to CNI or something else. Would be curious to know if anyone else hits this.
FWIW, my cluster was manually set up (still in early POC phase) and was built per the current AWS Getting Started docs.
Stuck with this issue for over a week. I tried all of the above, including @LucasSales's approach, but that gives a certificate error saying the certificate wasn't created for that host IP, and my hosts change in my cluster. Port 443 is open though, so I'm not sure why everybody is talking about that.
@kiahmed basically, you need to tell metrics-server to connect to your nodes using a name or address that it can actually look up. So, by saying InternalIP, you're telling metrics-server to not use hostnames, but instead use the internal IP address of the node. However, if the serving certificates on the Kubelet aren't valid for that IP, you'll get a certificate error.
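A quick way to see what metrics-server would be connecting to, and whether the kubelet's serving certificate actually covers it (a rough sketch; `<node-internal-ip>` is a placeholder):

```
# Node names and their InternalIP addresses as registered in the API
kubectl get nodes -o wide

# Inspect the kubelet serving certificate on port 10250 and compare its
# Subject Alternative Names against the node's InternalIP
openssl s_client -connect <node-internal-ip>:10250 </dev/null 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'
```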
--kubelet-insecure-tls did the job, which is okay for now for a dev cluster. But even in prod the API would be accessed through the main Kubernetes apiserver anyway, and that has its own CA and validation, so does it really matter?
metrics-server doesn't talk to the nodes via the main API server -- it talks to them directly. Using --kubelet-insecure-tls means that someone could MITM the metrics-server <-> kubelet connection, unless you're using some sort of service mesh or what-have-you that provides its own auth.
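For what it's worth, the usual alternative for production (hedging here, since kubelet certificate handling varies by distro and metrics-server version) is to have the kubelets get serving certificates signed by the cluster CA and then point metrics-server at that CA instead of disabling verification:

```yaml
# KubeletConfiguration fragment (assumption: your distro lets you set this).
# Asks the cluster CA to sign the kubelet's serving certificate rather than
# using a self-signed one; the resulting CSRs typically still need approval
# (kubectl certificate approve ...).
serverTLSBootstrap: true
```

With that in place, metrics-server can be started with --kubelet-certificate-authority pointing at the cluster CA bundle (check that your metrics-server version supports the flag) and --kubelet-insecure-tls can be dropped.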
Nevermind, this was an issue with my VPC DNS resolution
I think I hit this issue as well, and it wasn't clear to me how VPC settings could break metrics server, besides NACLs. So just in case other people are broken because of their VPC configuration (not because of NACLs):
The node hostname reported by the EC2 metadata service,

http://169.254.169.254/latest/meta-data/local-hostname

is set from the VPC DHCP options (https://docs.aws.amazon.com/vpc/latest/userguide/VPC_DHCP_Options.html), and the AWS cloud provider uses it as the node name / kubernetes.io/hostname label (https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L1244). If the VPC DHCP options produce hostnames that cluster DNS can't resolve, metrics-server fails with errors like:

unable to fully scrape metrics from source kubelet_summary:ip-10-68-234-200.us-west-2.compute.internal: unable to fetch metrics from Kubelet ip-10-68-234-200.us-west-2.compute.internal (ip-10-68-234-200.ec2.internal): Get https://ip-10-68-234-200.ec2.internal:10250/stats/summary/: dial tcp: lookup ip-10-68-234-200.ec2.internal on 172.20.0.10:53: no such host,
unable to fully scrape metrics from source kubelet_summary:ip-10-68-234-239.us-west-2.compute.internal: unable to fetch metrics from Kubelet ip-10-68-234-239.us-west-2.compute.internal (ip-10-68-234-239.ec2.internal): Get https://ip-10-68-234-239.ec2.internal:10250/stats/summary/: dial tcp: lookup ip-10-68-234-239.ec2.internal on 172.20.0.10:53: no such host
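A rough way to check whether this is what's biting you (the node name below is a placeholder):

```
# Node names as registered by the AWS cloud provider
kubectl get nodes -o wide

# Hostname the instance gets from the VPC DHCP options (run on a node)
curl -s http://169.254.169.254/latest/meta-data/local-hostname

# Can cluster DNS resolve the node name metrics-server is trying to reach?
kubectl run dns-test --rm -it --restart=Never --image=busybox -- nslookup <node-name>
```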
E1214 06:23:17.408800 1 manager.go:102] unable to fully collect metrics: [
unable to fully scrape metrics from source kubelet_summary:ip-10-0-3-12.ec2.internal: unable to fetch metrics from Kubelet ip-10-0-3-12.ec2.internal (ip-10-0-3-12.ec2.internal): Get https://ip-10-0-3-12.ec2.internal:10250/stats/summary/: dial tcp: i/o timeout,
unable to fully scrape metrics from source kubelet_summary:ip-10-0-1-54.ec2.internal: unable to fetch metrics from Kubelet ip-10-0-1-54.ec2.internal (ip-10-0-1-54.ec2.internal): Get https://ip-10-0-1-54.ec2.internal:10250/stats/summary/: dial tcp: i/o timeout]
curl: (60) SSL certificate problem: unable to get local issuer certificate
I have the same issue.
Hi guys, I'm running metrics-server through a helm chart on EKS and got all my HPAs working but one, see:
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
datateam hpa1 Deployment/hpa1 15%/75% 2 10 2 3h
default hpa2 Deployment/hpa2 1%/75% 2 10 2 21d
default hpa3 Deployment/hpa3 596%/75% 2 10 4 20d
nginx-ingress nginx-ingress-controller Deployment/nginx-ingress-controller <unknown>/50%, <unknown>/50% 3 11 3 50m
The one that is not working is from another helm chart, stable/nginx-ingress. I have tried with --kubelet-insecure-tls and --kubelet-preferred-address-types=InternalIP without any luck.
top pods is working fine:
kubectl top pods -n nginx-ingress
NAME CPU(cores) MEMORY(bytes)
nginx-ingress-controller-6c54d8d8fd-hbnmf 3m 77Mi
nginx-ingress-controller-6c54d8d8fd-m8jb8 3m 76Mi
nginx-ingress-controller-6c54d8d8fd-xvm5d 4m 76Mi
nginx-ingress-default-backend-544cfb69fc-7zvnw 1m 2Mi
Let me know if you need more info, thanks.
I got the nginx-ingress-controller HPA to work by defining resources in my values.yaml file 😅 (the HPA shows <unknown> targets when the pods have no CPU requests to measure utilization against):
resources:
  limits:
    cpu: 100m
    memory: 128Mi
  requests:
    cpu: 100m
    memory: 128Mi
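After redeploying with requests set, the TARGETS column should switch from <unknown> to a percentage; a quick check:

```
kubectl get hpa -n nginx-ingress
```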
I had the same issue. This solved my problem: https://stackoverflow.com/q/54106725/2291510
@kiahmed and @DirectXMan12
Referring to your comment https://github.com/kubernetes-incubator/metrics-server/issues/129#issuecomment-438448769 and https://github.com/kubernetes-incubator/metrics-server/issues/129#issuecomment-441808822
Adding --kubelet-insecure-tls has worked for me. But is it fine to use this flag on a production cluster? If not, what needs to be done to make metrics-server work?
It is necessary to add the resources, for example:

resources:
  limits:
    cpu: 500m
    memory: 254Mi
  requests:
    cpu: 1000m
    memory: 1G
Had the same problem. Solved it with this command:
helm upgrade --install metrics stable/metrics-server --namespace kube-system --set hostNetwork.enabled=true --set args={kubelet-insecure-tls}
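To confirm metrics are actually flowing after the upgrade, something like:

```
kubectl top nodes
kubectl top pods --all-namespaces
```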
Figured out my issue -- my worker node security group was misconfigured. I had to add an inbound rule to allow HTTPS (port 443) traffic from the control plane security group.
Thank you so much, that was it, networking/firewall issue
Hi,
I am testing the recently released HPA on Amazon's EKS but running into an issue where it's failing to ping the node.
(actual IP redacted)
I am using v0.3 after running `kubectl apply -f metrics-server/deploy/1.8+/` on commit 931ef8402ac7e9545156041e4479a02b055c0ab4. Do I need to configure something?
Thanks