Closed edenreich closed 3 years ago
I have the same problem with a fresh install of k3s version v1.17.4+k3s1 running on k3OS v0.10.0 (also on a Raspberry Pi 4, with server argument --flannel-backend=ipsec).
I tried the modifications of /var/lib/rancher/k3s/server/manifests/metrics-server/metrics-server-deployment.yaml described here (and restarted with kubectl -n kube-system rollout restart deployment metrics-server), but the problem persists.
This is my yaml file now (maybe I added something in the wrong place?):
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      hostNetwork:
        enabled: true
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: rancher/metrics-server:v0.3.6
        command:
        - /metrics-server
        - --metrics-resolution=30s
        - --requestheader-allowed-names=aggregator
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
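One thing that stands out in the YAML above: in a Kubernetes PodSpec, hostNetwork is a plain boolean field, not a block with an enabled key, so the nested form is ignored or rejected by the API server. A minimal corrected fragment of the pod spec would be:

```yaml
# hostNetwork is a boolean on the pod spec, not an object:
spec:
  template:
    spec:
      serviceAccountName: metrics-server
      hostNetwork: true   # not "hostNetwork: { enabled: true }"
```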
This also results in the kubernetes-dashboard pod not starting.
I don't understand why k3s is contacting random internet servers by default. When apps phone home it tends to make people unpleasant to talk to. If you're lucky they just uninstall it and move on.
I just got here and "leave?" is already on my TODO list.
What do you mean? The metrics server is not "on the internet". It's a service that typically runs on a kubernetes cluster collecting RAM + CPU usage statistics and whatnot about the cluster's nodes (as far as I understand). It is not exclusive to k3s and neither is the ServiceUnavailable problem it seems.
@jdmarshall It sounds like you're under the impression that https://10.43.37.24:443/apis/metrics.k8s.io/v1beta1 is a server on the internet. All 10.x.x.x addresses, like 192.168.x.x and 172.16.x.x-172.31.x.x, are reserved for private networks that you will not (or at least should not) find on the internet at large. See: https://tools.ietf.org/html/rfc1918
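To make the point concrete, here is a tiny self-contained shell sketch (the helper name is mine, not part of k3s or kubectl) that classifies an IPv4 address against the RFC 1918 private ranges using shell glob patterns:

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the address is in an RFC 1918 private range.
is_private_ip() {
  case "$1" in
    10.*)                                  return 0 ;;  # 10.0.0.0/8
    192.168.*)                             return 0 ;;  # 192.168.0.0/16
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;  # 172.16.0.0/12
    *)                                     return 1 ;;
  esac
}

is_private_ip 10.43.37.24 && echo "10.43.37.24 is private"  # the cluster service range above
is_private_ip 8.8.8.8     || echo "8.8.8.8 is public"
```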
In this case, 10.43.x.x is used for Kubernetes services running within your cluster, while 10.42.x.x is used for Kubernetes pods. None of this is k3s specific; it's core to how Kubernetes works. If you're having an issue with your k3s cluster, please open a new issue - but perhaps try to avoid jumping to any conclusions about what k3s is or is not doing.
I'm not seeing IP addresses. I'm seeing repeated errors trying to connect to FQDNs, like v1beta1.metrics.k8s.io
Why use a subdomain of a registered internet domain for RFC1918 traffic? That doesn't telegraph 'local address lookup', let alone local/vlan traffic.
@jdmarshall are you talking about errors like:
Jun 28 22:51:10 k8s-master k3s[622]: E0628 22:51:10.759934 622 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.43.37.24:443/apis/metrics.k8s.io/
That's the APIService name. They're all namespaced as part of the Kubernetes standard; it's not a hostname any more than a Java class name with java.sun.com in it is a hostname. See: https://github.com/kubernetes-sigs/metrics-server/blob/master/manifests/base/apiservice.yaml#L5
The error indicates that it's failing to access a resource with that API group from a cluster API server endpoint.
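For anyone debugging this, the APIService registration can be inspected directly with standard kubectl commands (run against your own cluster; output depends on your setup):

```shell
# Show the aggregated API registration; the AVAILABLE column / conditions tell
# you whether the apiserver can reach the backing metrics-server service.
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl describe apiservice v1beta1.metrics.k8s.io

# Query the aggregated endpoint through the API server itself:
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
```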
Has anyone made any progress on this?
It's the last issue I'm running into with my new little AWS based k3s cluster.
My issue is exactly as described.
I've tried various forms of:

- Opening up all my security groups for all access across 10.x.x.x.
- Setting hostNetwork: enabled: true on the deployment.
- Adding --kubelet-insecure-tls and --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP.
- Checking my DNS settings.
- Raising resource limits for metrics-server.

Still the same issue. :(
Edit:
I have re-rolled my cluster and updated to using curl -sfL https://get.k3s.io | K3S_TOKEN="redacted" INSTALL_K3S_EXEC="--tls-san redacted.elb.us-west-2.amazonaws.com --disable traefik" sh -
I was previously using the setup script from here (with the k3s version updated!):
https://github.com/sgdan/k3s-test/blob/master/templates/server.j2#L20
I think that removing --disable-agent or adding --tls-san may have solved it for me!
I disabled the metrics-server a long time ago because it just didn't work. I gave it another try tonight and finally made it work!
My configuration:
- 1 Pi3 as a master on v1.18.8+k3s1
- 2 Pi4 as nodes on v1.18.8+k3s1
First of all, after a careful reading of the Kubernetes documentation I ended up adding the enable-aggregator-routing=true flag on the api-server. Here's my master's configuration (beware, I've also enabled pod security policy, you might not want it :p )
k3s server --disable-agent --disable traefik --disable metrics-server --kube-apiserver-arg enable-admission-plugins=PodSecurityPolicy,NodeRestriction --kube-apiserver-arg enable-aggregator-routing=true
I've disabled the provided metrics-server
in order to use the "official" one.
So, starting from the official deployment, I added the following args:
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
- --v=2
v=2 seems important, because when I added it, I got useful logs from the pod:
I0825 19:08:29.470550 1 serving.go:312] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0825 19:08:32.327548 1 manager.go:95] Scraping metrics from 0 sources
I0825 19:08:32.327915 1 manager.go:148] ScrapeMetrics: time: 2.982µs, nodes: 0, pods: 0
I0825 19:08:32.371075 1 secure_serving.go:116] Serving securely on 0.0.0.0:4443
And finally, I added hostNetwork: true on the deployment, and after 2 minutes, I had kubectl top pods working!
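The steps above, collected into a single deployment fragment (this is a sketch against the upstream metrics-server deployment, not an exact drop-in; field placement may differ by manifest version):

```yaml
# Sketch of the changes described above, applied to the official
# metrics-server Deployment's pod template:
spec:
  template:
    spec:
      hostNetwork: true          # added last, per the comment above
      containers:
      - name: metrics-server
        args:
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        - --v=2                  # verbose logging; surfaces the scrape messages
```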
Cool, thanks for sharing the workaround. Going to give it a try this weekend.
@ViBiOh awesome, your solution works, I can finally get some pods and nodes output :)) thanks!!
Ping on the issue - the question remains why it does not work out of the box when installing the latest k3s version. Any ideas? I think this deserves further investigation. Perhaps we can pass these flags to the default metrics-server?
Alas I still can't get it to work (I've just been trying some more). The only thing I haven't yet done from @ViBiOh's instructions is swapped out the default k3s deployment of metrics-server for the official one. Can anyone explain why this makes a difference?
I agree with @edenreich - this should work out of the box in k3s. From a lot of reading around and trying things, I can't see how it ever could: the k3s processes make the request from the node's network (192.168.x.x in my case) but can't reach the cluster's network (10.42.x.x or 10.43.x.x). Has ANYONE actually had success with the default k3s configuration?
Same issue here on an on-premises install.

infra-kubernetes on master !
➜ k top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

infra-kubernetes on master !
➜ k logs metrics-server-65bfbc8684-l5k4c
I1025 18:31:16.939412 1 serving.go:312] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
W1025 18:31:17.164870 1 authentication.go:296] Cluster doesn't provide requestheader-client-ca-file in configmap/extension-apiserver-authentication in kube-system, so request-header client certificate authentication won't work.
I1025 18:31:17.176723 1 manager.go:95] Scraping metrics from 0 sources
I1025 18:31:17.176752 1 manager.go:148] ScrapeMetrics: time: 792ns, nodes: 0, pods: 0
I1025 18:31:17.187772 1 secure_serving.go:116] Serving securely on [::]:4443
I1025 18:32:17.176937 1 manager.go:95] Scraping metrics from 3 sources
I1025 18:32:17.178147 1 manager.go:120] Querying source: kubelet_summary:tspeda-k8s-worker2
I1025 18:32:17.188162 1 manager.go:120] Querying source: kubelet_summary:tspeda-k8s-worker1
I1025 18:32:17.200466 1 manager.go:120] Querying source: kubelet_summary:tspeda-k8s-worker3
I1025 18:32:17.212727 1 manager.go:148] ScrapeMetrics: time: 35.731239ms, nodes: 3, pods: 4

infra-kubernetes on master !
➜ cd ansible

infra-kubernetes/ansible on master !
➜ ansible tspeda-k8s-controller1 -m shell -a "systemctl status kube-apiserver"
tspeda-k8s-controller1 | CHANGED | rc=0 >>
● kube-apiserver.service - Kubernetes API Server
Loaded: loaded (/etc/systemd/system/kube-apiserver.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2020-10-25 17:59:28 UTC; 34min ago
Docs: https://github.com/kubernetes/kubernetes
Main PID: 28380 (kube-apiserver)
Tasks: 14 (limit: 6970)
Memory: 415.5M
CGroup: /system.slice/kube-apiserver.service
└─28380 /usr/bin/kube-apiserver --advertise-address=162.38.60.201 --allow-privileged=true --apiserver-count=3 --audit-log-maxage=30 --audit-log-maxbackup=3 --audit-log-maxsize=100 --audit-log-path=/var/log/audit.log --authorization-mode=Node,RBAC --bind-address=0.0.0.0 --client-ca-file=/var/lib/kubernetes/ca.pem --enable-admission-plugins=NamespaceLifecycle,NodeRestriction,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota --etcd-cafile=/var/lib/kubernetes/ca.pem --etcd-certfile=/var/lib/kubernetes/kubernetes.pem --etcd-keyfile=/var/lib/kubernetes/kubernetes-key.pem --etcd-servers=https://162.38.60.201:2379,https://162.38.60.202:2379,https://162.38.60.203:2379 --event-ttl=1h --encryption-provider-config=/var/lib/kubernetes/encryption-config.yaml --kubelet-certificate-authority=/var/lib/kubernetes/ca.pem --kubelet-client-certificate=/var/lib/kubernetes/kubernetes.pem --kubelet-client-key=/var/lib/kubernetes/kubernetes-key.pem --kubelet-https=true --runtime-config=api/all=true --service-account-key-file=/var/lib/kubernetes/service-account.pem --service-cluster-ip-range=10.32.0.0/24 --service-node-port-range=30000-32767 --tls-cert-file=/var/lib/kubernetes/kubernetes.pem --tls-private-key-file=/var/lib/kubernetes/kubernetes-key.pem --enable-aggregator-routing=true --v=2

Oct 25 18:32:18 tspeda-k8s-controller1 kube-apiserver[28380]: W1025 18:32:18.269972 28380 handler_proxy.go:102] no RequestInfo found in the context
Oct 25 18:32:18 tspeda-k8s-controller1 kube-apiserver[28380]: E1025 18:32:18.270064 28380 controller.go:114] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
Oct 25 18:32:18 tspeda-k8s-controller1 kube-apiserver[28380]: , Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
Oct 25 18:32:18 tspeda-k8s-controller1 kube-apiserver[28380]: I1025 18:32:18.270077 28380 controller.go:127] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
Oct 25 18:32:32 tspeda-k8s-controller1 kube-apiserver[28380]: E1025 18:32:32.378551 28380 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://162.38.60.204:4443/apis/metrics.k8s.io/v1beta1: bad status from https://162.38.60.204:4443/apis/metrics.k8s.io/v1beta1: 403
Oct 25 18:32:37 tspeda-k8s-controller1 kube-apiserver[28380]: E1025 18:32:37.767332 28380 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://162.38.60.204:4443/apis/metrics.k8s.io/v1beta1: bad status from https://162.38.60.204:4443/apis/metrics.k8s.io/v1beta1: 403
Oct 25 18:33:02 tspeda-k8s-controller1 kube-apiserver[28380]: E1025 18:33:02.379181 28380 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://162.38.60.204:4443/apis/metrics.k8s.io/v1beta1: bad status from https://162.38.60.204:4443/apis/metrics.k8s.io/v1beta1: 403
Oct 25 18:33:07 tspeda-k8s-controller1 kube-apiserver[28380]: E1025 18:33:07.769324 28380 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://162.38.60.204:4443/apis/metrics.k8s.io/v1beta1: bad status from https://162.38.60.204:4443/apis/metrics.k8s.io/v1beta1: 403
Oct 25 18:33:32 tspeda-k8s-controller1 kube-apiserver[28380]: E1025 18:33:32.379835 28380 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://162.38.60.204:4443/apis/metrics.k8s.io/v1beta1: bad status from https://162.38.60.204:4443/apis/metrics.k8s.io/v1beta1: 403
Oct 25 18:33:37 tspeda-k8s-controller1 kube-apiserver[28380]: E1025 18:33:37.772565 28380 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://162.38.60.204:4443/apis/metrics.k8s.io/v1beta1: bad status from https://162.38.60.204:4443/apis/metrics.k8s.io/v1beta1: 403
I am getting this issue as well with k3s on Hetzner Cloud. The metrics server works great on node1 but constantly times out on the other two nodes.
The workaround from @ViBiOh works though, so there is something weird with the initial setup.
Just wanted to add that I managed to fix this finally. It was a host network issue, where the floating IP that was set for some reason conflicted with the host IP of the node. Using Ubuntu 20.04 and Netplan I had to set the host IP BEFORE the floating IP to avoid some kind of internal routing issue within Kubernetes/k3s. Never managed to figure out why, but this simple solution fixed all my problems.
Please avoid using --disable-agent
, it will probably cause more problems than it will fix.
The order that network interfaces come up may be important, especially since k8s uses iptables.
If you have multiple network interfaces please ensure that --flannel-iface
points to the interface where nodes have shared networking. For something like ipsec there may be a lower level networking issue that needs to be resolved.
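The --flannel-iface flag mentioned above is passed at startup; a sketch (the interface name eth1 and the placeholders are assumptions, substitute your own values):

```shell
# Example only: pin flannel to the interface carrying inter-node traffic.
# "eth1" is a placeholder for whatever interface your nodes share.
k3s server --flannel-iface=eth1

# Or, on an agent node (server URL and token are placeholders):
k3s agent --server https://<master>:6443 --token <token> --flannel-iface=eth1
```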
This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.
Just wanted to add that I managed to fix this finally. It was a host network issue, where the floating IP that was set for some reason conflicted with the host IP of the node. Using Ubuntu 20.04 and Netplan I had to set the host IP BEFORE the floating IP to avoid some kind of internal routing issue within Kubernetes/k3s. Never managed to figure out why, but this simple solution fixed all my problems.
Thank you, finally got it fixed after a couple of days of headache.
Hardware: Raspberry Pi 4 8GB RAM (Buster Lite OS)
Version: v1.18.4+k3s1
K3S arguments
Server: --docker --no-deploy=traefik
Agent: --docker
Describe the bug
Fresh installation of k3s; running
kubectl top nodes
returns a 503 ServiceUnavailable error from the API.
To Reproduce
Install k3s using k3s-ansible with the specified version
Expected behavior
I expected to see metrics for the nodes.
Actual behavior
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
Additional context / logs
Systemd Server: