Closed. rgl closed this issue 7 months ago.
Could you try disabling metrics and checking if it improves anything? Pass the --metrics-provider=none arg to the API. That's the only thing I can think of that could be a bottleneck here.
@floreks ah, that did the trick! It's pretty fast now!
I will have to investigate that at some point. There were no real changes to metrics gathering. Maybe there is an issue with metrics server responsiveness.
Hmm, I do not have metrics-server installed in my kind cluster. Without metrics-server, is the dashboard expected to be slow?
If so, maybe the FAQ should make that more explicit?
The comments in the chart's values.yaml seem to be more explicit; maybe put that in the FAQ?
:wave: I'm experiencing a similar issue after upgrading from a much earlier version. I added:
api:
  containers:
    args:
      - --metrics-provider=none
and things are substantially better on most pages.
However, some pages still struggle to load quickly (especially the Workloads page), and I have fewer than 150 pods.
I'm running k3s with the built-in metrics-server.
Some request timings:
edit: Eventually, things got super slow again after I clicked around a bunch. Then, I restarted the api pod and things got snappy again...
If you start clicking too much and spamming the API server with requests, throttling will kick in and significantly slow down your responses. Restarting the API server can 'reset' the throttling and it will work faster. Normal use should be OK.
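For context on what this throttling looks like at the client level: client-go wraps every request in a token-bucket rate limiter, so a burst of requests drains the bucket and later calls block until tokens refill, and restarting the process recreates the limiter, which is why a restart "resets" it. Below is a minimal sketch of that behavior, assuming client-go's usual defaults of 5 QPS and a burst of 10 (treat those numbers as assumptions here, not the dashboard's actual settings):

package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/flowcontrol"
)

func main() {
	// Token-bucket limiter like the one client-go attaches to its REST client
	// when no custom rate limiter is configured (assumed defaults: 5 QPS, burst 10).
	limiter := flowcontrol.NewTokenBucketRateLimiter(5, 10)

	start := time.Now()
	for i := 0; i < 30; i++ {
		limiter.Accept() // blocks once the initial burst of 10 tokens is spent
	}
	fmt.Printf("30 simulated requests took %s through a 5 QPS / burst 10 limiter\n", time.Since(start))
}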
Hmm, I'm still a bit surprised that I can cause throttling by human-scale-clicking around. To be clear, I wasn't trying to stress the system, just view different panels in the UI :)
Here's how the requests from /#/workloads?namespace=_all look after ~6 hours of not accessing the dashboard at all:
There aren't any timeouts, but this is still really slow, right?
That is definitely unexpected. What device are you using for your k3s installation?
4 cores of an AMD EPYC 7371.
Some quick benchmarks:
sushain@vesuvianite ~ ❯❯❯ hyperfine 'kubectl get pods -A' 18:20:19
Benchmark 1: kubectl get pods -A
Time (mean ± σ): 220.6 ms ± 4.1 ms [User: 207.0 ms, System: 71.1 ms]
Range (min … max): 214.3 ms … 227.9 ms 13 runs
sushain@vesuvianite ~ ❯❯❯ hyperfine 'kubectl describe pods -A' 18:20:29
Benchmark 1: kubectl describe pods -A
Time (mean ± σ): 1.231 s ± 0.031 s [User: 0.533 s, System: 0.123 s]
Range (min … max): 1.188 s … 1.294 s 10 runs
sushain@vesuvianite ~ ❯❯❯ hyperfine 'kubectl get deployments -A' 18:20:43
Benchmark 1: kubectl get deployments -A
Time (mean ± σ): 177.5 ms ± 7.1 ms [User: 175.0 ms, System: 58.1 ms]
Range (min … max): 169.8 ms … 195.3 ms 16 runs
sushain@vesuvianite ~ ❯❯❯ hyperfine 'kubectl describe deployments -A'
Benchmark 1: kubectl describe deployments -A
Time (mean ± σ): 1.021 s ± 0.032 s [User: 0.426 s, System: 0.121 s]
Range (min … max): 0.980 s … 1.097 s 10 runs
So I guess my timings in the UI aren't that much slower if it's calling the equivalent of kubectl describe...
We also can't directly compare kubectl to the UI, as we have to make more calls than kubectl to get some extra information and apply additional logic such as server-side pagination, sorting, and filtering. It will always be slower.
Yep, that makes sense. FWIW, I jumped from docker.io/kubernetesui/dashboard-api:v1.0.0 to docker.io/kubernetesui/dashboard-api:1.4.1, so there might be a bunch of changes... maybe I'll try bisecting through the Helm chart versions at some point.
@sushain97 I have been debugging the performance issue further and pinned it down exactly. Add the --sidecar-host arg to the dashboard API deployment. Example: --sidecar-host=kubernetes-dashboard-metrics-scraper.dashboard, where kubernetes-dashboard-metrics-scraper is the metrics-scraper service name and dashboard is the namespace where Dashboard is deployed.
I honestly have no idea what is causing the in-cluster service proxy to be so slow compared to accessing the metrics scraper directly with an HTTP client through the service proxy. I don't see anything that changed there recently.
Hm, it doesn't feel too different to me:
Here's what I have:
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  repo: https://kubernetes.github.io/dashboard/
  chart: kubernetes-dashboard
  targetNamespace: kubernetes-dashboard
  version: 7.2.0
  valuesContent: |-
    app:
      scheduling:
        nodeSelector:
          kubernetes.io/hostname: kube.local.skc.name
    # https://github.com/kubernetes/dashboard/issues/8835
    api:
      containers:
        args:
          - --metrics-provider=none
          - --sidecar-host=kubernetes-dashboard-metrics-scraper.kubernetes-dashboard
    kong:
      proxy:
        http:
          enabled: true
I encountered a similar thing once I upgraded to the newer versions of kubernetes-dashboard (lots of requests timing out). The API server logs showed client-side throttling in effect:
2024/04/10 04:24:53 Getting list of namespaces
2024/04/10 04:24:54 Getting list of all jobs in the cluster
2024/04/10 04:24:55 Getting list of all pods in the cluster
I0410 04:24:56.623578 1 request.go:697] Waited for 1.199406392s due to client-side throttling, not priority and fairness, request: GET:https://10.152.183.1:443/api/v1/namespaces/kubernetes-dashboard/services/kubernetes-dashboard-metrics-scraper/proxy/api/v1/dashboard/namespaces/kube-system/pod-list/kube-multus-ds-5cc2s,kubed-57f78db5b6-2hvch,external-dns-54fdb56c7-llcp8,kube-api-proxy-5769f97cdf-mkhgz,kube-proxy-m5l4b,kube-scheduler-swerver,kube-controller-manager-swerver,kube-apiserver-swerver,etcd-swerver,coredns-76f75df574-wt796,coredns-76f75df574-q99kd,openebs-lvm-controller-0,openebs-lvm-node-w5vhp,calico-node-sngft,smarter-device-manager-gs99r,metrics-server-85bc948865-b7xrv,calico-kube-controllers-9d77f677d-m84kv/metrics/cpu/usage_rate
2024/04/10 04:25:01 Getting pod metrics
2024/04/10 04:25:03 Getting list of namespaces
2024/04/10 04:25:04 Getting list of all pods in the cluster
I0410 04:25:06.823448 1 request.go:697] Waited for 2.783877882s due to client-side throttling, not priority and fairness, request: GET:https://10.152.183.1:443/api/v1/namespaces/kubernetes-dashboard/services/kubernetes-dashboard-metrics-scraper/proxy/api/v1/dashboard/namespaces/kubernetes-dashboard/pod-list/kubernetes-dashboard-api-774cf68885-sqdbw,kubernetes-dashboard-web-5b8d87bf85-n2smh,kubernetes-dashboard-auth-6cf78cdd47-5qb2h,kubernetes-dashboard-kong-6cf54d7fcf-74ltv,kubernetes-dashboard-metrics-scraper-9758854f6-gpzlb,kubernetes-dashboard-proxy-5c7cd7d76c-dxdw9/metrics/memory/usage
2024/04/10 04:25:13 Getting list of namespaces
2024/04/10 04:25:14 Getting pod metrics
2024/04/10 04:25:14 Getting list of all pods in the cluster
I0410 04:25:16.824112 1 request.go:697] Waited for 1.992075126s due to client-side throttling, not priority and fairness, request: GET:https://10.152.183.1:443/api/v1/namespaces/kubernetes-dashboard/services/kubernetes-dashboard-metrics-scraper/proxy/api/v1/dashboard/namespaces/kubernetes-dashboard/pod-list/kubernetes-dashboard-api-774cf68885-sqdbw,kubernetes-dashboard-web-5b8d87bf85-n2smh,kubernetes-dashboard-auth-6cf78cdd47-5qb2h,kubernetes-dashboard-kong-6cf54d7fcf-74ltv,kubernetes-dashboard-metrics-scraper-9758854f6-gpzlb,kubernetes-dashboard-proxy-5c7cd7d76c-dxdw9/metrics/cpu/usage_rate
2024/04/10 04:25:23 Getting list of namespaces
2024/04/10 04:25:24 Getting list of all pods in the cluster
I0410 04:25:27.023139 1 request.go:697] Waited for 5.387414496s due to client-side throttling, not priority and fairness, request: GET:https://10.152.183.1:443/api/v1/namespaces/kubernetes-dashboard/services/kubernetes-dashboard-metrics-scraper/proxy/api/v1/dashboard/namespaces/guacamole/pod-list/postgres-77659c7c89-2n55q,guacamole-64b6c4c56-vxz9d,oauth2-proxy-64968f4c7f-df6cn/metrics/memory/usage
2024/04/10 04:25:28 Getting pod metrics
2024/04/10 04:25:33 Getting list of namespaces
2024/04/10 04:25:34 Getting list of all pods in the cluster
I0410 04:25:37.023852 1 request.go:697] Waited for 4.792209339s due to client-side throttling, not priority and fairness, request: GET:https://10.152.183.1:443/api/v1/namespaces/kubernetes-dashboard/services/kubernetes-dashboard-metrics-scraper/proxy/api/v1/dashboard/namespaces/kubevirt/pod-list/virt-handler-cgnwr,virt-api-54666f869-q8sg9,virt-controller-c67776ccb-949z4,virt-operator-67d55bb884-rwmjs,virtvnc-65986fb5d7-6ghmr,virt-operator-67d55bb884-nl2k8,kvpanel-d46bb99dd-7ss6d,virt-controller-c67776ccb-82pjg/metrics/memory/usage
Setting metrics-provider=none does seem to help:
2024/04/10 04:28:26 Getting list of namespaces
2024/04/10 04:28:36 Getting list of namespaces
2024/04/10 04:28:46 Getting list of namespaces
2024/04/10 04:28:52 Getting list of all pods in the cluster
2024/04/10 04:28:53 Getting pod metrics
2024/04/10 04:28:56 Getting list of namespaces
2024/04/10 04:29:02 Getting list of all pods in the cluster
2024/04/10 04:29:03 Getting pod metrics
2024/04/10 04:29:06 Getting list of namespaces
2024/04/10 04:29:06 Getting list of all pods in the cluster
2024/04/10 04:29:06 Getting pod metrics
2024/04/10 04:29:09 Getting list of all deployments in the cluster
2024/04/10 04:29:12 Getting list of all pods in the cluster
2024/04/10 04:29:12 Getting pod metrics
2024/04/10 04:29:16 Getting list of namespaces
2024/04/10 04:29:22 Getting list of all pods in the cluster
2024/04/10 04:29:22 Getting pod metrics
2024/04/10 04:29:26 Getting list of namespaces
...but that wasn't the first thing that I tried because I wanted to keep metrics.
What I found was that in https://github.com/kubernetes/dashboard/blob/567a38f476b33542534a94f622e1f7aa18a635e0/modules/common/client/init.go#L48, if the in-cluster config is being used (the common case?), it is returned immediately and the default request limits at https://github.com/kubernetes/dashboard/blob/567a38f476b33542534a94f622e1f7aa18a635e0/modules/common/client/init.go#L64 are never applied. I think buildBaseConfig needs to fetch its config from whatever source it can, but then also apply its default settings on top of that, specifically the queries-per-second limit.
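For illustration, a minimal sketch of that idea, resolving the config first and then layering the shared limits on top (the QPS/Burst values and function name here are placeholders, not the code from the branch linked below):

package client

import (
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// Placeholder defaults for illustration only; the real values come from the dashboard's args.
const (
	defaultQPS   = 100
	defaultBurst = 200
)

// buildBaseConfig-style helper: resolve the config from whichever source is available,
// then apply the request limits regardless of which branch produced it, so the
// in-cluster path no longer skips them.
func buildConfig(kubeconfigPath string) (*rest.Config, error) {
	config, err := rest.InClusterConfig()
	if err != nil {
		// Not running inside a cluster: fall back to a kubeconfig file.
		config, err = clientcmd.BuildConfigFromFlags("", kubeconfigPath)
		if err != nil {
			return nil, err
		}
	}

	config.QPS = defaultQPS
	config.Burst = defaultBurst
	return config, nil
}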
Below is a compare link to what I ended up using for my own use case, but I feel like I could clean it up as far as pointer usage goes; happy for any advice.
https://github.com/kubernetes/dashboard/compare/master...bnabholz:kubernetes-dashboard:fixes/qps
You could actually re-enable metrics with that sidecar host change. If that doesn't help, then it might be your machine. When I was testing locally on my kind cluster, response times went down from 1-3 seconds to about 100 ms on average for every view with all namespaces selected.
Yeah, I have pinned it down to the in-cluster client too, but I actually ended up using a fake rate limiter (roughly as sketched below), since e.g. the internal REST client derived from the client was also overriding some configuration for me. I will create a PR with a bunch of changes, including this fix, a bit later today.
Thanks for your help anyway!
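For reference, the "fake rate limiter" approach in client-go terms would look roughly like the sketch below (assuming the standard flowcontrol helper; the actual PR may differ). Setting a limiter that always admits requests means the QPS/Burst fields, and any clients derived from the config, can no longer re-introduce client-side throttling:

package client

import (
	"k8s.io/client-go/rest"
	"k8s.io/client-go/util/flowcontrol"
)

// disableClientSideThrottling swaps the default token-bucket limiter for one that
// never blocks, leaving requests subject only to server-side priority and fairness.
func disableClientSideThrottling(config *rest.Config) {
	config.RateLimiter = flowcontrol.NewFakeAlwaysRateLimiter()
}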
What happened?
Updating any resource takes too long (> 1 s), which is substantially slower than the apparently equivalent kubectl command.
What did you expect to happen?
Expected to see deployments displayed in roughly the same amount of time as kubectl get deployments -A.
How can we reproduce it (as minimally and precisely as possible)?
Observe the time taken with kubectl, 0.068s:
Getting and displaying the entire YAML, 0.140s:
Observe the time taken with the browser, 1.2s:
Anything else we need to know?
This was tested in a kind cluster, with the Traefik ingress controller, sending data to Kong using HTTP (without TLS), and with all resource limits lifted (also note that modifying the api replicas does not seem to make much difference):
The entire ansible playbook is at:
https://github.com/rgl/my-ubuntu-ansible-playbooks/tree/upgrade-kubernetes-dashboard
Have a look at the last commit in that branch to see just the kubernetes-dashboard changes.
What browsers are you seeing the problem on?
No response
Kubernetes Dashboard version
7.1.2
Kubernetes version
1.29.2
Dev environment
No response