Closed: brandond closed this issue 5 years ago
/assign
I wrote a simple Python script to brute-force the minimum length at which the requests time out. It seems to be consistent and correlated with a URI length >= 610 characters, at least within my dev environment (a minimal sketch of the approach follows the results below):
Python script: https://gist.github.com/brandond/28153c1b5f823b6191e4cea68c680423

Results:
pods in kube-system: aws-node-6jgtg,aws-node-nkknm,aws-node-q9sch,aws-node-xpxxj,aws-node-znsrt,aws-node-zvddg,coredns-955588fc4-46krq,coredns-955588fc4-lf7pb,external-dns-79dd4f7cf5-v9hw2,grafana-5494847df5-vd2d9,kiam-agent-26kqr,kiam-agent-6xdp8,kiam-agent-bwfsk,kiam-agent-lj9hc,kiam-server-77djh,kiam-server-ps7qj,kube-proxy-827bf,kube-proxy-gm8tl,kube-proxy-hd7dl,kube-proxy-hst76,kube-proxy-qcp26,kube-proxy-z7bjh,kube-state-metrics-5975c6f6dc-qx7w4,kubernetes-dashboard-787f6fb4d8-qmsbp,kubernetes-metrics-scraper-86667748bb-zszjg,metrics-server-578dc65b48-b92fg,node-exporter-4fdsw,node-exporter-dmhk2,node-exporter-kt292,node-exporter-rws67,node-exporter-xlqfp,node-exporter-z8b8m,prometheus-0
Request timed out with len(pod-list)=447 len(uri)=610
http://localhost:8001/api/v1/namespaces/kube-system/services/dashboard-metrics-scraper/proxy/api/v1/dashboard/namespaces/kube-system/pod-list/aws-node-6jgtg,aws-node-nkknm,aws-node-q9sch,aws-node-xpxxj,aws-node-znsrt,aws-node-zvddg,coredns-955588fc4-46krq,coredns-955588fc4-lf7pb,external-dns-79dd4f7cf5-v9hw2,grafana-5494847df5-vd2d9,kiam-agent-26kqr,kiam-agent-6xdp8,kiam-agent-bwfsk,kiam-agent-lj9hc,kiam-server-77djh,kiam-server-ps7qj,kube-proxy-827bf,kube-proxy-gm8tl,kube-proxy-hd7dl,kube-proxy-hst76,kube-proxy-qcp26,kube-proxy-z7bjh,kube-state-metrics-5975c6f6dc-qx7w4,kubernetes-d/metrics/memory/usage
pods in twistlock: twistlock-console-central-b99d5f656-dhh4b,twistlock-console-supervisor1-587c86f876-z77nx,twistlock-console-supervisor2-844d988d79-xsx4p,twistlock-console-supervisor3-5669d4797d-ltcfg,twistlock-console-supervisor4-7b4479ddb5-kbcdm,twistlock-console-supervisor5-565467d7d4-hxdsx,twistlock-console-supervisor6-5cd756f5c9-rsn9k,twistlock-defender-ds-4xwd6,twistlock-defender-ds-g7hb9,twistlock-defender-ds-h6gtz,twistlock-defender-ds-mm8cv,twistlock-defender-ds-x26dg,twistlock-defender-ds-xmnxs,twistlock-defender-supervisor1-6bd65ff946-nnk99,twistlock-defender-supervisor2-7c9d84c9d-9544w,twistlock-defender-supervisor3-845f9db9c4-wdswh,twistlock-defender-supervisor4-8576b48cf8-4bc4k,twistlock-defender-supervisor5-5cf95f85d6-z7jqs,twistlock-defender-supervisor6-5976f5c586-88lgt
Request timed out with len(pod-list)=449 len(uri)=610
http://localhost:8001/api/v1/namespaces/kube-system/services/dashboard-metrics-scraper/proxy/api/v1/dashboard/namespaces/twistlock/pod-list/twistlock-console-central-b99d5f656-dhh4b,twistlock-console-supervisor1-587c86f876-z77nx,twistlock-console-supervisor2-844d988d79-xsx4p,twistlock-console-supervisor3-5669d4797d-ltcfg,twistlock-console-supervisor4-7b4479ddb5-kbcdm,twistlock-console-supervisor5-565467d7d4-hxdsx,twistlock-console-supervisor6-5cd756f5c9-rsn9k,twistlock-defender-ds-4xwd6,twistlock-defender-ds-g7hb9,twistlock-defender-ds-h6gtz,twistlock-defender-ds-mm8cv,twistlock-def/metrics/memory/usage
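For reference, the brute-force approach is roughly the sketch below (not the gist itself). It assumes `kubectl proxy` is listening on localhost:8001; `POD_LIST` is a placeholder for the real comma-separated pod names, and the 5-second timeout is an arbitrary choice:

```python
# Grow the pod-list path segment one character at a time and report the first
# length at which the request through `kubectl proxy` stops completing.
import urllib.error
import urllib.request

BASE = ("http://localhost:8001/api/v1/namespaces/kube-system/services/"
        "dashboard-metrics-scraper/proxy/api/v1/dashboard")
NAMESPACE = "kube-system"
POD_LIST = "aws-node-6jgtg,aws-node-nkknm,..."  # full comma-separated pod list goes here


def build_uri(pod_list: str) -> str:
    return f"{BASE}/namespaces/{NAMESPACE}/pod-list/{pod_list}/metrics/memory/usage"


def completes(uri: str) -> bool:
    """Return True if the request returns any response within the timeout."""
    try:
        urllib.request.urlopen(uri, timeout=5)
        return True
    except urllib.error.HTTPError:
        return True  # got an HTTP response, so the request did not hang
    except (urllib.error.URLError, OSError):
        return False  # timed out or the connection was reset


for length in range(1, len(POD_LIST) + 1):
    uri = build_uri(POD_LIST[:length])
    if not completes(uri):
        print(f"Request timed out with len(pod-list)={length} len(uri)={len(uri)}")
        break
```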
Tested from within the cluster, and the same thing occurs. Additionally, the timeout occurs even when the request should hit one of the default handlers, tested by changing /api/ to /apx/ (a sketch of this check follows the examples below).
OK:
http://dashboard-metrics-scraper:8000/api/v1/dashboard/namespaces/kube-system/pod-list/aws-node-6jgtg,aws-node-nkknm,aws-node-q9sch,aws-node-xpxxj,aws-node-znsrt,aws-node-zvddg,coredns-955588fc4-46krq,coredns-955588fc4-lf7pb,external-dns-79dd4f7cf5-v9hw2,grafana-5494847df5-vd2d9,kiam-agent-26kqr,kiam-agent-6xdp8,kiam-agent-bwfsk,kiam-agent-lj9hc,kiam-server-77djh,kiam-server-ps7qj,kube-proxy-827bf,kube-proxy-gm8tl,kube-proxy-hd7dl,kube-proxy-hst76,kube-proxy-qcp26,kube-proxy-z7bjh,kube-state-metrics-5975c6f6dc-qx7w4,kubernetes-/metrics/memory/usage
OK:
http://dashboard-metrics-scraper.kube-system:8000/api/v1/dashboard/namespaces/kube-system/pod-list/aws-node-6jgtg,aws-node-nkknm,aws-node-q9sch,aws-node-xpxxj,aws-node-znsrt,aws-node-zvddg,coredns-955588fc4-46krq,coredns-955588fc4-lf7pb,external-dns-79dd4f7cf5-v9hw2,grafana-5494847df5-vd2d9,kiam-agent-26kqr,kiam-agent-6xdp8,kiam-agent-bwfsk,kiam-agent-lj9hc,kiam-server-77djh,kiam-server-ps7qj,kube-proxy-827bf,kube-proxy-gm8tl,kube-proxy-hd7dl,kube-proxy-hst76,kube-proxy-qcp26,kube-proxy-z7bjh,kube-state-metrics-5975c6f6dc-qx7w4,kubernetes-/metrics/memory/usage
Timeout:
http://dashboard-metrics-scraper:8000/api/v1/dashboard/namespaces/kube-system/pod-list/aws-node-6jgtg,aws-node-nkknm,aws-node-q9sch,aws-node-xpxxj,aws-node-znsrt,aws-node-zvddg,coredns-955588fc4-46krq,coredns-955588fc4-lf7pb,external-dns-79dd4f7cf5-v9hw2,grafana-5494847df5-vd2d9,kiam-agent-26kqr,kiam-agent-6xdp8,kiam-agent-bwfsk,kiam-agent-lj9hc,kiam-server-77djh,kiam-server-ps7qj,kube-proxy-827bf,kube-proxy-gm8tl,kube-proxy-hd7dl,kube-proxy-hst76,kube-proxy-qcp26,kube-proxy-z7bjh,kube-state-metrics-5975c6f6dc-qx7w4,kubernetes-d/metrics/memory/usage
Timeout:
http://dashboard-metrics-scraper:8000/apx/v1/dashboard/namespaces/kube-system/pod-list/aws-node-6jgtg,aws-node-nkknm,aws-node-q9sch,aws-node-xpxxj,aws-node-znsrt,aws-node-zvddg,coredns-955588fc4-46krq,coredns-955588fc4-lf7pb,external-dns-79dd4f7cf5-v9hw2,grafana-5494847df5-vd2d9,kiam-agent-26kqr,kiam-agent-6xdp8,kiam-agent-bwfsk,kiam-agent-lj9hc,kiam-server-77djh,kiam-server-ps7qj,kube-proxy-827bf,kube-proxy-gm8tl,kube-proxy-hd7dl,kube-proxy-hst76,kube-proxy-qcp26,kube-proxy-z7bjh,kube-state-metrics-5975c6f6dc-qx7w4,kubernetes-d/metrics/memory/usage
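The in-cluster check can be reproduced with something like the sketch below, assuming it runs from a pod that resolves the short `dashboard-metrics-scraper` Service name (otherwise use `dashboard-metrics-scraper.kube-system`); `POD_LIST` is again a placeholder, trimmed to a length at or above the failing size:

```python
# Call the scraper Service directly and compare a valid /api/ path against a
# bogus /apx/ path of the same length, to show the hang is independent of
# which handler the path would reach.
import urllib.error
import urllib.request

POD_LIST = "aws-node-6jgtg,aws-node-nkknm,..."  # trimmed to the failing length

for segment in ("api", "apx"):
    uri = (f"http://dashboard-metrics-scraper:8000/{segment}/v1/dashboard/"
           f"namespaces/kube-system/pod-list/{POD_LIST}/metrics/memory/usage")
    try:
        resp = urllib.request.urlopen(uri, timeout=5)
        print(f"{segment}: OK ({resp.getcode()})")
    except urllib.error.HTTPError as err:
        print(f"{segment}: HTTP {err.code} (request completed, no hang)")
    except (urllib.error.URLError, OSError) as err:
        print(f"{segment}: timed out or reset ({err})")
```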
Disregard - it turns out this was caused by recent changes to the configuration of a security agent on the hosts. The agent reported that it was resetting the connection due to "URI Path Length Too Long", then dropping all traffic after the reset due to "Packet on Closed Connection". Disabling the agent resolved all issues.
I'm trying to isolate an issue with the pod list in the dashboard not loading. The spinner just cycles forever. If I look at the browser's network trace, it appears that the request to retrieve pod usage statistics is hanging.
The k8s apiserver audit log says the request is failing with a 503:
I've enabled debug-level logging in the scraper and I don't even see the request getting logged at all.
A few additional notes: