It's not really getting better: now we have two kube-apiserver pods consuming >20 cores. May I kill one of those and see what happens? @rbo, wdyt?
oc adm top pod --sum --sort-by=cpu
NAME CPU(cores) MEMORY(bytes)
kube-apiserver-86548cbbbf-kzqgq 18906m 4364Mi
kube-apiserver-86548cbbbf-64q6x 15647m 2307Mi
virt-launcher-sendling-ff7bf3fd-pn5vs-q8w7v 6723m 16267Mi
kube-apiserver-86548cbbbf-bhsfj 3785m 3789Mi
etcd-0 2127m 619Mi
ignition-server-5b4567866-kl4xf 1014m 414Mi
ignition-server-5b4567866-w568x 1013m 334Mi
etcd-1 988m 878Mi
ignition-server-5b4567866-b7ksx 892m 372Mi
olm-operator-54d975b8c4-xtlmk 474m 404Mi
etcd-2 303m 504Mi
kube-controller-manager-5fc9d96bdf-d2mgv 119m 296Mi
control-plane-operator-646886cd59-knl2g 77m 346Mi
redhat-operators-catalog-5778b9f69d-9ppd4 19m 86Mi
community-operators-catalog-79c49fb477-krzt8 18m 147Mi
certified-operators-catalog-5f585b98cd-5xz5z 17m 142Mi
redhat-marketplace-catalog-867f99df5-grcnn 15m 68Mi
openshift-apiserver-778f9db75c-6pjb5 15m 321Mi
openshift-apiserver-778f9db75c-zjqcj 14m 311Mi
openshift-apiserver-778f9db75c-kc7m9 13m 227Mi
virt-launcher-sendling-10d195e8-j8czd-sjshq 13m 1495Mi
openshift-route-controller-manager-869c7c988b-xrncf 10m 60Mi
cluster-policy-controller-74964cb9d6-6svtg 10m 191Mi
cluster-network-operator-9664cbc94-mcmng 9m 281Mi
packageserver-7f976d7855-brh6g 8m 277Mi
hosted-cluster-config-operator-748dbf6695-6kng2 8m 156Mi
openshift-oauth-apiserver-584f69b7f9-7c2mp 7m 69Mi
openshift-oauth-apiserver-584f69b7f9-f4jmm 7m 102Mi
openshift-controller-manager-5578f894bb-7dd7p 6m 174Mi
openshift-oauth-apiserver-584f69b7f9-9qd99 5m 85Mi
machine-approver-85cb867c5-hdr2z 5m 97Mi
cluster-storage-operator-7fcdf884fb-8t2mf 5m 80Mi
packageserver-7f976d7855-tv5pb 5m 300Mi
kube-controller-manager-5fc9d96bdf-7wjrs 4m 57Mi
openshift-route-controller-manager-869c7c988b-8rjtd 4m 76Mi
cluster-api-c7b575bb4-zg64v 3m 101Mi
konnectivity-agent-8599fd5d6b-sr6pn 3m 44Mi
capi-provider-7f58f475dd-h68hg 3m 70Mi
openshift-controller-manager-5578f894bb-4c9vn 3m 61Mi
konnectivity-agent-8599fd5d6b-2brl7 2m 35Mi
packageserver-7f976d7855-bp87j 2m 270Mi
kube-scheduler-5b5c9478f4-kpq4l 2m 93Mi
kubevirt-cloud-controller-manager-974969547-9kq6w 2m 67Mi
openshift-route-controller-manager-869c7c988b-kp2kh 2m 77Mi
ingress-operator-7878df55d7-mq7cm 2m 199Mi
multus-admission-controller-568bb6cd65-8wr2v 2m 95Mi
kube-scheduler-5b5c9478f4-7rdng 2m 51Mi
csi-snapshot-controller-operator-797bf595d9-n5csv 2m 80Mi
oauth-openshift-6b8fc486c9-q8d5m 2m 82Mi
kube-controller-manager-5fc9d96bdf-5x4lp 2m 60Mi
oauth-openshift-6b8fc486c9-gl7s4 2m 106Mi
ignition-server-proxy-57c4f77c97-6qn9r 1m 115Mi
ovnkube-control-plane-75bffb695c-dq6fc 1m 143Mi
ignition-server-proxy-57c4f77c97-8n7f5 1m 135Mi
catalog-operator-74c748d567-2h9h8 1m 366Mi
cluster-autoscaler-679d6fbdf6-p87xg 1m 119Mi
cluster-image-registry-operator-69754bbcc9-kchgs 1m 143Mi
openshift-controller-manager-5578f894bb-6rkvb 1m 65Mi
cluster-policy-controller-74964cb9d6-6pqmj 1m 35Mi
kube-scheduler-5b5c9478f4-98vnz 1m 62Mi
ignition-server-proxy-57c4f77c97-frg8q 1m 107Mi
cluster-policy-controller-74964cb9d6-v2jnb 1m 37Mi
network-node-identity-8495fd79d9-sxnw9 0m 98Mi
kubevirt-csi-controller-54d7884b4b-r8rtj 0m 142Mi
cluster-version-operator-55f8dfbdd7-k2g4c 0m 167Mi
ovnkube-control-plane-75bffb695c-bgm4m 0m 112Mi
ovnkube-control-plane-75bffb695c-bxpzh 0m 106Mi
csi-snapshot-controller-7cdd696bfd-hczrf 0m 35Mi
network-node-identity-8495fd79d9-sqkbl 0m 102Mi
csi-snapshot-webhook-7c66684757-br56b 0m 36Mi
cluster-node-tuning-operator-58764957cb-bhjwc 0m 67Mi
network-node-identity-8495fd79d9-zw2r5 0m 159Mi
dns-operator-5d6f5c64b9-2w8hn 0m 52Mi
konnectivity-agent-8599fd5d6b-f8nkz 0m 28Mi
oauth-openshift-6b8fc486c9-bgkpf 0m 79Mi
________ ________
52332m 39738Mi
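To narrow down which container inside those kube-apiserver pods is actually burning the CPU, something like this should work (pod names taken from the listing above; --containers is a standard flag of oc adm top pod):
oc adm top pod kube-apiserver-86548cbbbf-kzqgq --containers -n rbohne-hcp-sendling
oc adm top pod kube-apiserver-86548cbbbf-64q6x --containers -n rbohne-hcp-sendling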
Trying a quick fix:
oc delete pod --wait=false kube-apiserver-86548cbbbf-kzqgq kube-apiserver-86548cbbbf-64q6x
oc adm top pod --sum --sort-by=cpu -n rbohne-hcp-sendling
NAME CPU(cores) MEMORY(bytes)
kube-apiserver-86548cbbbf-r6grl 18262m 2289Mi
kube-apiserver-86548cbbbf-btmct 15798m 2114Mi
virt-launcher-sendling-ff7bf3fd-pn5vs-q8w7v 6153m 16178Mi
kube-apiserver-86548cbbbf-bhsfj 5517m 4611Mi
etcd-0 2303m 616Mi
openshift-apiserver-778f9db75c-6pjb5 1254m 333Mi
etcd-1 1109m 861Mi
ignition-server-5b4567866-kl4xf 1064m 414Mi
ignition-server-5b4567866-w568x 1017m 333Mi
ignition-server-5b4567866-b7ksx 990m 372Mi
olm-operator-54d975b8c4-xtlmk 800m 404Mi
hosted-cluster-config-operator-748dbf6695-6kng2 537m 151Mi
etcd-2 338m 536Mi
konnectivity-agent-8599fd5d6b-sr6pn 272m 45Mi
openshift-apiserver-778f9db75c-kc7m9 241m 272Mi
konnectivity-agent-8599fd5d6b-f8nkz 199m 29Mi
openshift-oauth-apiserver-584f69b7f9-f4jmm 182m 117Mi
openshift-oauth-apiserver-584f69b7f9-9qd99 167m 96Mi
kube-controller-manager-5fc9d96bdf-d2mgv 153m 300Mi
packageserver-7f976d7855-brh6g 89m 289Mi
packageserver-7f976d7855-bp87j 79m 259Mi
openshift-oauth-apiserver-584f69b7f9-7c2mp 69m 82Mi
control-plane-operator-646886cd59-knl2g 50m 338Mi
redhat-operators-catalog-5778b9f69d-9ppd4 21m 84Mi
certified-operators-catalog-5f585b98cd-5xz5z 17m 133Mi
redhat-marketplace-catalog-867f99df5-grcnn 16m 68Mi
openshift-apiserver-778f9db75c-zjqcj 16m 325Mi
community-operators-catalog-79c49fb477-krzt8 15m 143Mi
openshift-route-controller-manager-869c7c988b-xrncf 15m 60Mi
packageserver-7f976d7855-tv5pb 14m 258Mi
virt-launcher-sendling-10d195e8-j8czd-sjshq 13m 1495Mi
cluster-network-operator-9664cbc94-mcmng 10m 286Mi
machine-approver-85cb867c5-hdr2z 10m 99Mi
cluster-policy-controller-74964cb9d6-6svtg 8m 192Mi
openshift-controller-manager-5578f894bb-4c9vn 6m 60Mi
openshift-controller-manager-5578f894bb-7dd7p 6m 175Mi
kube-controller-manager-5fc9d96bdf-7wjrs 5m 60Mi
kube-scheduler-5b5c9478f4-7rdng 5m 51Mi
capi-provider-7f58f475dd-h68hg 4m 70Mi
cluster-storage-operator-7fcdf884fb-8t2mf 4m 81Mi
openshift-route-controller-manager-869c7c988b-8rjtd 4m 76Mi
cluster-api-c7b575bb4-zg64v 3m 101Mi
ovnkube-control-plane-75bffb695c-dq6fc 3m 143Mi
ignition-server-proxy-57c4f77c97-6qn9r 2m 115Mi
csi-snapshot-controller-operator-797bf595d9-n5csv 2m 81Mi
multus-admission-controller-568bb6cd65-8wr2v 2m 93Mi
catalog-operator-74c748d567-2h9h8 2m 313Mi
kube-controller-manager-5fc9d96bdf-5x4lp 2m 60Mi
openshift-route-controller-manager-869c7c988b-kp2kh 2m 78Mi
kube-scheduler-5b5c9478f4-kpq4l 2m 93Mi
oauth-openshift-6b8fc486c9-gl7s4 2m 106Mi
oauth-openshift-6b8fc486c9-q8d5m 2m 82Mi
ingress-operator-7878df55d7-mq7cm 2m 193Mi
cluster-autoscaler-679d6fbdf6-p87xg 1m 120Mi
ignition-server-proxy-57c4f77c97-8n7f5 1m 135Mi
cluster-image-registry-operator-69754bbcc9-kchgs 1m 141Mi
cluster-policy-controller-74964cb9d6-v2jnb 1m 37Mi
openshift-controller-manager-5578f894bb-6rkvb 1m 62Mi
kube-scheduler-5b5c9478f4-98vnz 1m 63Mi
ignition-server-proxy-57c4f77c97-frg8q 1m 107Mi
oauth-openshift-6b8fc486c9-bgkpf 1m 79Mi
kubevirt-cloud-controller-manager-974969547-9kq6w 1m 67Mi
csi-snapshot-webhook-7c66684757-br56b 0m 36Mi
cluster-version-operator-55f8dfbdd7-k2g4c 0m 169Mi
csi-snapshot-controller-7cdd696bfd-hczrf 0m 35Mi
ovnkube-control-plane-75bffb695c-bgm4m 0m 112Mi
ovnkube-control-plane-75bffb695c-bxpzh 0m 106Mi
konnectivity-agent-8599fd5d6b-2brl7 0m 34Mi
dns-operator-5d6f5c64b9-2w8hn 0m 53Mi
cluster-policy-controller-74964cb9d6-6pqmj 0m 35Mi
cluster-node-tuning-operator-58764957cb-bhjwc 0m 73Mi
kubevirt-csi-controller-54d7884b4b-r8rtj 0m 142Mi
network-node-identity-8495fd79d9-zw2r5 0m 160Mi
network-node-identity-8495fd79d9-sxnw9 0m 98Mi
network-node-identity-8495fd79d9-sqkbl 0m 102Mi
________ ________
56867m 38218Mi
Nothing changed.
Once you've identified the root cause: I think it would also be good to have some resource limits in place, to prevent this from consuming so many resources.
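A minimal sketch of what that could look like, assuming the kube-apiserver Deployment in the HCP namespace is not immediately reconciled back by the control-plane-operator (container name and values below are assumptions, not a recommendation):
# container name and the CPU/memory values are assumptions, adjust as needed
oc -n rbohne-hcp-sendling set resources deployment/kube-apiserver \
  --containers=kube-apiserver \
  --requests=cpu=4,memory=4Gi \
  --limits=cpu=8,memory=8Gi
In practice the HyperShift control-plane-operator owns these Deployments, so sizing probably has to go through the HostedCluster configuration rather than a direct patch.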
@DanielFroehlich Agree!
Looks like an upgrade is running:
Red Hat SSO config is broken on that cluster. Let's fix this first.
The update is stuck because one of the two nodes is stuck joining:
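To see where the joining node hangs, the NodePool and the CAPI/KubeVirt machine objects in the HCP namespace are usually the first things to check; a sketch (resource names/namespaces below are guesses based on this cluster's naming):
# namespaces and names are assumptions
oc get nodepools -A
oc get machines.cluster.x-k8s.io -n rbohne-hcp-sendling
oc get vm,vmi -n rbohne-hcp-sendling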
Trying to fetch the ignition config from the pod where the VM is running:
sh-5.1$ curl https://ignition-server-rbohne-hcp-sendling.apps.isar.coe.muc.redhat.com/ignition
curl: (6) Could not resolve host: ignition-server-rbohne-hcp-sendling.apps.isar.coe.muc.redhat.com
sh-5.1$
$ oc rsh virt-launcher-sendling-ff7bf3fd-pn5vs-q8w7v
sh-5.1$ curl -kvvv https://ignition-server-rbohne-hcp-sendling.apps.isar.coe.muc.redhat.com/ignition
* Could not resolve host: ignition-server-rbohne-hcp-sendling.apps.isar.coe.muc.redhat.com
* Closing connection 0
curl: (6) Could not resolve host: ignition-server-rbohne-hcp-sendling.apps.isar.coe.muc.redhat.com
sh-5.1$
The other VM pod cannot curl it either.
Sorry, no time anymore and sendling is not important. Deleted. Problem solved :-/
Documented on the wrong issue:
Trying the curl from another pod on the same node:
$ oc project rbohne-hcp-sendling
Now using project "rbohne-hcp-sendling" on server "https://api.isar.coe.muc.redhat.com:6443".
$ oc get pods -o wide | grep sendling-10d195e8-j8czd
virt-launcher-sendling-10d195e8-j8czd-sjshq 1/1 Running 0 11d 10.128.10.152 inf44 <none> 1/1
$ oc rsh virt-launcher-sendling-10d195e8-j8czd-sjshq
sh-5.1$ curl -k https://ignition-server-rbohne-hcp-sendling.apps.isar.coe.muc.redhat.com/ignition
curl: (6) Could not resolve host: ignition-server-rbohne-hcp-sendling.apps.isar.coe.muc.redhat.com
sh-5.1$ exit
exit
command terminated with exit code 6
$ oc get pods -o wide | grep inf44
cluster-policy-controller-74964cb9d6-6pqmj 1/1 Running 6 (9d ago) 14d 10.128.10.55 inf44 <none> <none>
etcd-1 3/3 Running 4 (13d ago) 13d 10.128.10.17 inf44 <none> <none>
ignition-server-5b4567866-b7ksx 1/1 Running 0 14d 10.128.10.56 inf44 <none> <none>
ignition-server-proxy-57c4f77c97-6qn9r 1/1 Running 0 14d 10.128.10.57 inf44 <none> <none>
konnectivity-agent-8599fd5d6b-f8nkz 1/1 Running 0 14d 10.128.10.58 inf44 <none> <none>
kube-apiserver-86548cbbbf-bhsfj 4/4 Running 0 14d 10.128.10.29 inf44 <none> <none>
kube-controller-manager-5fc9d96bdf-5x4lp 1/1 Running 6 (9d ago) 14d 10.128.10.59 inf44 <none> <none>
kube-scheduler-5b5c9478f4-98vnz 1/1 Running 1 (9d ago) 14d 10.128.10.51 inf44 <none> <none>
network-node-identity-8495fd79d9-sqkbl 3/3 Running 16 (13d ago) 14d 10.128.10.30 inf44 <none> <none>
oauth-openshift-6b8fc486c9-xb6s8 2/2 Running 0 10m 10.128.10.252 inf44 <none> <none>
openshift-apiserver-778f9db75c-6pjb5 3/3 Running 6 (9d ago) 14d 10.128.10.32 inf44 <none> <none>
openshift-controller-manager-5578f894bb-6rkvb 1/1 Running 0 14d 10.128.10.61 inf44 <none> <none>
openshift-oauth-apiserver-584f69b7f9-9qd99 2/2 Running 6 (9d ago) 14d 10.128.10.33 inf44 <none> <none>
openshift-route-controller-manager-869c7c988b-kp2kh 1/1 Running 10 (9d ago) 14d 10.128.10.60 inf44 <none> <none>
ovnkube-control-plane-75bffb695c-bxpzh 3/3 Running 17 (13d ago) 14d 10.128.10.34 inf44 <none> <none>
packageserver-7f976d7855-bp87j 2/2 Running 0 14d 10.128.10.35 inf44 <none> <none>
virt-launcher-sendling-10d195e8-j8czd-sjshq 1/1 Running 0 11d 10.128.10.152 inf44 <none> 1/1
$ oc rsh packageserver-7f976d7855-bp87j
Defaulted container "packageserver" out of: packageserver, socks5-proxy, availability-prober (init)
sh-4.4$ curl -k https://ignition-server-rbohne-hcp-sendling.apps.isar.coe.muc.redhat.com/ignition
Unauthorized
sh-4.4$ cat /etc/resolv.conf
search rbohne-hcp-sendling.svc.cluster.local svc.cluster.local cluster.local isar.coe.muc.redhat.com coe.muc.redhat.com
nameserver 172.30.0.10
options ndots:5
sh-4.4$
=> works; the Unauthorized response is expected.
The resolv.conf is the same.
$ oc get pods -o yaml packageserver-7f976d7855-bp87j | grep -i dns
"dns": {}
dnsPolicy: ClusterFirst
$ oc get pods -o yaml virt-launcher-sendling-10d195e8-j8czd-sjshq | grep -i dns
"dns": {}
dnsPolicy: ClusterFirst
$
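Since the DNS config of both pods is identical, a possible next step would have been to resolve names directly from the failing virt-launcher pod, to see whether all lookups through 172.30.0.10 fail or only the wildcard *.apps name (a sketch, assuming getent is available in the image):
$ oc rsh virt-launcher-sendling-10d195e8-j8czd-sjshq
sh-5.1$ getent hosts kubernetes.default.svc.cluster.local
sh-5.1$ getent hosts ignition-server-rbohne-hcp-sendling.apps.isar.coe.muc.redhat.com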
While playing with ACM observability, I realised we have two HCP clusters where an API server pod is consuming >20 cores. That does not feel right! See e.g. here:
From: https://console-openshift-console.apps.isar.coe.muc.redhat.com/k8s/ns/rbohne-hcp-sendling/replicasets/kube-apiserver-86548cbbbf/pods
Same for rbohne-hcp-sendling-ingress