Closed foot closed 1 year ago
How to debug this?
Sounds like we will have to hop on a debugging call with PEPSI.
Pre-debug call:
e.g. here is demo-01 which is upgraded every day:
❯ kubectl get pods -n flux-system
NAME                                                              READY   STATUS    RESTARTS          AGE
flux-system-tf-controller-694c8c7bdc-877vh                        1/1     Running   24 (5d23h ago)    10d
gitopssets-controller-manager-5cf4c4f766-llhgc                    2/2     Running   11 (5d23h ago)    7d1h
helm-controller-7c8bd45cb4-d2rzp                                  1/1     Running   21 (5d23h ago)    10d
kustomize-controller-7bbf4fcd8b-zk44v                             1/1     Running   22 (5d23h ago)    10d
notification-controller-f68ff48f8-lkwhx                           1/1     Running   22 (5d23h ago)    10d
policy-agent-64dc8c99bc-zqxv4                                     1/1     Running   825 (7d2h ago)    10d
source-controller-6cbcbd869f-d6mm2                                1/1     Running   22 (5d23h ago)    10d
templates-controller-controller-manager-7875f58d5f-qsdfk          2/2     Running   22 (5d23h ago)    10d
weave-gitops-enterprise-cluster-controller-5b4f964cb-czctc        2/2     Running   23 (5d23h ago)    10d
weave-gitops-enterprise-mccp-cluster-bootstrap-controller-vgkj9   2/2     Running   6 (5d23h ago)     6d22h
weave-gitops-enterprise-mccp-cluster-service-764fc44b4c-cpv2z     1/1     Running   0                 3d1h
weave-gitops-enterprise-pipeline-controller-557b8795c6-t28tw      1/1     Running   23 (5d23h ago)    10d
❯ kubectl get deploy -n flux-system
NAME READY UP-TO-DATE AVAILABLE AGE
flux-system-tf-controller 1/1 1 1 111d
gitopssets-controller-manager 1/1 1 1 25d
helm-controller 1/1 1 1 316d
kustomize-controller 1/1 1 1 316d
notification-controller 1/1 1 1 316d
policy-agent 1/1 1 1 195d
source-controller 1/1 1 1 316d
templates-controller-controller-manager 1/1 1 1 82d
weave-gitops-enterprise-cluster-controller 1/1 1 1 316d
weave-gitops-enterprise-mccp-cluster-bootstrap-controller 1/1 1 1 228d
weave-gitops-enterprise-mccp-cluster-service 1/1 1 1 228d
weave-gitops-enterprise-pipeline-controller 1/1 1 1 114d
❯ kubectl get rs -n flux-system
NAME DESIRED CURRENT READY AGE
flux-system-tf-controller-694c8c7bdc 1 1 1 111d
gitopssets-controller-manager-5b88444cb8 0 0 0 25d
gitopssets-controller-manager-5cf4c4f766 1 1 1 7d1h
gitopssets-controller-manager-dc86bd5c9 0 0 0 10d
helm-controller-565bb7b89b 0 0 0 316d
helm-controller-7bf446cfbd 0 0 0 61d
helm-controller-7c8bd45cb4 1 1 1 53d
helm-controller-88f6889c6 0 0 0 290d
helm-controller-f47b55c65 0 0 0 111d
kustomize-controller-546bc764cb 0 0 0 111d
kustomize-controller-69668f89f 0 0 0 34d
kustomize-controller-784bd54978 0 0 0 290d
kustomize-controller-78d9fc5f5f 0 0 0 316d
kustomize-controller-7bbf4fcd8b 1 1 1 53d
kustomize-controller-7fc68b65f8 0 0 0 61d
notification-controller-55f485bf49 0 0 0 111d
notification-controller-56464fd84 0 0 0 316d
notification-controller-648bbb9db7 0 0 0 290d
notification-controller-6d9d866fb 0 0 0 61d
notification-controller-f68ff48f8 1 1 1 53d
policy-agent-5d8955b6cb 0 0 0 151d
policy-agent-64dc8c99bc 1 1 1 20d
policy-agent-745687c5c6 0 0 0 82d
policy-agent-75884f59bb 0 0 0 117d
policy-agent-86c6cbc77d 0 0 0 172d
policy-agent-c8f78fcc4 0 0 0 123d
policy-agent-c98f54fbd 0 0 0 195d
source-controller-58757cc77b 0 0 0 34d
source-controller-6cbcbd869f 1 1 1 53d
source-controller-7646dcd7c7 0 0 0 111d
source-controller-79f7866bc7 0 0 0 290d
source-controller-7ddfb8f554 0 0 0 316d
source-controller-df76fdcf8 0 0 0 61d
templates-controller-controller-manager-6b69c94566 0 0 0 82d
templates-controller-controller-manager-7875f58d5f 1 1 1 52d
weave-gitops-enterprise-cluster-controller-5458cbbc48 0 0 0 298d
weave-gitops-enterprise-cluster-controller-5b4f964cb 1 1 1 137d
weave-gitops-enterprise-cluster-controller-5d47765864 0 0 0 158d
weave-gitops-enterprise-cluster-controller-648b88cc8f 0 0 0 255d
weave-gitops-enterprise-cluster-controller-6d87cd55b6 0 0 0 217d
weave-gitops-enterprise-cluster-controller-764ccff568 0 0 0 171d
weave-gitops-enterprise-cluster-controller-77fffd5749 0 0 0 219d
weave-gitops-enterprise-cluster-controller-79588bfc47 0 0 0 316d
weave-gitops-enterprise-mccp-cluster-bootstrap-controller-589d666b5d 0 0 0 171d
weave-gitops-enterprise-mccp-cluster-bootstrap-controller-7664f847fb 1 1 1 6d22h
weave-gitops-enterprise-mccp-cluster-bootstrap-controller-768df9cb58 0 0 0 202d
weave-gitops-enterprise-mccp-cluster-bootstrap-controller-7f76c8b68c 0 0 0 228d
weave-gitops-enterprise-mccp-cluster-service-5458fb8585 0 0 0 10d
weave-gitops-enterprise-mccp-cluster-service-55976cf78f 0 0 0 10d
weave-gitops-enterprise-mccp-cluster-service-56d95df775 0 0 0 3d21h
weave-gitops-enterprise-mccp-cluster-service-59db56cb9f 0 0 0 6d20h
weave-gitops-enterprise-mccp-cluster-service-5bcf69bccf 0 0 0 6d22h
weave-gitops-enterprise-mccp-cluster-service-64bb66fcd5 0 0 0 10d
weave-gitops-enterprise-mccp-cluster-service-67dbd9bd95 0 0 0 4d1h
weave-gitops-enterprise-mccp-cluster-service-699d8c85dc 0 0 0 7d1h
weave-gitops-enterprise-mccp-cluster-service-764fc44b4c 1 1 1 3d1h
weave-gitops-enterprise-mccp-cluster-service-8447ddf8f7 0 0 0 10d
weave-gitops-enterprise-mccp-cluster-service-8f498f747 0 0 0 4d18h
weave-gitops-enterprise-pipeline-controller-557b8795c6 1 1 1 12d
weave-gitops-enterprise-pipeline-controller-595cc68556 0 0 0 67d
weave-gitops-enterprise-pipeline-controller-679b55bfc4 0 0 0 17d
weave-gitops-enterprise-pipeline-controller-7fb7d95747 0 0 0 59d
weave-gitops-enterprise-pipeline-controller-85bb4db544 0 0 0 60d
weave-gitops-enterprise-pipeline-controller-c6566c866 0 0 0 114d
weave-gitops-enterprise-pipeline-controller-d9dc4dbb9 0 0 0 94d
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#revision-history-limit
By default, 10 old ReplicaSets will be kept
This is to allow rollbacks and the like. Maybe a "gitops best practices" guide should recommend setting this to 0, though that also means the Deployment can no longer be rolled back. Anyway, lingering ReplicaSets are fine.
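To see how many scaled-down ReplicaSets each Deployment is holding on to, the `kubectl get rs` output can be grouped by Deployment name. This is only a rough sketch: `old_rs_per_deployment` is a made-up helper name, and it assumes the default `kubectl get rs` columns (NAME, DESIRED, ...) and that a ReplicaSet name is the Deployment name plus a trailing hash.

```shell
# Count ReplicaSets with DESIRED == 0, grouped by owning Deployment.
# Reads `kubectl get rs` output on stdin; the helper name is hypothetical.
old_rs_per_deployment() {
  awk 'NR > 1 && $2 == 0 {
         name = $1
         sub(/-[^-]+$/, "", name)   # strip the trailing pod-template hash
         count[name]++
       }
       END { for (n in count) print count[n], n }' | sort -k2
}

# Usage (not run here):
#   kubectl get rs -n flux-system | old_rs_per_deployment
#
# The configured limit per Deployment can be listed with:
#   kubectl get deploy -n flux-system \
#     -o custom-columns=NAME:.metadata.name,LIMIT:.spec.revisionHistoryLimit
```

Going by the demo-01 listing above, weave-gitops-enterprise-mccp-cluster-service has 10 scaled-down ReplicaSets, which is what the default limit would allow.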
It sounds like the upgrade did not go well. We should check
cc @bigkevmcd anything else come to mind here?
https://www.weave.works/blog/how-many-kubernetes-replicasets-are-in-your-cluster-
Do we know how they're upgrading?
Editing the HelmRelease in Git?
Good shout, I don't know.
Looking at the charts, I notice that we have...
and
and
We need to make sure that these match, along with https://github.com/weaveworks/weave-gitops-enterprise/blob/main/charts/mccp/templates/clusters-service/service.yaml#L13
I remember vaguely that @bigkevmcd bumped into some pod/service selector issue in WGE a while back when debugging logs or something but I can't remember the details.
I think it may have been raised and fixed in https://github.com/weaveworks/weave-gitops-enterprise/pull/1171/files
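One way to check the selector/label match on a live cluster is to print the Service's selector next to the pod labels and the endpoints the Service currently resolves to. A minimal sketch, assuming the Service lives in the same namespace as the pods; `show_selector_and_labels` is a made-up helper, and the Service name has to be filled in since it isn't shown in this thread.

```shell
# Print a Service's selector, the labels on every pod in the namespace,
# and the endpoints behind the Service, so mismatches (or old pods that
# still match the selector) are easy to spot. Helper name is hypothetical.
show_selector_and_labels() {
  ns="$1"
  svc="$2"
  # -o wide adds a SELECTOR column to the Service listing
  kubectl get service "$svc" -n "$ns" -o wide
  # --show-labels adds a LABELS column to the pod listing
  kubectl get pods -n "$ns" --show-labels
  # Endpoints list the pod IPs the Service currently routes to
  kubectl get endpoints "$svc" -n "$ns"
}

# Usage (not run here; <service-name> is a placeholder):
#   show_selector_and_labels flux-system <service-name>
```

If more pod IPs show up under the endpoints than there are pods in the current ReplicaSet, the selector is probably broader than the pod template labels of the new release.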
I don't see any lingering pods. Cleaning up the list to show only the active ReplicaSets gives this; it looks like there's only one running for each.
❯ kubectl get rs -n flux-system
NAME DESIRED CURRENT READY AGE
flux-system-tf-controller-694c8c7bdc 1 1 1 111d
gitopssets-controller-manager-5cf4c4f766 1 1 1 7d1h
helm-controller-7c8bd45cb4 1 1 1 53d
kustomize-controller-7bbf4fcd8b 1 1 1 53d
notification-controller-f68ff48f8 1 1 1 53d
policy-agent-64dc8c99bc 1 1 1 20d
source-controller-6cbcbd869f 1 1 1 53d
templates-controller-controller-manager-7875f58d5f 1 1 1 52d
weave-gitops-enterprise-cluster-controller-5b4f964cb 1 1 1 137d
weave-gitops-enterprise-mccp-cluster-bootstrap-controller-7664f847fb 1 1 1 6d22h
weave-gitops-enterprise-mccp-cluster-service-764fc44b4c 1 1 1 3d1h
weave-gitops-enterprise-pipeline-controller-557b8795c6 1 1 1 12d
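A cleaned-up list like the one above can be produced by dropping every ReplicaSet whose DESIRED count is 0. A small sketch, assuming the default `kubectl get rs` column order; `active_rs` is just an illustrative name.

```shell
# Keep the header row plus ReplicaSets whose DESIRED count is non-zero.
# Reads `kubectl get rs` output on stdin; the helper name is hypothetical.
active_rs() {
  awk 'NR == 1 || $2 > 0'
}

# Usage (not run here):
#   kubectl get rs -n flux-system | active_rs
```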
Yeah, sorry, that output is confusing: it's from demo-01, not from PEPSI, and was only meant to highlight the ReplicaSets not getting cleaned up.
This has happened to them twice now after upgrading (every time?).
They see new pods, but the old pods are not cleaned up, and the UI stops working.
Questions:
Possibilities?