weaveworks / weave-gitops-enterprise

This repo provides the enterprise level features for the weave-gitops product, including CAPI cluster creation and team workspaces.
https://docs.gitops.weave.works/
Apache License 2.0

[PEPSI] Lingering pods after upgrading version of WGE #2528

Closed (foot closed this 1 year ago)

foot commented 1 year ago

This has happened to them twice now after upgrading (every time?).

  1. Install wge $someversion
  2. Upgrade to $newerversion

New pods appear, but the old pods are not cleaned up, and the UI stops working.

Questions:

Possibilities?

foot commented 1 year ago

How to debug this?

foot commented 1 year ago

Sounds like we will have to hop on a debugging call with PEPSI.

Pre-debug call:

For example, here is demo-01, which is upgraded every day:

❯ kubectl get pods -n flux-system
NAME                                                              READY   STATUS    RESTARTS         AGE
flux-system-tf-controller-694c8c7bdc-877vh                        1/1     Running   24 (5d23h ago)   10d
gitopssets-controller-manager-5cf4c4f766-llhgc                    2/2     Running   11 (5d23h ago)   7d1h
helm-controller-7c8bd45cb4-d2rzp                                  1/1     Running   21 (5d23h ago)   10d
kustomize-controller-7bbf4fcd8b-zk44v                             1/1     Running   22 (5d23h ago)   10d
notification-controller-f68ff48f8-lkwhx                           1/1     Running   22 (5d23h ago)   10d
policy-agent-64dc8c99bc-zqxv4                                     1/1     Running   825 (7d2h ago)   10d
source-controller-6cbcbd869f-d6mm2                                1/1     Running   22 (5d23h ago)   10d
templates-controller-controller-manager-7875f58d5f-qsdfk          2/2     Running   22 (5d23h ago)   10d
weave-gitops-enterprise-cluster-controller-5b4f964cb-czctc        2/2     Running   23 (5d23h ago)   10d
weave-gitops-enterprise-mccp-cluster-bootstrap-controller-vgkj9   2/2     Running   6 (5d23h ago)    6d22h
weave-gitops-enterprise-mccp-cluster-service-764fc44b4c-cpv2z     1/1     Running   0                3d1h
weave-gitops-enterprise-pipeline-controller-557b8795c6-t28tw      1/1     Running   23 (5d23h ago)   10d

❯ kubectl get deploy -n flux-system
NAME                                                        READY   UP-TO-DATE   AVAILABLE   AGE
flux-system-tf-controller                                   1/1     1            1           111d
gitopssets-controller-manager                               1/1     1            1           25d
helm-controller                                             1/1     1            1           316d
kustomize-controller                                        1/1     1            1           316d
notification-controller                                     1/1     1            1           316d
policy-agent                                                1/1     1            1           195d
source-controller                                           1/1     1            1           316d
templates-controller-controller-manager                     1/1     1            1           82d
weave-gitops-enterprise-cluster-controller                  1/1     1            1           316d
weave-gitops-enterprise-mccp-cluster-bootstrap-controller   1/1     1            1           228d
weave-gitops-enterprise-mccp-cluster-service                1/1     1            1           228d
weave-gitops-enterprise-pipeline-controller                 1/1     1            1           114d

❯ kubectl get rs -n flux-system
NAME                                                                   DESIRED   CURRENT   READY   AGE
flux-system-tf-controller-694c8c7bdc                                   1         1         1       111d
gitopssets-controller-manager-5b88444cb8                               0         0         0       25d
gitopssets-controller-manager-5cf4c4f766                               1         1         1       7d1h
gitopssets-controller-manager-dc86bd5c9                                0         0         0       10d
helm-controller-565bb7b89b                                             0         0         0       316d
helm-controller-7bf446cfbd                                             0         0         0       61d
helm-controller-7c8bd45cb4                                             1         1         1       53d
helm-controller-88f6889c6                                              0         0         0       290d
helm-controller-f47b55c65                                              0         0         0       111d
kustomize-controller-546bc764cb                                        0         0         0       111d
kustomize-controller-69668f89f                                         0         0         0       34d
kustomize-controller-784bd54978                                        0         0         0       290d
kustomize-controller-78d9fc5f5f                                        0         0         0       316d
kustomize-controller-7bbf4fcd8b                                        1         1         1       53d
kustomize-controller-7fc68b65f8                                        0         0         0       61d
notification-controller-55f485bf49                                     0         0         0       111d
notification-controller-56464fd84                                      0         0         0       316d
notification-controller-648bbb9db7                                     0         0         0       290d
notification-controller-6d9d866fb                                      0         0         0       61d
notification-controller-f68ff48f8                                      1         1         1       53d
policy-agent-5d8955b6cb                                                0         0         0       151d
policy-agent-64dc8c99bc                                                1         1         1       20d
policy-agent-745687c5c6                                                0         0         0       82d
policy-agent-75884f59bb                                                0         0         0       117d
policy-agent-86c6cbc77d                                                0         0         0       172d
policy-agent-c8f78fcc4                                                 0         0         0       123d
policy-agent-c98f54fbd                                                 0         0         0       195d
source-controller-58757cc77b                                           0         0         0       34d
source-controller-6cbcbd869f                                           1         1         1       53d
source-controller-7646dcd7c7                                           0         0         0       111d
source-controller-79f7866bc7                                           0         0         0       290d
source-controller-7ddfb8f554                                           0         0         0       316d
source-controller-df76fdcf8                                            0         0         0       61d
templates-controller-controller-manager-6b69c94566                     0         0         0       82d
templates-controller-controller-manager-7875f58d5f                     1         1         1       52d
weave-gitops-enterprise-cluster-controller-5458cbbc48                  0         0         0       298d
weave-gitops-enterprise-cluster-controller-5b4f964cb                   1         1         1       137d
weave-gitops-enterprise-cluster-controller-5d47765864                  0         0         0       158d
weave-gitops-enterprise-cluster-controller-648b88cc8f                  0         0         0       255d
weave-gitops-enterprise-cluster-controller-6d87cd55b6                  0         0         0       217d
weave-gitops-enterprise-cluster-controller-764ccff568                  0         0         0       171d
weave-gitops-enterprise-cluster-controller-77fffd5749                  0         0         0       219d
weave-gitops-enterprise-cluster-controller-79588bfc47                  0         0         0       316d
weave-gitops-enterprise-mccp-cluster-bootstrap-controller-589d666b5d   0         0         0       171d
weave-gitops-enterprise-mccp-cluster-bootstrap-controller-7664f847fb   1         1         1       6d22h
weave-gitops-enterprise-mccp-cluster-bootstrap-controller-768df9cb58   0         0         0       202d
weave-gitops-enterprise-mccp-cluster-bootstrap-controller-7f76c8b68c   0         0         0       228d
weave-gitops-enterprise-mccp-cluster-service-5458fb8585                0         0         0       10d
weave-gitops-enterprise-mccp-cluster-service-55976cf78f                0         0         0       10d
weave-gitops-enterprise-mccp-cluster-service-56d95df775                0         0         0       3d21h
weave-gitops-enterprise-mccp-cluster-service-59db56cb9f                0         0         0       6d20h
weave-gitops-enterprise-mccp-cluster-service-5bcf69bccf                0         0         0       6d22h
weave-gitops-enterprise-mccp-cluster-service-64bb66fcd5                0         0         0       10d
weave-gitops-enterprise-mccp-cluster-service-67dbd9bd95                0         0         0       4d1h
weave-gitops-enterprise-mccp-cluster-service-699d8c85dc                0         0         0       7d1h
weave-gitops-enterprise-mccp-cluster-service-764fc44b4c                1         1         1       3d1h
weave-gitops-enterprise-mccp-cluster-service-8447ddf8f7                0         0         0       10d
weave-gitops-enterprise-mccp-cluster-service-8f498f747                 0         0         0       4d18h
weave-gitops-enterprise-pipeline-controller-557b8795c6                 1         1         1       12d
weave-gitops-enterprise-pipeline-controller-595cc68556                 0         0         0       67d
weave-gitops-enterprise-pipeline-controller-679b55bfc4                 0         0         0       17d
weave-gitops-enterprise-pipeline-controller-7fb7d95747                 0         0         0       59d
weave-gitops-enterprise-pipeline-controller-85bb4db544                 0         0         0       60d
weave-gitops-enterprise-pipeline-controller-c6566c866                  0         0         0       114d
weave-gitops-enterprise-pipeline-controller-d9dc4dbb9                  0         0         0       94d

foot commented 1 year ago

https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#revision-history-limit

By default, 10 old ReplicaSets will be kept

This is to allow rollbacks and similar. Maybe a "gitops best practices" guide should recommend setting this to 0. Anyway, lingering ReplicaSets are fine.
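
To make the effect of the revision history limit concrete, here is a minimal Python sketch of the retention rule. It is a simplified model, not the Deployment controller's actual code: it assumes only ReplicaSets scaled to zero are candidates for pruning and that the oldest are pruned first. The function name and the tuple layout are hypothetical.

```python
from typing import List, Tuple

# Each ReplicaSet is modelled as (name, age_in_days, desired_replicas).
ReplicaSet = Tuple[str, float, int]


def retained_old_replicasets(old: List[ReplicaSet],
                             revision_history_limit: int = 10) -> List[ReplicaSet]:
    """Return the old ReplicaSets that survive pruning in this simplified model."""
    # Youngest first, so slicing keeps the most recent revisions.
    scaled_down = sorted((rs for rs in old if rs[2] == 0), key=lambda rs: rs[1])
    # ReplicaSets that still want pods are never pruned here.
    still_scaled_up = [rs for rs in old if rs[2] != 0]
    return still_scaled_up + scaled_down[:revision_history_limit]


# Twelve old, scaled-down ReplicaSets aged 1 to 12 days.
old = [(f"rs-{i}", float(i), 0) for i in range(1, 13)]
print(len(retained_old_replicasets(old)))   # 10 with the default limit
print(retained_old_replicasets(old, 0))     # [] when the limit is 0
```

This matches the demo-01 output, where weave-gitops-enterprise-mccp-cluster-service has ten ReplicaSets at 0 replicas next to the one active one. Note that with a limit of 0 there is no history left to roll back to.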

foot commented 1 year ago

Sounds like we will have to hop on a debugging call with PEPSI.

It sounds like the upgrade did not go well. We should check:

  1. Do old services and deployments linger as well as pods?
  2. Logs of the helm-controller
  3. Events in flux-system
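
As a rough starting point for the call, the three checks above could map to kubectl invocations like the ones this Python sketch prints. It only builds and prints the commands; it does not run them. The flux-system namespace is taken from the demo-01 output and is an assumption for the PEPSI cluster, and the function name is hypothetical.

```python
import shlex
from typing import List


def upgrade_debug_commands(namespace: str = "flux-system") -> List[List[str]]:
    """Build one kubectl invocation per check in the list above."""
    return [
        # 1. Do old services and deployments linger as well as pods?
        ["kubectl", "get", "deployments,services,pods", "-n", namespace],
        # 2. Logs of the helm-controller
        ["kubectl", "logs", "-n", namespace, "deployment/helm-controller"],
        # 3. Events in the namespace, sorted by time
        ["kubectl", "get", "events", "-n", namespace, "--sort-by=.lastTimestamp"],
    ]


if __name__ == "__main__":
    for command in upgrade_debug_commands():
        print(shlex.join(command))
```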

cc @bigkevmcd anything else come to mind here?

bigkevmcd commented 1 year ago

> https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#revision-history-limit
>
> By default, 10 old ReplicaSets will be kept
>
> This is to allow rollbacks and similar. Maybe a "gitops best practices" guide should recommend setting this to 0. Anyway, lingering ReplicaSets are fine.

https://www.weave.works/blog/how-many-kubernetes-replicasets-are-in-your-cluster-

bigkevmcd commented 1 year ago

Do we know how they're upgrading?

Editing the HelmRelease in Git?

foot commented 1 year ago

> Do we know how they're upgrading?

Good shout, I don't know.

bigkevmcd commented 1 year ago

Looking at the charts, I notice that we have...

https://github.com/weaveworks/weave-gitops-enterprise/blob/main/charts/mccp/templates/clusters-service/deployment.yaml#L10-L12

and

https://github.com/weaveworks/weave-gitops-enterprise/blob/main/charts/mccp/templates/clusters-service/deployment.yaml#L19-L20

and

https://github.com/weaveworks/weave-gitops-enterprise/blob/main/charts/mccp/templates/_helpers.tpl#L36-L43

We need to make sure that these match, along with https://github.com/weaveworks/weave-gitops-enterprise/blob/main/charts/mccp/templates/clusters-service/service.yaml#L13
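
One way to make "these match" concrete: a selector matches a pod when every key/value pair in the selector is present in the pod's labels. The Python sketch below checks that for the Deployment selector and the Service selector against the pod template labels. The label values are hypothetical and are not copied from the chart; the function names are illustrative too.

```python
from typing import Dict, List

Labels = Dict[str, str]


def selects(selector: Labels, labels: Labels) -> bool:
    """True when every selector key/value pair appears in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def label_mismatches(deployment_selector: Labels,
                     pod_template_labels: Labels,
                     service_selector: Labels) -> List[str]:
    """List the ways the three label sets fail to line up."""
    problems = []
    if not selects(deployment_selector, pod_template_labels):
        problems.append("Deployment selector does not match its pod template labels")
    if not selects(service_selector, pod_template_labels):
        problems.append("Service selector does not match the pod template labels")
    return problems


# Hypothetical label values, for illustration only.
pod_labels = {"app.kubernetes.io/name": "mccp", "app.kubernetes.io/instance": "wge"}
print(label_mismatches(pod_labels, pod_labels, {"app.kubernetes.io/name": "mccp"}))
print(label_mismatches(pod_labels, pod_labels, {"app": "clusters-service"}))
```

The first call reports no problems; the second reports the Service selector, because no pod carries the app label it asks for.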

foot commented 1 year ago

I remember vaguely that @bigkevmcd bumped into some pod/service selector issue in WGE a while back when debugging logs or something but I can't remember the details.

I think it may have been raised and fixed in https://github.com/weaveworks/weave-gitops-enterprise/pull/1171/files

bigkevmcd commented 1 year ago

I don't see any lingering pods. After cleaning up the list you can see this: it looks like there's only one running for each deployment.

❯ kubectl get rs -n flux-system
NAME                                                                   DESIRED   CURRENT   READY   AGE
flux-system-tf-controller-694c8c7bdc                                   1         1         1       111d
gitopssets-controller-manager-5cf4c4f766                               1         1         1       7d1h
helm-controller-7c8bd45cb4                                             1         1         1       53d
kustomize-controller-7bbf4fcd8b                                        1         1         1       53d
notification-controller-f68ff48f8                                      1         1         1       53d
policy-agent-64dc8c99bc                                                1         1         1       20d
source-controller-6cbcbd869f                                           1         1         1       53d
templates-controller-controller-manager-7875f58d5f                     1         1         1       52d
weave-gitops-enterprise-cluster-controller-5b4f964cb                   1         1         1       137d
weave-gitops-enterprise-mccp-cluster-bootstrap-controller-7664f847fb   1         1         1       6d22h
weave-gitops-enterprise-mccp-cluster-service-764fc44b4c                1         1         1       3d1h
weave-gitops-enterprise-pipeline-controller-557b8795c6                 1         1         1       12d
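
The cleaned list above keeps only the rows whose DESIRED count is above zero. A minimal Python sketch of that filter, assuming the default column layout of `kubectl get rs` (NAME first, DESIRED second), might look like this. The function name is hypothetical; the sample rows are taken from the demo-01 output earlier in the thread.

```python
from typing import List


def active_replicasets(kubectl_output: str) -> List[str]:
    """Return the names of ReplicaSets whose DESIRED count is above zero."""
    active = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit() and int(fields[1]) > 0:
            active.append(fields[0])
    return active


SAMPLE = """\
NAME                                       DESIRED   CURRENT   READY   AGE
helm-controller-565bb7b89b                 0         0         0       316d
helm-controller-7c8bd45cb4                 1         1         1       53d
helm-controller-88f6889c6                  0         0         0       290d
"""

print(active_replicasets(SAMPLE))  # ['helm-controller-7c8bd45cb4']
```

Anything left over after this filter that does not correspond to a current Deployment would be worth a closer look.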

foot commented 1 year ago

> I don't see any lingering pods. After cleaning up the list you can see this: it looks like there's only one running for each deployment.

Yeah, sorry, that output is confusingly from demo-01, not from PEPSI. It was just to highlight the ReplicaSets not getting cleaned up.