pipe-cd / pipecd

The One CD for All {applications, platforms, operations}
https://pipecd.dev
Apache License 2.0
1.07k stars 150 forks source link

prune cluster scoped resources does not work when specify namespace in app config #4269

Open kevin-namba opened 1 year ago

kevin-namba commented 1 year ago

What happened: prune cluster scoped resources (such as clusterrole) does not work when specify namespace in app config

What you expected to happen:

How to reproduce it:

Environment:

ffjlabo commented 6 months ago

Reproduced on QuickSync like thisโ†“ (name: pod-reader)

[result]

before

% k get clusterrole                                                                          (git)-[delete-cluster-role]
NAME                                                                   CREATED AT
admin                                                                  2024-03-25T07:29:57Z
cluster-admin                                                          2024-03-25T07:29:56Z
edit                                                                   2024-03-25T07:29:57Z
kindnet                                                                2024-03-25T07:30:01Z
kubeadm:get-nodes                                                      2024-03-25T07:29:58Z
local-path-provisioner-role                                            2024-03-25T07:30:02Z
pod-reader                                                             2024-03-28T09:37:08Z
system:aggregate-to-admin                                              2024-03-25T07:29:57Z

after

% k get clusterrole                                                                                 (git)-[delete-cluster-role]
NAME                                                                   CREATED AT
admin                                                                  2024-03-25T07:29:57Z
cluster-admin                                                          2024-03-25T07:29:56Z
edit                                                                   2024-03-25T07:29:57Z
kindnet                                                                2024-03-25T07:30:01Z
kubeadm:get-nodes                                                      2024-03-25T07:29:58Z
local-path-provisioner-role                                            2024-03-25T07:30:02Z
pod-reader                                                             2024-03-28T09:37:08Z
system:aggregate-to-admin                                              2024-03-25T07:29:57Z

Image

[sample]

apiVersion: pipecd.dev/v1beta1
kind: KubernetesApp
spec:
  name: clusterrole-test
  input:
    namespace: test
  labels:
    env: clusterrole-test
    team: product
  quickSync:
    prune: true
  description: |
    This is a test app for clusterrole. Especially for testing the prune on QuickSync.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-reader
  annotations:
    pipecd.dev/order: "1"
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
apiVersion: v1
kind: Namespace
metadata:
  name: test
  labels:
    name: test
  annotations:
    pipecd.dev/order: "0"
ffjlabo commented 6 months ago

This might happen because the resource key in the live resource differs from the resource used when pruning.

Image

Image

ffjlabo commented 6 months ago

As this PR https://github.com/pipe-cd/pipecd/pull/3991, all of the resources for the application belong to the namespace set on the spec.input.namespace in app.pipecd.yaml. So in PipeCD, the ClusterRole resource is also treated as the one which belong the namespace.

ffjlabo commented 6 months ago

Root cause:

the resource key stored as livestate and the one on the annotation of actual k8s resource are different values, but actually, they are compared as if they are equal.

The Code:

https://github.com/pipe-cd/pipecd/blob/7b471d7a9f2eb09ccbc0a9222b6e7adb08ea2edf/pkg/app/piped/platformprovider/kubernetes/applier.go#L161-L163

Details

The resource key stored as livestate

More details - applier.Delete(ctx, k) https://github.com/pipe-cd/pipecd/blob/7b471d7a9f2eb09ccbc0a9222b6e7adb08ea2edf/pkg/app/piped/executor/kubernetes/kubernetes.go#L287 - removeKeys are determined by findRemoveKeys using manifests (get from git repo) and liveResources (get from AppLiveResourceLister.ListKubernetesResources) https://github.com/pipe-cd/pipecd/blob/7b471d7a9f2eb09ccbc0a9222b6e7adb08ea2edf/pkg/app/piped/executor/kubernetes/sync.go#L119 - first, determine values from git except for the namespace - then, determine the namespace from the livestate's one - => The namespace of the resourceKey depends on livestate's one - ListKubernetesResources - https://github.com/pipe-cd/pipecd/blob/7b471d7a9f2eb09ccbc0a9222b6e7adb08ea2edf/pkg/app/piped/controller/controller.go#L725-L727 - originally, get from the livestate store - -> ListKubernetesAppLiveResources https://github.com/pipe-cd/pipecd/blob/7b471d7a9f2eb09ccbc0a9222b6e7adb08ea2edf/pkg/app/piped/livestatestore/livestatestore.go#L219-L225 - -> s.store.GetAppLiveManifests https://github.com/pipe-cd/pipecd/blob/7b471d7a9f2eb09ccbc0a9222b6e7adb08ea2edf/pkg/app/piped/livestatestore/kubernetes/store.go#L302-L316 - -> getManagingNodes https://github.com/pipe-cd/pipecd/blob/7b471d7a9f2eb09ccbc0a9222b6e7adb08ea2edf/pkg/app/piped/livestatestore/kubernetes/appnodes.go#L164-L169 - the resourceKey is stored by reflector based on the resource applied in the cluster. - using MakeResourceKey https://github.com/pipe-cd/pipecd/blob/7b471d7a9f2eb09ccbc0a9222b6e7adb08ea2edf/pkg/app/piped/platformprovider/kubernetes/resourcekey.go#L249-L260 - treated the namespace as default if the obj.namespace is empty, ---code reading memo--- - set the handlers (onAdd, onUpdate) https://github.com/pipe-cd/pipecd/blob/7b471d7a9f2eb09ccbc0a9222b6e7adb08ea2edf/pkg/app/piped/livestatestore/kubernetes/kubernetes.go#L102-L103 - start informers with them to detect the changes https://github.com/pipe-cd/pipecd/blob/7b471d7a9f2eb09ccbc0a9222b6e7adb08ea2edf/pkg/app/piped/livestatestore/kubernetes/reflector.go#L219-L224 - -> onObjectAdd, onObjectUpdate - https://github.com/pipe-cd/pipecd/blob/7b471d7a9f2eb09ccbc0a9222b6e7adb08ea2edf/pkg/app/piped/livestatestore/kubernetes/reflector.go#L251-L252 - https://github.com/pipe-cd/pipecd/blob/7b471d7a9f2eb09ccbc0a9222b6e7adb08ea2edf/pkg/app/piped/livestatestore/kubernetes/reflector.go#L282-L286 - -> onAddResource, onUpdateResource - https://github.com/pipe-cd/pipecd/blob/7b471d7a9f2eb09ccbc0a9222b6e7adb08ea2edf/pkg/app/piped/livestatestore/kubernetes/store.go#L170 - https://github.com/pipe-cd/pipecd/blob/7b471d7a9f2eb09ccbc0a9222b6e7adb08ea2edf/pkg/app/piped/livestatestore/kubernetes/store.go#L185-L186 - -> addResource https://github.com/pipe-cd/pipecd/blob/7b471d7a9f2eb09ccbc0a9222b6e7adb08ea2edf/pkg/app/piped/livestatestore/kubernetes/store.go#L156-L157 - -> addManagingResource https://github.com/pipe-cd/pipecd/blob/7b471d7a9f2eb09ccbc0a9222b6e7adb08ea2edf/pkg/app/piped/livestatestore/kubernetes/appnodes.go#L66-L67

The resource key on the annotation of actual k8s resource

More details - the annotation pipecd.dev/resource-key is set based on the resource loaded from the git repo before applying - loadManifests https://github.com/pipe-cd/pipecd/blob/7b471d7a9f2eb09ccbc0a9222b6e7adb08ea2edf/pkg/app/piped/executor/kubernetes/sync.go#L28-L35 - addBuiltinAnnotations https://github.com/pipe-cd/pipecd/blob/7b471d7a9f2eb09ccbc0a9222b6e7adb08ea2edf/pkg/app/piped/executor/kubernetes/kubernetes.go#L223 - When loading manifests, piped executor determines the resourceKey based on the manifest and app.pipecd.yaml - first, determine the resourceKey using MakeResourceKey - -> ParseManifests (all of helm, kustomize, default) https://github.com/pipe-cd/pipecd/blob/7b471d7a9f2eb09ccbc0a9222b6e7adb08ea2edf/pkg/app/piped/platformprovider/kubernetes/manifest.- go#L243-L246 - finally, fix it based on spec.input.namespace on app.pipecd.yaml - https://github.com/pipe-cd/pipecd/blob/7b471d7a9f2eb09ccbc0a9222b6e7adb08ea2edf/pkg/app/piped/platformprovider/kubernetes/loader.go#L90

Example:

Consider the case to apply ClusterRole

ffjlabo commented 6 months ago

Solution candidates

The point is whether to fix or not as if they are equal.

  1. fix not to compare them. (they are not equal)
  2. fix to use of annotation when creating the resourceKey on the livestate. (they are equal)

In both cases, there is no user effect. Considering the effective scope for the fix, 1 is safer. But I think it is confusing that they are not equal.

ffjlabo commented 6 months ago

Other stages

Baseline

load manifests in the same way with K8S_SYNC stage indirectly. (by using loadRunningManifests) https://github.com/pipe-cd/pipecd/blob/7b1103d027fcdbc2468af60deb5da758fcc8f6ae/pkg/app/piped/executor/kubernetes/baseline.go#L46

So the resourceKey is based on the k8s resource manifest, then fixed by app.pipecd.yaml (spec.input.namespace).

then, store it and use it on the BASELINE_CLEAN

ffjlabo commented 6 months ago

๐Ÿ“ the namespace of the resourceKey is spec.input.namespace > original namespace https://github.com/pipe-cd/pipecd/blob/7b1103d027fcdbc2468af60deb5da758fcc8f6ae/pkg/app/piped/platformprovider/kubernetes/loader.go#L90-L91

ffjlabo commented 6 months ago

There are two problems

  1. We can't delete the cluster scoped resources if we set the namespace to the app.pipecd.yaml.
    • This is because the resource key stored as livestate and the one on the annotation of actual k8s resource are different. values.
  2. The resourceKey rule of cluster scoped resources is undefined.
    • Currently, if we set the namespace with app.pipecd.yaml (spec.input.namespace), resourceKey is also affected by it.

So I'll resolve 1 in this issue, and later will try 2.

ffjlabo commented 6 months ago

๐Ÿ“ Prepared the issue for 2 https://github.com/pipe-cd/pipecd/issues/4861

ffjlabo commented 5 months ago

Previous context: https://github.com/pipe-cd/pipecd/issues/4269#issuecomment-2031079364

After some investigations, I decided to fix the logic for the resource key as the ideal state. I think this problem is the incorrect namespace in the resource key created by MakeResourceKey.

https://github.com/pipe-cd/pipecd/blob/7b471d7a9f2eb09ccbc0a9222b6e7adb08ea2edf/pkg/app/piped/platformprovider/kubernetes/resourcekey.go#L249-L260

livestate side

When reading the manifests from git

So for example, cluster-scoped resource don't have any namespace, but if we set the spec.input.namespace in the app.pipecd.yaml, it sets the value as the namespace to the resource key.

Here is the ideal state of the resource key.

livestate side

When reading the manifests from git

So I investigated the scope of the usage of MakeResourceKey.

At first, I drew the call graph.

make-resource-key-before

This shows that the fix affects 4 components.

Next, I categorized each functions by the usecases. The resource key is usually created in the three cases below

  1. read the resource from the cluster (livestate)
  2. read the resource stored as manifests in git repo
  3. create the temporary resource for the progressive delivery (e.g. canary pod and so on) after 2

I think the 1st and 3rd case don't need MakeResourceKey below

So the problem is 2nd case. I think it can be solved by reading the resource from the git repo with the proper namespace.

So, it would be nice to unify the responsibility to the loader.LoadManifests. Especially, add function to the loader like this.

func (l *loader) refineNamespace(m Manifest) string {
    clusterScoped := true // TODO: determine cluster scoped
    if clusterScoped {
        return ""
    }

    if l.input.Namespace != "" {
        return l.input.Namespace
    }

    ns := m.GetNamespace()
    if ns == "" {
        return "default"
    }

    return ns
}

Finally, fixing the manifest's namespace only in the loader.Manifests would be nice. make-resource-key-after

TODO