kubestellar / kubestellar

KubeStellar - a flexible solution for challenges associated with multi-cluster configuration management for edge, multi-cloud, and hybrid cloud
https://kubestellar.io
Apache License 2.0

bug: IMBS ManifestWork objects are not deleted after the user deletes the associated Placement object from WDS #1596

Closed: namasl closed this issue 8 months ago

namasl commented 8 months ago

Describe the bug

Upon deleting the Placement and Deployment objects in the WDS, the ManifestWork objects in the IMBS remain resident, and the WEC Deployments continue to run.

Steps To Reproduce

  1. Complete Scenario 1
  2. Apply the following
    kubectl --context wds1 apply -f - <<EOF
    apiVersion: edge.kubestellar.io/v1alpha1
    kind: Placement
    metadata:
      name: nginx-singleton-placement
    spec:
      wantSingletonReportedState: true
      clusterSelectors:
      - matchLabels: {"location-group":"edge"}
      downsync:
      - objectSelectors:
        - matchLabels: {"app.kubernetes.io/name":"nginx-singleton"}
    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      labels:
        app.kubernetes.io/name: nginx
      name: nginx
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-singleton-deployment
      namespace: nginx
      labels:
        app.kubernetes.io/name: nginx-singleton
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: public.ecr.aws/nginx/nginx:latest
            ports:
            - containerPort: 80
    EOF
  3. Delete both Placement objects from the WDS (nginx-placement, created in Scenario 1, and nginx-singleton-placement, created in step 2)
    kubectl --context wds1 delete pl nginx-placement
    kubectl --context wds1 delete pl nginx-singleton-placement
  4. See that the ManifestWork for the singleton still exists
    kubectl --context imbs1 -n cluster1 get manifestwork
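Removing a ManifestWork after step 3 is asynchronous. The controller first has to react to the Placement deletion, and the ManifestWork carries the cluster.open-cluster-management.io/manifest-work-cleanup finalizer (visible in the YAML posted later in this thread), which must be cleared before the object disappears. A check for step 4 should therefore allow some time before it declares a ManifestWork stuck. The following is a minimal Python sketch and is not part of KubeStellar. `wait_until_gone` and `list_manifestworks` are hypothetical helper names, the 60-second default timeout is an arbitrary assumption, and `list_manifestworks` simply shells out to `kubectl ... get manifestwork -o name`.

```python
import subprocess
import time
from typing import Callable, Iterable


def list_manifestworks(context: str, namespace: str) -> list[str]:
    """Return ManifestWork names in `namespace` (assumes kubectl is on PATH)."""
    out = subprocess.run(
        ["kubectl", "--context", context, "-n", namespace,
         "get", "manifestwork", "-o", "name"],
        check=True, capture_output=True, text=True,
    ).stdout
    # "-o name" prints "<resource>.<group>/<name>"; keep only the part after "/".
    return [line.split("/", 1)[1] for line in out.splitlines() if "/" in line]


def wait_until_gone(
    list_names: Callable[[], Iterable[str]],
    name: str,
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll list_names() until `name` is no longer listed.

    Returns True if it vanished before about timeout_s seconds of sleeping,
    False if it is still listed afterwards (i.e. it looks stuck).
    """
    waited = 0.0
    while True:
        if name not in set(list_names()):
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

For example, `wait_until_gone(lambda: list_manifestworks("imbs1", "cluster1"), name)` with the singleton's ManifestWork name returning False would match the behavior reported in this issue. The `sleep` parameter is injectable only so the loop can be exercised without real waiting.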

Expected Behavior

Upon deleting a Placement object from a WDS, the associated ManifestWork object should automatically be deleted from the IMBS.
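One way to state the expected behavior more precisely is as a reference-counting rule. An object delivered to a cluster, and the ManifestWork that carries it, should be removed once no surviving Placement still selects that object for that cluster. If two Placements select the same object, deleting only one of them would then leave the ManifestWork in place. The Python sketch below expresses that rule under illustrative assumptions. The `selected_by` bookkeeping, the `Delivery` type and the string object identifiers are hypothetical and are not KubeStellar's internal data structures.

```python
from typing import Mapping

# A delivery is (cluster name, object id). Object ids are plain strings such as
# "apps/v1 Deployment nginx/nginx-deployment"; this representation is
# hypothetical, not KubeStellar's internal one.
Delivery = tuple[str, str]


def deliveries_to_remove(
    selected_by: Mapping[Delivery, set[str]],
    live_placements: set[str],
) -> list[Delivery]:
    """Return deliveries that no live Placement still selects.

    selected_by maps each delivery to the names of the Placements that
    selected that object for that cluster. A delivery, and therefore its
    ManifestWork, is removable only when none of those Placements remains.
    """
    return sorted(
        delivery
        for delivery, placements in selected_by.items()
        if not (placements & live_placements)
    )
```

With both Placements deleted, as in step 3, `live_placements` is empty and every delivery is returned for removal. That is the outcome the issue expects.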

Additional Context

No response

pdettori commented 8 months ago

Could you please share the log for the KubeStellar controller and the YAML for the ManifestWorks, so we can see whether there are errors in their conditions?
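A ManifestWork's status holds conditions in two places: the top-level status.conditions and the per-manifest status.resourceStatus.manifests[].conditions. Both appear in the YAML posted later in this thread. A small Python sketch that lists every condition whose status is not "True" could look like the following. `failing_conditions` and `report_lines` are hypothetical helper names, and the input is assumed to be the parsed JSON from `kubectl --context imbs1 -n cluster1 get manifestworks -o json`.

```python
from typing import Any, Iterator


def failing_conditions(mw: dict[str, Any]) -> Iterator[tuple[str, str, str, str]]:
    """Yield (scope, type, status, message) for each condition that is not "True"."""
    name = mw.get("metadata", {}).get("name", "<unnamed>")
    status = mw.get("status", {})
    # Top-level conditions summarize the whole ManifestWork (Applied, Available, ...).
    for cond in status.get("conditions", []):
        if cond.get("status") != "True":
            yield name, cond.get("type", ""), cond.get("status", ""), cond.get("message", "")
    # Per-manifest conditions can carry a more specific error message.
    for manifest in status.get("resourceStatus", {}).get("manifests", []):
        meta = manifest.get("resourceMeta", {})
        scope = f"{name} -> {meta.get('kind', '')} {meta.get('namespace', '')}/{meta.get('name', '')}"
        for cond in manifest.get("conditions", []):
            if cond.get("status") != "True":
                yield scope, cond.get("type", ""), cond.get("status", ""), cond.get("message", "")


def report_lines(doc: dict[str, Any]) -> list[str]:
    """Accept either a List (from "get ... -o json") or a single ManifestWork."""
    items = doc["items"] if "items" in doc else [doc]
    return [
        f"{scope}: {ctype}={cstatus} ({message})"
        for mw in items
        for scope, ctype, cstatus, message in failing_conditions(mw)
    ]
```

For example, pipe the kubectl output into a small wrapper that prints `"\n".join(report_lines(json.load(sys.stdin)))`.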

namasl commented 8 months ago

After tearing down my environment, I'm having trouble reproducing this... I was in a state where every new workload I created resulted in a persistent ManifestWork. I'll go back through my history to see if there was some step I overlooked that made this happen.

namasl commented 8 months ago

It was a convoluted path to get here (probably involving me applying a bad manifest along the way), but here is a stuck ManifestWork that still exists even though no Locations or Deployments are on the WDS. This is the only stuck ManifestWork at the moment, so things are not quite as broken as they were earlier. I'll see if I can find a terse way to reproduce something similar.

kubectl --context imbs1 get manifestworks -n cluster1 -o yaml
apiVersion: v1
items:
- apiVersion: work.open-cluster-management.io/v1
  kind: ManifestWork
  metadata:
    creationTimestamp: "2024-01-19T05:35:18Z"
    finalizers:
    - cluster.open-cluster-management.io/manifest-work-cleanup
    generation: 1
    labels:
      managed-by.kubestellar.io/singletonstatus: "true"
    name: appsv1-deployment-nginx-nginx-deployment-sing
    namespace: cluster1
    resourceVersion: "2285"
    uid: 0aab20ca-ad5c-48e4-b051-17dba500ad17
  spec:
    workload:
      manifests:
      - apiVersion: apps/v1
        kind: Deployment
        metadata:
          labels:
            app.kubernetes.io/name: nginx-singleton
          name: nginx-deployment-sing
          namespace: nginx
        spec:
          progressDeadlineSeconds: 600
          replicas: 1
          revisionHistoryLimit: 10
          selector:
            matchLabels:
              app: nginx
          strategy:
            rollingUpdate:
              maxSurge: 25%
              maxUnavailable: 25%
            type: RollingUpdate
          template:
            metadata:
              creationTimestamp: null
              labels:
                app: nginx
            spec:
              containers:
              - image: public.ecr.aws/nginx/nginx:latest
                imagePullPolicy: Always
                name: nginx
                ports:
                - containerPort: 80
                  protocol: TCP
                resources: {}
                terminationMessagePath: /dev/termination-log
                terminationMessagePolicy: File
              dnsPolicy: ClusterFirst
              restartPolicy: Always
              schedulerName: default-scheduler
              securityContext: {}
              terminationGracePeriodSeconds: 30
        status: {}
  status:
    conditions:
    - lastTransitionTime: "2024-01-19T05:39:59Z"
      message: Failed to apply manifest work
      observedGeneration: 1
      reason: AppliedManifestWorkFailed
      status: "False"
      type: Applied
    - lastTransitionTime: "2024-01-19T05:38:59Z"
      message: 1 of 1 resources are not available
      observedGeneration: 1
      reason: ResourcesNotAvailable
      status: "False"
      type: Available
    resourceStatus:
      manifests:
      - conditions:
        - lastTransitionTime: "2024-01-19T05:39:59Z"
          message: 'Failed to apply manifest: namespaces "nginx" not found'
          reason: AppliedManifestFailed
          status: "False"
          type: Applied
        - lastTransitionTime: "2024-01-19T05:38:59Z"
          message: Resource is not available
          reason: ResourceNotAvailable
          status: "False"
          type: Available
        - lastTransitionTime: "2024-01-19T05:35:18Z"
          message: ""
          reason: NoStatusFeedbackSynced
          status: "True"
          type: StatusFeedbackSynced
        resourceMeta:
          group: apps
          kind: Deployment
          name: nginx-deployment-sing
          namespace: nginx
          ordinal: 0
          resource: deployments
          version: v1
        statusFeedback: {}
kind: List
metadata:
  resourceVersion: ""

kubectl --context kind-kubeflex -n kubeflex-system logs kubeflex-controller-manager-586954f574-kgcq6
2024-01-19T04:31:58Z    INFO    controller-runtime.metrics  Metrics server is starting to listen    {"addr": "127.0.0.1:8080"}
2024-01-19T04:31:58Z    INFO    setup   starting manager    {"version": "v0.3.3.3950897"}
2024-01-19T04:31:58Z    INFO    starting server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
2024-01-19T04:31:58Z    INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
I0119 04:31:58.845644       1 leaderelection.go:250] attempting to acquire leader lease kubeflex-system/c6f71c85.kflex.kubestellar.org...
I0119 04:31:58.875658       1 leaderelection.go:260] successfully acquired lease kubeflex-system/c6f71c85.kflex.kubestellar.org
2024-01-19T04:31:58Z    DEBUG   events  kubeflex-controller-manager-586954f574-kgcq6_a4316389-a56f-4778-9ff1-4a5106f1de4d became leader {"type": "Normal", "object": {"kind":"Lease","namespace":"kubeflex-system","name":"c6f71c85.kflex.kubestellar.org","uid":"456804cb-b565-4b88-9e6f-beff7a06db21","apiVersion":"coordination.k8s.io/v1","resourceVersion":"755"}, "reason": "LeaderElection"}
2024-01-19T04:31:58Z    INFO    Starting EventSource    {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "source": "kind source: *v1alpha1.ControlPlane"}
2024-01-19T04:31:58Z    INFO    Starting EventSource    {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "source": "kind source: *v1.Service"}
2024-01-19T04:31:58Z    INFO    Starting EventSource    {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "source": "kind source: *v1.Ingress"}
2024-01-19T04:31:58Z    INFO    Starting EventSource    {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "source": "kind source: *v1.Deployment"}
2024-01-19T04:31:58Z    INFO    Starting EventSource    {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "source": "kind source: *v1.StatefulSet"}
2024-01-19T04:31:58Z    INFO    Starting EventSource    {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "source": "kind source: *v1.Secret"}
2024-01-19T04:31:58Z    INFO    Starting EventSource    {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "source": "kind source: *v1.ConfigMap"}
2024-01-19T04:31:58Z    INFO    Starting EventSource    {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "source": "kind source: *v1.ServiceAccount"}
2024-01-19T04:31:58Z    INFO    Starting Controller {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane"}
2024-01-19T04:31:58Z    INFO    Starting workers    {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "worker count": 1}
2024-01-19T04:32:08Z    INFO    Got ControlPlane event! {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"imbs1"}, "namespace": "", "name": "imbs1", "reconcileID": "7e5937af-44ce-4f35-94a7-b657ad569a02"}
2024/01/19 04:32:15 [debug] creating 9 resource(s)
2024-01-19T04:32:15Z    INFO    Got ControlPlane event! {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"imbs1"}, "namespace": "", "name": "imbs1", "reconcileID": "cdf9bc3c-6115-4fcd-9693-70bec1defb50"}
2024-01-19T04:32:15Z    INFO    Got ControlPlane event! {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"imbs1"}, "namespace": "", "name": "imbs1", "reconcileID": "882e5d6d-500e-46a0-b568-224c2f0f3cec"}
2024-01-19T04:32:32Z    INFO    Got ControlPlane event! {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"imbs1"}, "namespace": "", "name": "imbs1", "reconcileID": "5b591940-4a5c-40d7-acbf-a93c7b3ae471"}
2024-01-19T04:32:32Z    INFO    Got ControlPlane event! {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"imbs1"}, "namespace": "", "name": "imbs1", "reconcileID": "20b69758-f7d4-4a6e-a352-560410aee04d"}
2024-01-19T04:32:43Z    INFO    Got ControlPlane event! {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"imbs1"}, "namespace": "", "name": "imbs1", "reconcileID": "1a83842d-da00-4d66-aeb3-5f35d9e9fff0"}
2024-01-19T04:32:43Z    INFO    Running ReconcileUpdatePostCreateHook   {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"imbs1"}, "namespace": "", "name": "imbs1", "reconcileID": "1a83842d-da00-4d66-aeb3-5f35d9e9fff0", "post-create-hook": "ocm"}
I0119 04:32:44.925237       1 request.go:697] Waited for 1.046743421s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/scheduling.k8s.io/v1
2024-01-19T04:32:45Z    INFO    Applying    {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"imbs1"}, "namespace": "", "name": "imbs1", "reconcileID": "1a83842d-da00-4d66-aeb3-5f35d9e9fff0", "object": "[] job.batch/ocm"}
2024-01-19T04:32:45Z    INFO    Got ControlPlane event! {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"imbs1"}, "namespace": "", "name": "imbs1", "reconcileID": "2c45f799-e902-49fd-903f-5c0bdf5c4f62"}
2024-01-19T04:32:45Z    INFO    Got ControlPlane event! {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"imbs1"}, "namespace": "", "name": "imbs1", "reconcileID": "4085d81e-2ede-49f7-9c07-5d5e0d82abd1"}
2024-01-19T04:32:45Z    INFO    Got ControlPlane event! {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"wds1"}, "namespace": "", "name": "wds1", "reconcileID": "d3d7354d-4bec-40b2-8157-055e88570282"}
2024-01-19T04:32:49Z    INFO    Got ControlPlane event! {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"wds1"}, "namespace": "", "name": "wds1", "reconcileID": "5a4b6409-6e71-4a01-8265-05b4724d48a8"}
2024-01-19T04:32:49Z    INFO    Got ControlPlane event! {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"wds1"}, "namespace": "", "name": "wds1", "reconcileID": "7050bc79-a197-43a9-82bf-c3eaf86287fe"}
2024-01-19T04:32:49Z    INFO    Got ControlPlane event! {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"wds1"}, "namespace": "", "name": "wds1", "reconcileID": "5bca010e-8f84-4998-9ad9-bfe538528494"}
2024-01-19T04:32:49Z    INFO    Got ControlPlane event! {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"wds1"}, "namespace": "", "name": "wds1", "reconcileID": "327d5f8a-1a41-457d-9183-dc365ed48bec"}
2024-01-19T04:32:49Z    INFO    Got ControlPlane event! {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"wds1"}, "namespace": "", "name": "wds1", "reconcileID": "e6b93996-f6b7-44f3-bf04-c4a301d4101d"}
2024-01-19T04:32:49Z    INFO    Got ControlPlane event! {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"wds1"}, "namespace": "", "name": "wds1", "reconcileID": "338f9fef-6409-4fe7-b6c8-84611f538a1b"}
2024-01-19T04:33:09Z    INFO    Got ControlPlane event! {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"wds1"}, "namespace": "", "name": "wds1", "reconcileID": "86fcfe49-7550-420b-8596-15fc552b97e3"}
2024-01-19T04:33:09Z    INFO    Got ControlPlane event! {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"wds1"}, "namespace": "", "name": "wds1", "reconcileID": "89a42b05-6ea4-4bfe-8bc2-9a44cddce3e6"}
2024-01-19T04:33:19Z    INFO    Got ControlPlane event! {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"wds1"}, "namespace": "", "name": "wds1", "reconcileID": "6f452aa5-1f28-45f0-8004-5e38383c1b3a"}
2024-01-19T04:33:19Z    INFO    Got ControlPlane event! {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"wds1"}, "namespace": "", "name": "wds1", "reconcileID": "4c573527-8fff-484c-afc9-a00a30eb6c84"}
2024-01-19T04:33:19Z    INFO    Got ControlPlane event! {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"wds1"}, "namespace": "", "name": "wds1", "reconcileID": "196c0cc4-caa7-47e6-b646-b234039d9e1d"}
2024-01-19T04:33:32Z    INFO    Got ControlPlane event! {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"wds1"}, "namespace": "", "name": "wds1", "reconcileID": "284ca6e5-fead-403e-8a58-fb4c5a6274fc"}
2024-01-19T04:33:32Z    INFO    Got ControlPlane event! {"controller": "controlplane", "controllerGroup": "tenancy.kflex.kubestellar.org", "controllerKind": "ControlPlane", "ControlPlane": {"name":"wds1"}, "namespace": "", "name": "wds1", "reconcileID": "4be25ea6-317d-447f-8522-ac821d4eee5a"}

kubectl --context kind-kubeflex -n wds1-system logs pod/kube-controller-manager-59b556b968-blcbz
I0119 04:32:59.930551       1 serving.go:348] Generated self-signed cert in-memory
I0119 04:33:02.231303       1 controllermanager.go:187] "Starting" version="v1.27.1"
I0119 04:33:02.231320       1 controllermanager.go:189] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0119 04:33:02.232142       1 dynamic_cafile_content.go:157] "Starting controller" name="request-header::/etc/kubernetes/pki/front-proxy-ca.crt"
I0119 04:33:02.232237       1 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/etc/kubernetes/pki/ca.crt"
I0119 04:33:02.232657       1 secure_serving.go:210] Serving securely on [::]:10257
I0119 04:33:02.232732       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0119 04:33:02.232803       1 leaderelection.go:245] attempting to acquire leader lease kube-system/kube-controller-manager...
E0119 04:33:02.238733       1 leaderelection.go:327] error retrieving resource lock kube-system/kube-controller-manager: Get "https://wds1.wds1-system/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 10.96.158.108:443: connect: connection refused
E0119 04:33:04.550012       1 leaderelection.go:327] error retrieving resource lock kube-system/kube-controller-manager: Get "https://wds1.wds1-system/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 10.96.158.108:443: connect: connection refused
E0119 04:33:08.460805       1 leaderelection.go:327] error retrieving resource lock kube-system/kube-controller-manager: Get "https://wds1.wds1-system/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 10.96.158.108:443: connect: connection refused
E0119 04:33:10.668327       1 leaderelection.go:327] error retrieving resource lock kube-system/kube-controller-manager: Get "https://wds1.wds1-system/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 10.96.158.108:443: connect: connection refused
E0119 04:33:12.769083       1 leaderelection.go:327] error retrieving resource lock kube-system/kube-controller-manager: Get "https://wds1.wds1-system/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 10.96.158.108:443: connect: connection refused
E0119 04:33:14.982473       1 leaderelection.go:327] error retrieving resource lock kube-system/kube-controller-manager: Get "https://wds1.wds1-system/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 10.96.158.108:443: connect: connection refused
E0119 04:33:17.159999       1 leaderelection.go:327] error retrieving resource lock kube-system/kube-controller-manager: Get "https://wds1.wds1-system/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 10.96.158.108:443: connect: connection refused
I0119 04:33:20.212059       1 leaderelection.go:255] successfully acquired lease kube-system/kube-controller-manager
I0119 04:33:20.212197       1 event.go:307] "Event occurred" object="kube-system/kube-controller-manager" fieldPath="" kind="Lease" apiVersion="coordination.k8s.io/v1" type="Normal" reason="LeaderElection" message="kube-controller-manager-59b556b968-blcbz_7095423a-7ab5-4968-9bc5-0a0d66c7beb5 became leader"
I0119 04:33:20.223496       1 shared_informer.go:311] Waiting for caches to sync for tokens
I0119 04:33:20.272038       1 controllermanager.go:638] "Started controller" controller="garbagecollector"
I0119 04:33:20.272076       1 controllermanager.go:603] "Warning: controller is disabled" controller="daemonset"
I0119 04:33:20.272091       1 controllermanager.go:603] "Warning: controller is disabled" controller="horizontalpodautoscaling"
I0119 04:33:20.272103       1 controllermanager.go:603] "Warning: controller is disabled" controller="disruption"
I0119 04:33:20.272113       1 controllermanager.go:603] "Warning: controller is disabled" controller="cronjob"
I0119 04:33:20.272125       1 controllermanager.go:603] "Warning: controller is disabled" controller="nodelifecycle"
I0119 04:33:20.272136       1 controllermanager.go:603] "Warning: controller is disabled" controller="endpointslice"
I0119 04:33:20.272147       1 controllermanager.go:603] "Warning: controller is disabled" controller="endpointslicemirroring"
I0119 04:33:20.272158       1 controllermanager.go:603] "Warning: controller is disabled" controller="persistentvolume-expander"
I0119 04:33:20.272170       1 controllermanager.go:603] "Warning: controller is disabled" controller="resourcequota"
I0119 04:33:20.272181       1 controllermanager.go:603] "Warning: controller is disabled" controller="nodeipam"
I0119 04:33:20.272192       1 controllermanager.go:603] "Warning: controller is disabled" controller="attachdetach"
I0119 04:33:20.272202       1 controllermanager.go:603] "Warning: controller is disabled" controller="ttl-after-finished"
I0119 04:33:20.272213       1 controllermanager.go:603] "Warning: controller is disabled" controller="endpoint"
I0119 04:33:20.272224       1 controllermanager.go:603] "Warning: controller is disabled" controller="podgc"
I0119 04:33:20.272235       1 controllermanager.go:603] "Warning: controller is disabled" controller="pv-protection"
I0119 04:33:20.272523       1 garbagecollector.go:155] "Starting controller" controller="garbagecollector"
I0119 04:33:20.272538       1 shared_informer.go:311] Waiting for caches to sync for garbage collector
I0119 04:33:20.272826       1 graph_builder.go:294] "Running" component="GraphBuilder"
I0119 04:33:20.291166       1 controllermanager.go:638] "Started controller" controller="bootstrapsigner"
I0119 04:33:20.291184       1 controllermanager.go:603] "Warning: controller is disabled" controller="persistentvolume-binder"
I0119 04:33:20.291188       1 controllermanager.go:603] "Warning: controller is disabled" controller="route"
I0119 04:33:20.291191       1 controllermanager.go:603] "Warning: controller is disabled" controller="clusterrole-aggregation"
I0119 04:33:20.291195       1 controllermanager.go:603] "Warning: controller is disabled" controller="pvc-protection"
I0119 04:33:20.291198       1 controllermanager.go:603] "Warning: controller is disabled" controller="deployment"
I0119 04:33:20.291272       1 shared_informer.go:311] Waiting for caches to sync for bootstrap_signer
I0119 04:33:20.320404       1 certificate_controller.go:112] Starting certificate controller "csrsigning-kubelet-serving"
I0119 04:33:20.320418       1 shared_informer.go:311] Waiting for caches to sync for certificate-csrsigning-kubelet-serving
I0119 04:33:20.320442       1 dynamic_serving_content.go:132] "Starting controller" name="csr-controller::/etc/kubernetes/pki/ca.crt::/etc/kubernetes/pki/ca.key"
I0119 04:33:20.321591       1 certificate_controller.go:112] Starting certificate controller "csrsigning-kubelet-client"
I0119 04:33:20.321603       1 shared_informer.go:311] Waiting for caches to sync for certificate-csrsigning-kubelet-client
I0119 04:33:20.321658       1 dynamic_serving_content.go:132] "Starting controller" name="csr-controller::/etc/kubernetes/pki/ca.crt::/etc/kubernetes/pki/ca.key"
I0119 04:33:20.321997       1 certificate_controller.go:112] Starting certificate controller "csrsigning-kube-apiserver-client"
I0119 04:33:20.322002       1 shared_informer.go:311] Waiting for caches to sync for certificate-csrsigning-kube-apiserver-client
I0119 04:33:20.322014       1 dynamic_serving_content.go:132] "Starting controller" name="csr-controller::/etc/kubernetes/pki/ca.crt::/etc/kubernetes/pki/ca.key"
I0119 04:33:20.322165       1 controllermanager.go:638] "Started controller" controller="csrsigning"
I0119 04:33:20.322174       1 controllermanager.go:603] "Warning: controller is disabled" controller="ttl"
I0119 04:33:20.322286       1 certificate_controller.go:112] Starting certificate controller "csrsigning-legacy-unknown"
I0119 04:33:20.322290       1 shared_informer.go:311] Waiting for caches to sync for certificate-csrsigning-legacy-unknown
I0119 04:33:20.322304       1 dynamic_serving_content.go:132] "Starting controller" name="csr-controller::/etc/kubernetes/pki/ca.crt::/etc/kubernetes/pki/ca.key"
I0119 04:33:20.323578       1 shared_informer.go:318] Caches are synced for tokens
I0119 04:33:20.336868       1 controllermanager.go:638] "Started controller" controller="tokencleaner"
I0119 04:33:20.336994       1 tokencleaner.go:112] "Starting token cleaner controller"
I0119 04:33:20.337006       1 shared_informer.go:311] Waiting for caches to sync for token_cleaner
I0119 04:33:20.337012       1 shared_informer.go:318] Caches are synced for token_cleaner
I0119 04:33:20.351411       1 controllermanager.go:638] "Started controller" controller="root-ca-cert-publisher"
I0119 04:33:20.351426       1 controllermanager.go:603] "Warning: controller is disabled" controller="replicaset"
I0119 04:33:20.351430       1 controllermanager.go:603] "Warning: controller is disabled" controller="statefulset"
I0119 04:33:20.351434       1 controllermanager.go:603] "Warning: controller is disabled" controller="ephemeral-volume"
I0119 04:33:20.351439       1 controllermanager.go:603] "Warning: controller is disabled" controller="service"
I0119 04:33:20.351454       1 controllermanager.go:603] "Warning: controller is disabled" controller="cloud-node-lifecycle"
I0119 04:33:20.351459       1 controllermanager.go:603] "Warning: controller is disabled" controller="replicationcontroller"
I0119 04:33:20.351618       1 publisher.go:101] Starting root CA certificate configmap publisher
I0119 04:33:20.351623       1 shared_informer.go:311] Waiting for caches to sync for crt configmap
I0119 04:33:20.359590       1 controllermanager.go:638] "Started controller" controller="csrcleaner"
I0119 04:33:20.359604       1 controllermanager.go:603] "Warning: controller is disabled" controller="job"
I0119 04:33:20.359672       1 cleaner.go:82] Starting CSR cleaner controller
I0119 04:33:20.363561       1 controllermanager.go:638] "Started controller" controller="csrapproving"
I0119 04:33:20.363748       1 certificate_controller.go:112] Starting certificate controller "csrapproving"
I0119 04:33:20.363793       1 shared_informer.go:311] Waiting for caches to sync for certificate-csrapproving
I0119 04:33:20.399052       1 controllermanager.go:638] "Started controller" controller="namespace"
I0119 04:33:20.399146       1 namespace_controller.go:197] "Starting namespace controller"
I0119 04:33:20.399153       1 shared_informer.go:311] Waiting for caches to sync for namespace
I0119 04:33:20.413340       1 controllermanager.go:638] "Started controller" controller="serviceaccount"
I0119 04:33:20.413568       1 serviceaccounts_controller.go:111] "Starting service account controller"
I0119 04:33:20.413578       1 shared_informer.go:311] Waiting for caches to sync for service account
I0119 04:33:20.419353       1 shared_informer.go:311] Waiting for caches to sync for garbage collector
I0119 04:33:20.422467       1 shared_informer.go:318] Caches are synced for certificate-csrsigning-kube-apiserver-client
I0119 04:33:20.422495       1 shared_informer.go:318] Caches are synced for certificate-csrsigning-kubelet-serving
I0119 04:33:20.422514       1 shared_informer.go:318] Caches are synced for certificate-csrsigning-kubelet-client
I0119 04:33:20.425692       1 shared_informer.go:318] Caches are synced for certificate-csrsigning-legacy-unknown
I0119 04:33:20.454576       1 shared_informer.go:318] Caches are synced for crt configmap
I0119 04:33:20.464661       1 shared_informer.go:318] Caches are synced for certificate-csrapproving
I0119 04:33:20.492331       1 shared_informer.go:318] Caches are synced for bootstrap_signer
I0119 04:33:20.499395       1 shared_informer.go:318] Caches are synced for namespace
I0119 04:33:20.514296       1 shared_informer.go:318] Caches are synced for service account
I0119 04:33:21.019979       1 shared_informer.go:318] Caches are synced for garbage collector
I0119 04:33:21.079093       1 shared_informer.go:318] Caches are synced for garbage collector
I0119 04:33:21.079115       1 garbagecollector.go:166] "All resource monitors have synced. Proceeding to collect garbage"
I0119 04:33:51.040447       1 shared_informer.go:311] Waiting for caches to sync for garbage collector
I0119 04:33:51.140768       1 shared_informer.go:318] Caches are synced for garbage collector

pdettori commented 8 months ago

@namasl yes, it would be helpful if you could find a recipe that replicates the bug. The ManifestWork above complains that namespace "nginx" is not found. Maybe it was deleted? Or maybe it was never applied from the WDS, so the ManifestWork for the Deployment stays in a pending state?
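Both hypotheses come down to whether a Namespace object for "nginx" is still being delivered to the cluster, and that can be checked mechanically. The check lists every namespaced object delivered to a cluster and looks for a matching Namespace object delivered there too. The Python sketch below reads the spec.workload.manifests of the ManifestWorks in one cluster namespace of the IMBS, as returned by `kubectl --context imbs1 -n cluster1 get manifestworks -o json`. The helper names and the `preexisting` set are illustrative assumptions. `preexisting` holds namespaces assumed to already exist on the WEC, such as `default`, and any namespace created on the WEC by other means will show up as a false positive unless it is added to that set.

```python
from typing import Any, Iterable


def delivered_objects(mw_list: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return (kind, namespace, name) for every manifest in a ManifestWork List."""
    out = []
    for mw in mw_list.get("items", []):
        for m in mw.get("spec", {}).get("workload", {}).get("manifests", []):
            meta = m.get("metadata", {})
            out.append((m.get("kind", ""), meta.get("namespace", ""), meta.get("name", "")))
    return out


def missing_namespaces(
    objects: Iterable[tuple[str, str, str]],
    preexisting: frozenset[str] = frozenset({"default"}),
) -> set[str]:
    """Namespaces used by delivered objects but not delivered themselves."""
    items = list(objects)  # the input is scanned twice
    delivered = {name for kind, _ns, name in items if kind == "Namespace"}
    needed = {ns for kind, ns, _name in items if kind != "Namespace" and ns}
    return needed - delivered - set(preexisting)
```

For the stuck state posted above, where only the Deployment's ManifestWork remains in cluster1, this would report {"nginx"}, which is consistent with the "namespaces "nginx" not found" condition.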

pdettori commented 8 months ago

Also, it would help to have the logs for the KubeStellar controller manager. The command would look like:

kubectl --context kind-kubeflex -n wds1-system logs kubestellar-controller-manager-<some-id>
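In the KubeStellar controller manager log, the "Trying to delete manifest" entries show which ManifestWorks the controller attempted to delete and for which Placement. Grouping those entries makes it easy to see whether a delete was ever attempted for a given Placement. The minimal Python sketch below does that grouping. It assumes the console format seen in this thread (a timestamp, a level, the message, then a JSON object of fields), and the function name is hypothetical.

```python
import json
from collections import defaultdict
from typing import Iterable

MARKER = "Trying to delete manifest"


def deletions_by_placement(lines: Iterable[str]) -> dict[str, list[tuple[str, str]]]:
    """Map placement name -> [(cluster namespace in the IMBS, manifest name), ...]."""
    result: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for line in lines:
        pos = line.find(MARKER)
        if pos == -1:
            continue
        # The structured fields are the JSON object that follows the message.
        brace = line.find("{", pos)
        if brace == -1:
            continue
        try:
            fields = json.loads(line[brace:])
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        placement = fields.get("for placement", "<unknown>")
        result[placement].append((fields.get("namespace", ""), fields.get("manifest name", "")))
    return dict(result)
```

For example, save the output of the kubectl logs command to a file and pass the open file object. In the log posted in this thread, every delete attempt is attributed to nginx-placement.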

namasl commented 8 months ago

kubectl --context kind-kubeflex -n wds1-system logs kubestellar-controller-manager-78b658dd6-j96bg
2024-01-19T04:33:40Z    INFO    controller-runtime.metrics  Metrics server is starting to listen    {"addr": "127.0.0.1:8080"}
2024-01-19T04:33:40Z    INFO    setup   Getting config for WDS  {"name": "wds1"}
2024-01-19T04:33:40Z    INFO    setup   using label {"key": "kflex.kubestellar.io/cptype", "value": "wds"}
2024-01-19T04:33:40Z    INFO    setup   waiting for cp with label   {"key": "kflex.kubestellar.io/cptype", "value": "wds"}
2024-01-19T04:33:40Z    INFO    setup   Got config for WDS  {"name": "wds1"}
2024-01-19T04:33:40Z    INFO    setup   Getting config for IMBS
2024-01-19T04:33:40Z    INFO    setup   waiting for cp with label   {"key": "kflex.kubestellar.io/cptype", "value": "imbs"}
2024-01-19T04:33:40Z    INFO    setup   Got config for IMBS {"name": "imbs1"}
2024-01-19T04:33:40Z    INFO    applying crd    {"name": "placements.edge.kubestellar.io"}
2024-01-19T04:33:41Z    INFO    crd name accepted   {"name": "placements.edge.kubestellar.io"}
I0119 04:33:42.782476       1 request.go:697] Waited for 1.199568425s due to client-side throttling, not priority and fairness, request: GET:https://wds1.wds1-system.svc.cluster.local/apis/apps/v1/daemonsets?limit=500&resourceVersion=0
2024-01-19T04:33:43Z    INFO    setup   starting manager
2024-01-19T04:33:43Z    INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
I0119 04:33:43.499624       1 leaderelection.go:250] attempting to acquire leader lease wds1-system/c6f71c85.kflex.kubestellar.org...
2024-01-19T04:33:43Z    INFO    starting server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
I0119 04:33:43.510370       1 leaderelection.go:260] successfully acquired lease wds1-system/c6f71c85.kflex.kubestellar.org
2024-01-19T04:33:43Z    DEBUG   events  kubestellar-controller-manager-78b658dd6-j96bg_80e4920f-3b93-4303-9485-8ce7fc9bf401 became leader   {"type": "Normal", "object": {"kind":"Lease","namespace":"wds1-system","name":"c6f71c85.kflex.kubestellar.org","uid":"7d03c513-cc65-4cd4-bf88-72c2815e1943","apiVersion":"coordination.k8s.io/v1","resourceVersion":"1337"}, "reason": "LeaderElection"}
2024-01-19T04:33:47Z    INFO    All caches synced
2024-01-19T04:33:47Z    INFO    Starting workers    {"count": 4}
2024-01-19T04:33:47Z    INFO    Started workers
Matched key edge.kubestellar.io/v1alpha1/Placement
2024-01-19T04:34:55Z    INFO    Matched {"object": "[] namespace/nginx", "for placement": "nginx-placement"}
2024-01-19T04:34:55Z    INFO    Delivering  {"object": "[] namespace/nginx", "to clusters": ["cluster1", "cluster2"]}
2024-01-19T04:34:55Z    INFO    Matched {"object": "[nginx] deployment.apps/nginx-deployment", "for placement": "nginx-placement"}
2024-01-19T04:34:55Z    INFO    Delivering  {"object": "[nginx] deployment.apps/nginx-deployment", "to clusters": ["cluster1", "cluster2"]}
2024-01-19T04:36:38Z    INFO    Trying to delete manifest   {"manifest name": "v1-namespace--nginx", "namespace": "cluster1", "for placement": "nginx-placement"}
2024-01-19T04:36:38Z    INFO    Trying to delete manifest   {"manifest name": "appsv1-deployment-nginx-nginx-deployment", "namespace": "cluster1", "for placement": "nginx-placement"}
2024-01-19T04:36:38Z    INFO    Trying to delete manifest   {"manifest name": "v1-namespace--nginx", "namespace": "cluster2", "for placement": "nginx-placement"}
2024-01-19T04:36:38Z    INFO    Trying to delete manifest   {"manifest name": "appsv1-deployment-nginx-nginx-deployment", "namespace": "cluster2", "for placement": "nginx-placement"}
Matched key edge.kubestellar.io/v1alpha1/Placement
2024-01-19T04:40:16Z    INFO    Matched {"object": "[nginx] deployment.apps/nginx-deployment-foo", "for placement": "nginx-placement-foo"}
2024-01-19T04:40:16Z    INFO    Delivering  {"object": "[nginx] deployment.apps/nginx-deployment-foo", "to clusters": ["cluster1", "cluster2"]}
2024-01-19T04:46:49Z    INFO    Matched {"object": "[nginx] deployment.apps/nginx-deployment-foo", "for placement": "nginx-placement-foo"}
2024-01-19T04:46:49Z    INFO    Matched {"object": "[nginx] deployment.apps/nginx-deployment", "for placement": "nginx-placement"}
Matched key edge.kubestellar.io/v1alpha1/Placement
2024-01-19T04:46:49Z    INFO    Matched {"object": "[] namespace/nginx", "for placement": "nginx-placement"}
2024-01-19T04:46:49Z    INFO    Delivering  {"object": "[nginx] deployment.apps/nginx-deployment-foo", "to clusters": ["cluster1", "cluster2"]}
2024-01-19T04:46:49Z    INFO    Delivering  {"object": "[nginx] deployment.apps/nginx-deployment", "to clusters": ["cluster1", "cluster2"]}
2024-01-19T04:46:49Z    INFO    Delivering  {"object": "[] namespace/nginx", "to clusters": ["cluster2", "cluster1"]}
2024-01-19T04:48:48Z    INFO    Trying to delete manifest   {"manifest name": "v1-namespace--nginx", "namespace": "cluster2", "for placement": "nginx-placement"}
2024-01-19T04:48:48Z    INFO    Trying to delete manifest   {"manifest name": "appsv1-deployment-nginx-nginx-deployment", "namespace": "cluster2", "for placement": "nginx-placement"}
2024-01-19T04:48:48Z    INFO    Trying to delete manifest   {"manifest name": "v1-namespace--nginx", "namespace": "cluster1", "for placement": "nginx-placement"}
2024-01-19T04:48:48Z    INFO    Trying to delete manifest   {"manifest name": "appsv1-deployment-nginx-nginx-deployment", "namespace": "cluster1", "for placement": "nginx-placement"}
2024-01-19T04:49:50Z    INFO    Matched {"object": "[nginx] deployment.apps/nginx-deployment-foo", "for placement": "nginx-placement-foo"}
2024-01-19T04:49:50Z    INFO    Deleting    {"object": "[nginx] deployment.apps/nginx-deployment-foo", "from clusters": ["cluster1", "cluster2"]}
Matched key edge.kubestellar.io/v1alpha1/Placement
2024-01-19T04:52:10Z    INFO    Matched {"object": "[] namespace/nginx", "for placement": "nginx-placement"}
2024-01-19T04:52:10Z    INFO    Delivering  {"object": "[] namespace/nginx", "to clusters": ["cluster1", "cluster2"]}
2024-01-19T04:52:30Z    INFO    Matched {"object": "[nginx] deployment.apps/nginx-deployment", "for placement": "nginx-placement"}
2024-01-19T04:52:30Z    INFO    Delivering  {"object": "[nginx] deployment.apps/nginx-deployment", "to clusters": ["cluster1", "cluster2"]}
2024-01-19T04:54:16Z    INFO    Matched {"object": "[nginx] deployment.apps/nginx-deployment", "for placement": "nginx-placement"}
Matched key edge.kubestellar.io/v1alpha1/Placement
2024-01-19T04:54:16Z    INFO    Matched {"object": "[] namespace/nginx", "for placement": "nginx-placement"}
2024-01-19T04:54:16Z    INFO    Delivering  {"object": "[] namespace/nginx", "to clusters": ["cluster1"]}
2024-01-19T04:54:16Z    INFO    Delivering  {"object": "[nginx] deployment.apps/nginx-deployment", "to clusters": ["cluster1"]}
2024-01-19T04:56:36Z    INFO    Trying to delete manifest   {"manifest name": "v1-namespace--nginx", "namespace": "cluster2", "for placement": "nginx-placement"}
2024-01-19T04:56:36Z    INFO    Trying to delete manifest   {"manifest name": "appsv1-deployment-nginx-nginx-deployment", "namespace": "cluster2", "for placement": "nginx-placement"}
2024-01-19T04:56:36Z    INFO    Trying to delete manifest   {"manifest name": "v1-namespace--nginx", "namespace": "cluster1", "for placement": "nginx-placement"}
2024-01-19T04:56:36Z    INFO    Trying to delete manifest   {"manifest name": "appsv1-deployment-nginx-nginx-deployment", "namespace": "cluster1", "for placement": "nginx-placement"}
Matched key edge.kubestellar.io/v1alpha1/Placement
2024-01-19T05:13:28Z    INFO    Matched {"object": "[default] deployment.apps/nginx-singleton-deployment", "for placement": "nginx-singleton-placement"}
2024-01-19T05:13:28Z    INFO    Delivering  {"object": "[default] deployment.apps/nginx-singleton-deployment", "to clusters": ["cluster1"]}
2024-01-19T05:14:39Z    INFO    Matched {"object": "[] namespace/nginx", "for placement": "nginx-singleton-placement"}
2024-01-19T05:14:39Z    INFO    Delivering  {"object": "[] namespace/nginx", "to clusters": ["cluster1"]}
2024-01-19T05:14:39Z    INFO    Matched {"object": "[nginx] deployment.apps/nginx-deployment", "for placement": "nginx-singleton-placement"}
2024-01-19T05:14:39Z    INFO    Delivering  {"object": "[nginx] deployment.apps/nginx-deployment", "to clusters": ["cluster1"]}
2024-01-19T05:14:52Z    INFO    Matched {"object": "[nginx] deployment.apps/nginx-deployment-sing", "for placement": "nginx-singleton-placement"}
2024-01-19T05:14:52Z    INFO    Delivering  {"object": "[nginx] deployment.apps/nginx-deployment-sing", "to clusters": ["cluster1"]}
2024-01-19T05:15:54Z    INFO    Trying to delete manifest   {"manifest name": "appsv1-deployment-default-nginx-singleton-deployment", "namespace": "cluster1", "for placement": "nginx-singleton-placement"}
2024-01-19T05:15:54Z    INFO    Trying to delete manifest   {"manifest name": "v1-namespace--nginx", "namespace": "cluster1", "for placement": "nginx-singleton-placement"}
2024-01-19T05:15:54Z    INFO    Trying to delete manifest   {"manifest name": "appsv1-deployment-nginx-nginx-deployment", "namespace": "cluster1", "for placement": "nginx-singleton-placement"}
2024-01-19T05:15:54Z    INFO    Trying to delete manifest   {"manifest name": "appsv1-deployment-nginx-nginx-deployment-sing", "namespace": "cluster1", "for placement": "nginx-singleton-placement"}
Matched key edge.kubestellar.io/v1alpha1/Placement
2024-01-19T05:21:59Z    INFO    Matched {"object": "[] namespace/nginx", "for placement": "nginx-placement"}
2024-01-19T05:21:59Z    INFO    Delivering  {"object": "[] namespace/nginx", "to clusters": ["cluster1", "cluster2"]}
2024-01-19T05:21:59Z    INFO    Matched {"object": "[nginx] deployment.apps/nginx-deployment", "for placement": "nginx-placement"}
2024-01-19T05:21:59Z    INFO    Delivering  {"object": "[nginx] deployment.apps/nginx-deployment", "to clusters": ["cluster1", "cluster2"]}
2024-01-19T05:22:39Z    INFO    Trying to delete manifest   {"manifest name": "v1-namespace--nginx", "namespace": "cluster1", "for placement": "nginx-placement"}
2024-01-19T05:22:39Z    INFO    Trying to delete manifest   {"manifest name": "v1-namespace--nginx", "namespace": "cluster2", "for placement": "nginx-placement"}
2024-01-19T05:22:39Z    INFO    Trying to delete manifest   {"manifest name": "appsv1-deployment-nginx-nginx-deployment", "namespace": "cluster1", "for placement": "nginx-placement"}
2024-01-19T05:22:39Z    INFO    Trying to delete manifest   {"manifest name": "appsv1-deployment-nginx-nginx-deployment", "namespace": "cluster2", "for placement": "nginx-placement"}
Matched key edge.kubestellar.io/v1alpha1/Placement
2024-01-19T05:29:18Z    INFO    Matched {"object": "[] namespace/nginx", "for placement": "nginx-placement"}
2024-01-19T05:29:18Z    INFO    Matched {"object": "[nginx] deployment.apps/nginx-deployment", "for placement": "nginx-placement"}
2024-01-19T05:29:18Z    INFO    Delivering  {"object": "[] namespace/nginx", "to clusters": ["cluster1", "cluster2"]}
2024-01-19T05:29:18Z    INFO    Delivering  {"object": "[nginx] deployment.apps/nginx-deployment", "to clusters": ["cluster1", "cluster2"]}
Matched key edge.kubestellar.io/v1alpha1/Placement
2024-01-19T05:31:44Z    INFO    Matched {"object": "[] namespace/nginx", "for placement": "nginx-placement"}
2024-01-19T05:31:44Z    INFO    Matched {"object": "[nginx] deployment.apps/nginx-deployment", "for placement": "nginx-placement"}
2024-01-19T05:31:44Z    INFO    Delivering  {"object": "[] namespace/nginx", "to clusters": ["cluster1"]}
2024-01-19T05:31:44Z    INFO    Delivering  {"object": "[nginx] deployment.apps/nginx-deployment", "to clusters": ["cluster1"]}
Matched key edge.kubestellar.io/v1alpha1/Placement
2024-01-19T05:34:00Z    INFO    Matched {"object": "[] namespace/nginx", "for placement": "nginx-placement"}
2024-01-19T05:34:00Z    INFO    Matched {"object": "[nginx] deployment.apps/nginx-deployment", "for placement": "nginx-placement"}
2024-01-19T05:34:00Z    INFO    Matched {"object": "[default] deployment.apps/nginx-singleton-deployment", "for placement": "nginx-singleton-placement"}
2024-01-19T05:34:00Z    INFO    Delivering  {"object": "[nginx] deployment.apps/nginx-deployment", "to clusters": ["cluster1"]}
2024-01-19T05:34:00Z    INFO    Delivering  {"object": "[default] deployment.apps/nginx-singleton-deployment", "to clusters": ["cluster1"]}
2024-01-19T05:34:00Z    INFO    Delivering  {"object": "[] namespace/nginx", "to clusters": ["cluster1"]}
2024-01-19T05:34:29Z    INFO    Matched {"object": "[default] deployment.apps/nginx-singleton-deployment", "for placement": "nginx-singleton-placement"}
2024-01-19T05:34:29Z    INFO    Deleting    {"object": "[default] deployment.apps/nginx-singleton-deployment", "from clusters": ["cluster1"]}
2024-01-19T05:35:07Z    INFO    Matched {"object": "[] namespace/nginx", "for placement": "nginx-singleton-placement"}
2024-01-19T05:35:07Z    INFO    Delivering  {"object": "[] namespace/nginx", "to clusters": ["cluster1"]}
2024-01-19T05:35:07Z    INFO    Matched {"object": "[nginx] deployment.apps/nginx-deployment", "for placement": "nginx-singleton-placement"}
2024-01-19T05:35:07Z    INFO    Delivering  {"object": "[nginx] deployment.apps/nginx-deployment", "to clusters": ["cluster1"]}
2024-01-19T05:35:18Z    INFO    Matched {"object": "[nginx] deployment.apps/nginx-deployment-sing", "for placement": "nginx-singleton-placement"}
2024-01-19T05:35:18Z    INFO    Delivering  {"object": "[nginx] deployment.apps/nginx-deployment-sing", "to clusters": ["cluster1"]}
2024-01-19T05:35:38Z    INFO    Trying to delete manifest   {"manifest name": "v1-namespace--nginx", "namespace": "cluster2", "for placement": "nginx-placement"}
2024-01-19T05:35:38Z    INFO    Trying to delete manifest   {"manifest name": "appsv1-deployment-nginx-nginx-deployment", "namespace": "cluster2", "for placement": "nginx-placement"}
2024-01-19T05:35:50Z    INFO    Trying to delete manifest   {"manifest name": "v1-namespace--nginx", "namespace": "cluster1", "for placement": "nginx-singleton-placement"}
2024-01-19T05:35:50Z    INFO    Trying to delete manifest   {"manifest name": "appsv1-deployment-nginx-nginx-deployment", "namespace": "cluster1", "for placement": "nginx-singleton-placement"}
2024-01-19T05:35:50Z    INFO    Trying to delete manifest   {"manifest name": "appsv1-deployment-nginx-nginx-deployment-sing", "namespace": "cluster1", "for placement": "nginx-singleton-placement"}
Matched key edge.kubestellar.io/v1alpha1/Placement
2024-01-19T05:37:21Z    INFO    Matched {"object": "[] namespace/nginx", "for placement": "nginx-placement"}
2024-01-19T05:37:21Z    INFO    Delivering  {"object": "[] namespace/nginx", "to clusters": ["cluster1", "cluster2"]}
2024-01-19T05:37:21Z    INFO    Matched {"object": "[nginx] deployment.apps/nginx-deployment", "for placement": "nginx-placement"}
2024-01-19T05:37:21Z    INFO    Delivering  {"object": "[nginx] deployment.apps/nginx-deployment", "to clusters": ["cluster1", "cluster2"]}
2024-01-19T05:37:33Z    INFO    Matched {"object": "[] namespace/nginx", "for placement": "nginx-placement"}
2024-01-19T05:37:33Z    INFO    Matched {"object": "[nginx] deployment.apps/nginx-deployment", "for placement": "nginx-placement"}
Matched key edge.kubestellar.io/v1alpha1/Placement
2024-01-19T05:37:33Z    INFO    Delivering  {"object": "[] namespace/nginx", "to clusters": ["cluster1", "cluster2"]}
2024-01-19T05:37:33Z    INFO    Delivering  {"object": "[nginx] deployment.apps/nginx-deployment", "to clusters": ["cluster1", "cluster2"]}
2024-01-19T05:37:33Z    INFO    Matched {"object": "[] namespace/nginx", "for placement": "nginx-placement-again"}
2024-01-19T05:37:33Z    INFO    Delivering  {"object": "[] namespace/nginx", "to clusters": ["cluster1", "cluster2"]}
2024-01-19T05:37:33Z    INFO    Matched {"object": "[nginx] deployment.apps/nginx-deployment", "for placement": "nginx-placement-again"}
2024-01-19T05:37:33Z    INFO    Delivering  {"object": "[nginx] deployment.apps/nginx-deployment", "to clusters": ["cluster1", "cluster2"]}
2024-01-19T05:37:54Z    INFO    Trying to delete manifest   {"manifest name": "v1-namespace--nginx", "namespace": "cluster1", "for placement": "nginx-placement-again"}
2024-01-19T05:37:54Z    INFO    Trying to delete manifest   {"manifest name": "v1-namespace--nginx", "namespace": "cluster2", "for placement": "nginx-placement-again"}
2024-01-19T05:37:54Z    INFO    Trying to delete manifest   {"manifest name": "appsv1-deployment-nginx-nginx-deployment", "namespace": "cluster2", "for placement": "nginx-placement-again"}
2024-01-19T05:37:54Z    INFO    Trying to delete manifest   {"manifest name": "appsv1-deployment-nginx-nginx-deployment", "namespace": "cluster1", "for placement": "nginx-placement-again"}
Matched key edge.kubestellar.io/v1alpha1/Placement
2024-01-19T05:38:40Z    INFO    Matched {"object": "[] namespace/nginx", "for placement": "nginx-placement-again"}
2024-01-19T05:38:40Z    INFO    Delivering  {"object": "[] namespace/nginx", "to clusters": ["cluster1", "cluster2"]}
2024-01-19T05:38:53Z    INFO    Matched {"object": "[nginx] deployment.apps/nginx-deployment", "for placement": "nginx-placement-again"}
2024-01-19T05:38:53Z    INFO    Delivering  {"object": "[nginx] deployment.apps/nginx-deployment", "to clusters": ["cluster1", "cluster2"]}
2024-01-19T05:39:50Z    INFO    Trying to delete manifest   {"manifest name": "v1-namespace--nginx", "namespace": "cluster2", "for placement": "nginx-placement-again"}
2024-01-19T05:39:50Z    INFO    Trying to delete manifest   {"manifest name": "v1-namespace--nginx", "namespace": "cluster1", "for placement": "nginx-placement-again"}
2024-01-19T05:39:50Z    INFO    Trying to delete manifest   {"manifest name": "appsv1-deployment-nginx-nginx-deployment", "namespace": "cluster1", "for placement": "nginx-placement-again"}
2024-01-19T05:39:50Z    INFO    Trying to delete manifest   {"manifest name": "appsv1-deployment-nginx-nginx-deployment", "namespace": "cluster2", "for placement": "nginx-placement-again"}
MikeSpreitzer commented 8 months ago

There is also a logic bug in the controller. It performs deletions only in response to deletion notifications, rather than by comparing the desired state with the reported state. As a result, a deletion that happens while the controller is not running is not handled correctly.
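To make the distinction concrete, here is a minimal sketch of the state-comparison (level-based) approach. It is not KubeStellar's actual code. The names `workKey`, `desiredWorks`, and `orphanedWorks` are hypothetical. It assumes the controller can compute, from the current Placements and WDS objects, which ManifestWork names each cluster should have. Instead of waiting for a delete event, it diffs that desired set against the ManifestWorks that actually exist in the IMBS. Anything left over is an orphan and gets deleted, even if the Placement was removed while the controller was down. A real controller would list ManifestWorks through a Kubernetes client and issue the deletes itself. Here both sets are plain in-memory values so the logic runs on its own.

```go
package main

import (
	"fmt"
	"sort"
)

// workKey identifies a ManifestWork in the IMBS by the cluster namespace it
// lives in and its name. (Hypothetical type, for illustration only.)
type workKey struct {
	Cluster string
	Name    string
}

// desiredWorks builds the set of ManifestWorks that should exist.
// deliveries maps a ManifestWork name to the clusters it must be delivered to,
// as derived from the Placements currently present in the WDS (assumed input).
func desiredWorks(deliveries map[string][]string) map[workKey]bool {
	want := make(map[workKey]bool)
	for name, clusters := range deliveries {
		for _, c := range clusters {
			want[workKey{Cluster: c, Name: name}] = true
		}
	}
	return want
}

// orphanedWorks returns the ManifestWorks that exist but are no longer desired,
// sorted by cluster and then name for deterministic output. Because it compares
// full desired state with full observed state, it does not depend on having
// seen the delete event that made a ManifestWork obsolete.
func orphanedWorks(desired map[workKey]bool, existing []workKey) []workKey {
	var orphans []workKey
	for _, k := range existing {
		if !desired[k] {
			orphans = append(orphans, k)
		}
	}
	sort.Slice(orphans, func(i, j int) bool {
		if orphans[i].Cluster != orphans[j].Cluster {
			return orphans[i].Cluster < orphans[j].Cluster
		}
		return orphans[i].Name < orphans[j].Name
	})
	return orphans
}

func main() {
	// Only the namespace is still wanted, and only on cluster1.
	desired := desiredWorks(map[string][]string{
		"v1-namespace--nginx": {"cluster1"},
	})
	// What a list of ManifestWorks in the IMBS might return.
	existing := []workKey{
		{Cluster: "cluster1", Name: "v1-namespace--nginx"},
		{Cluster: "cluster2", Name: "v1-namespace--nginx"},
		{Cluster: "cluster1", Name: "appsv1-deployment-nginx-nginx-deployment-sing"},
	}
	for _, k := range orphanedWorks(desired, existing) {
		fmt.Printf("delete manifestwork %s/%s\n", k.Cluster, k.Name)
	}
}
```

Running this prints delete lines for `cluster1/appsv1-deployment-nginx-nginx-deployment-sing` and `cluster2/v1-namespace--nginx`, and keeps `cluster1/v1-namespace--nginx`. Because the diff is recomputed from scratch on every reconcile, restarting the controller is enough to clean up ManifestWorks whose Placement vanished while it was offline.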

namasl commented 8 months ago

I've rewritten the "Steps to Reproduce" at the top of this issue as a simple procedure that reliably produces a stuck ManifestWork. After completing that procedure, here are the kubestellar-controller-manager log and the ManifestWork YAML:

kubectl --context kind-kubeflex -n wds1-system logs kubestellar-controller-manager-78b658dd6-r4wk6
2024-01-19T20:05:33Z    INFO    controller-runtime.metrics  Metrics server is starting to listen    {"addr": "127.0.0.1:8080"}
2024-01-19T20:05:33Z    INFO    setup   Getting config for WDS  {"name": "wds1"}
2024-01-19T20:05:33Z    INFO    setup   using label {"key": "kflex.kubestellar.io/cptype", "value": "wds"}
2024-01-19T20:05:33Z    INFO    setup   waiting for cp with label   {"key": "kflex.kubestellar.io/cptype", "value": "wds"}
2024-01-19T20:05:33Z    INFO    setup   Got config for WDS  {"name": "wds1"}
2024-01-19T20:05:33Z    INFO    setup   Getting config for IMBS
2024-01-19T20:05:33Z    INFO    setup   waiting for cp with label   {"key": "kflex.kubestellar.io/cptype", "value": "imbs"}
2024-01-19T20:05:33Z    INFO    setup   Got config for IMBS {"name": "imbs1"}
2024-01-19T20:05:33Z    INFO    applying crd    {"name": "placements.edge.kubestellar.io"}
2024-01-19T20:05:33Z    INFO    crd name accepted   {"name": "placements.edge.kubestellar.io"}
I0119 20:05:34.960296       1 request.go:697] Waited for 1.138567399s due to client-side throttling, not priority and fairness, request: GET:https://wds1.wds1-system.svc.cluster.local/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0
2024-01-19T20:05:36Z    INFO    setup   starting manager
2024-01-19T20:05:36Z    INFO    starting server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
I0119 20:05:36.750082       1 leaderelection.go:250] attempting to acquire leader lease wds1-system/c6f71c85.kflex.kubestellar.org...
2024-01-19T20:05:36Z    INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
I0119 20:05:36.761788       1 leaderelection.go:260] successfully acquired lease wds1-system/c6f71c85.kflex.kubestellar.org
2024-01-19T20:05:36Z    DEBUG   events  kubestellar-controller-manager-78b658dd6-r4wk6_1fc3601c-ab50-4f85-a78f-9128bd1c2915 became leader   {"type": "Normal", "object": {"kind":"Lease","namespace":"wds1-system","name":"c6f71c85.kflex.kubestellar.org","uid":"29e6de3d-c53b-41a5-b6ef-99beb587b76c","apiVersion":"coordination.k8s.io/v1","resourceVersion":"1386"}, "reason": "LeaderElection"}
2024-01-19T20:05:39Z    INFO    All caches synced
2024-01-19T20:05:39Z    INFO    Starting workers    {"count": 4}
2024-01-19T20:05:39Z    INFO    Started workers
E0119 20:05:39.229467       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.229490       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.229502       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.229508       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.229534       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.230600       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.231692       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.232761       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.233830       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.234891       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.235965       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.237035       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.238100       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.239175       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.240244       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.241312       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.242379       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.243554       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.244646       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.247052       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.247116       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.248193       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.249310       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.250462       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.251572       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.252736       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.253747       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.254822       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.255886       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.256948       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.258031       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.259093       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.260153       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.261218       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.262278       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.263345       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.264410       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.265476       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.266536       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.267603       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.268664       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.269738       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.270786       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.271852       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.272921       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.273981       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.275047       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.276109       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.277174       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.278238       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.279302       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.280373       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.281442       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.282520       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.283583       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.284650       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.285711       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.286779       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.287844       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.288912       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.289976       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.291038       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.292106       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.293170       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.294246       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.295309       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.296372       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.297437       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.298491       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.299565       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.300637       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.301694       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.302765       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.303839       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.304897       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.305965       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.307034       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.308106       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.309169       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.310231       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.311302       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.312364       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.313397       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.314461       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.315536       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.316606       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.317669       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.318742       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.321303       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.322374       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.323443       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.324511       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.325579       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.326612       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.327685       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.328748       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.329816       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.330875       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.331939       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.333004       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.334027       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.335090       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.336162       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.337223       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.338294       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.339409       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.340493       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.341585       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.342628       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.343689       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.344762       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.345826       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.346899       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.347956       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.349937       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.351633       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.354632       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.355718       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.356806       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.357896       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.358985       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.360085       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.361161       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.362232       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.363344       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.364421       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.365500       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.366572       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.367647       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.368717       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.369828       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.370960       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.372107       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.373211       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.378379       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.379413       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.380498       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.381642       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.383808       1 controller.go:424] error matching selectors: could not get lister for placememt
E0119 20:05:39.384967       1 controller.go:424] error matching selectors: could not get lister for placememt
2024-01-19T20:05:39Z    INFO    New API added. Starting informer for:   {"group": "edge.kubestellar.io", "version": "edge.kubestellar.io/v1alpha1", "kind": "Placement"}
E0119 20:05:39.385072       1 controller.go:424] error matching selectors: could not get lister for placememt
Matched key edge.kubestellar.io/v1alpha1/Placement
2024-01-19T20:06:53Z    INFO    Matched {"object": "[] namespace/nginx", "for placement": "nginx-placement"}
2024-01-19T20:06:53Z    INFO    Delivering  {"object": "[] namespace/nginx", "to clusters": ["cluster1", "cluster2"]}
2024-01-19T20:06:53Z    INFO    Matched {"object": "[nginx] deployment.apps/nginx-deployment", "for placement": "nginx-placement"}
2024-01-19T20:06:53Z    INFO    Delivering  {"object": "[nginx] deployment.apps/nginx-deployment", "to clusters": ["cluster1", "cluster2"]}
Matched key edge.kubestellar.io/v1alpha1/Placement
2024-01-19T20:11:20Z    INFO    Matched {"object": "[] namespace/nginx", "for placement": "nginx-placement"}
2024-01-19T20:11:20Z    INFO    Matched {"object": "[nginx] deployment.apps/nginx-deployment", "for placement": "nginx-placement"}
2024-01-19T20:11:20Z    INFO    Delivering  {"object": "[] namespace/nginx", "to clusters": ["cluster1", "cluster2"]}
2024-01-19T20:11:20Z    INFO    Delivering  {"object": "[nginx] deployment.apps/nginx-deployment", "to clusters": ["cluster1", "cluster2"]}
2024-01-19T20:11:21Z    INFO    Matched {"object": "[nginx] deployment.apps/nginx-singleton-deployment", "for placement": "nginx-singleton-placement"}
2024-01-19T20:11:21Z    INFO    Delivering  {"object": "[nginx] deployment.apps/nginx-singleton-deployment", "to clusters": ["cluster1"]}
2024-01-19T20:12:13Z    INFO    Trying to delete manifest   {"manifest name": "v1-namespace--nginx", "namespace": "cluster2", "for placement": "nginx-placement"}
2024-01-19T20:12:13Z    INFO    Trying to delete manifest   {"manifest name": "appsv1-deployment-nginx-nginx-deployment", "namespace": "cluster2", "for placement": "nginx-placement"}
2024-01-19T20:12:13Z    INFO    Trying to delete manifest   {"manifest name": "v1-namespace--nginx", "namespace": "cluster1", "for placement": "nginx-placement"}
2024-01-19T20:12:13Z    INFO    Trying to delete manifest   {"manifest name": "appsv1-deployment-nginx-nginx-deployment", "namespace": "cluster1", "for placement": "nginx-placement"}
2024-01-19T20:12:21Z    INFO    Trying to delete manifest   {"manifest name": "appsv1-deployment-nginx-nginx-singleton-deployment", "namespace": "cluster1", "for placement": "nginx-singleton-placement"}
$ kubectl --context imbs1 -n cluster1 get manifestwork appsv1-deployment-nginx-nginx-singleton-deployment -o yaml
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  creationTimestamp: "2024-01-19T20:11:21Z"
  finalizers:
  - cluster.open-cluster-management.io/manifest-work-cleanup
  generation: 1
  labels:
    managed-by.kubestellar.io/singletonstatus: "true"
  name: appsv1-deployment-nginx-nginx-singleton-deployment
  namespace: cluster1
  resourceVersion: "876"
  uid: 1dfdbc0f-3891-492f-9792-669cf2f8aa85
spec:
  workload:
    manifests:
    - apiVersion: apps/v1
      kind: Deployment
      metadata:
        labels:
          app.kubernetes.io/name: nginx-singleton
        name: nginx-singleton-deployment
        namespace: nginx
      spec:
        progressDeadlineSeconds: 600
        replicas: 1
        revisionHistoryLimit: 10
        selector:
          matchLabels:
            app: nginx
        strategy:
          rollingUpdate:
            maxSurge: 25%
            maxUnavailable: 25%
          type: RollingUpdate
        template:
          metadata:
            creationTimestamp: null
            labels:
              app: nginx
          spec:
            containers:
            - image: public.ecr.aws/nginx/nginx:latest
              imagePullPolicy: Always
              name: nginx
              ports:
              - containerPort: 80
                protocol: TCP
              resources: {}
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
            dnsPolicy: ClusterFirst
            restartPolicy: Always
            schedulerName: default-scheduler
            securityContext: {}
            terminationGracePeriodSeconds: 30
      status: {}
status:
  conditions:
  - lastTransitionTime: "2024-01-19T20:12:21Z"
    message: Failed to apply manifest work
    observedGeneration: 1
    reason: AppliedManifestWorkFailed
    status: "False"
    type: Applied
  - lastTransitionTime: "2024-01-19T20:12:21Z"
    message: 1 of 1 resources are not available
    observedGeneration: 1
    reason: ResourcesNotAvailable
    status: "False"
    type: Available
  resourceStatus:
    manifests:
    - conditions:
      - lastTransitionTime: "2024-01-19T20:12:21Z"
        message: 'Failed to apply manifest: namespaces "nginx" not found'
        reason: AppliedManifestFailed
        status: "False"
        type: Applied
      - lastTransitionTime: "2024-01-19T20:12:21Z"
        message: Resource is not available
        reason: ResourceNotAvailable
        status: "False"
        type: Available
      - lastTransitionTime: "2024-01-19T20:11:21Z"
        message: ""
        reason: NoStatusFeedbackSynced
        status: "True"
        type: StatusFeedbackSynced
      resourceMeta:
        group: apps
        kind: Deployment
        name: nginx-singleton-deployment
        namespace: nginx
        ordinal: 0
        resource: deployments
        version: v1
      statusFeedback: {}

pdettori commented 8 months ago

@namasl when you get into that state, what happens if you manually create the namespace nginx in cluster1? It seems like the manifest work gets stuck trying to apply the deployment to a namespace that no longer exists, because the namespace was deleted first.

pdettori commented 8 months ago

@MikeSpreitzer - I agree, it should work by reconciling the desired and current state. Currently, this reconciliation is typically triggered when a placement is deleted, before its finalizer is removed. Apparently the stuck manifest work is not removed because it also has a finalizer, which the work agent likely does not remove because the work is stuck in that state. However, we don't have periodic re-queueing to trigger the reconciliation logic; it usually happens only when a placement object changes.
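To make "reconciling desired and current state" concrete, here is a minimal, stdlib-only sketch. The function name staleWorks and its inputs are hypothetical and not KubeStellar's API; it assumes "current" lists only the ManifestWorks that KubeStellar manages in one cluster namespace, and "desired" lists the ones that the remaining placements still select.

```go
package main

import (
	"fmt"
	"sort"
)

// staleWorks is a hypothetical sketch of desired-vs-current reconciliation.
// desired: ManifestWork names the remaining placements still want in a
// cluster namespace. current: names that actually exist there.
// It returns, sorted, the names that exist but are no longer desired.
func staleWorks(desired, current []string) []string {
	want := make(map[string]bool, len(desired))
	for _, name := range desired {
		want[name] = true
	}
	var stale []string
	for _, name := range current {
		if !want[name] {
			stale = append(stale, name)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	// After both placements are deleted nothing is desired, so every
	// KubeStellar-managed ManifestWork left in cluster1 is stale.
	current := []string{"appsv1-deployment-nginx-nginx-singleton-deployment"}
	fmt.Println(staleWorks(nil, current)) // [appsv1-deployment-nginx-nginx-singleton-deployment]
}
```

Because such a pass compares full sets rather than reacting to a single event, running it on a periodic resync as well as on placement changes would give a stuck work another chance to be cleaned up. Note that actual deletion would still wait on the ManifestWork's finalizer.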

pdettori commented 8 months ago

@namasl I confirm I am able to reproduce the issue with your updated steps.

pdettori commented 8 months ago

I found the issue; it should be simple to fix. Before removing a manifest, the current logic checks whether it belongs to other placements, and it does this by looking at the labels. This function: https://github.com/kubestellar/kubestellar/blob/6b11822e04258c8eff474510c70708f93c5cadf3/pkg/placement/placement.go#L251-L261

looks at the labels based on the assumption that each placement inserts a label that starts with the prefix managed-by.kubestellar.io. But I have used the same prefix to signal to the status controller that a manifest work requires singleton status: managed-by.kubestellar.io/singletonstatus: "true"

Before both placements are deleted, the manifest work that you cannot delete has these labels:

$ k --context imbs1 -n cluster1 get manifestwork appsv1-deployment-nginx-nginx-singleton-deployment -o yaml
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  creationTimestamp: "2024-01-21T23:50:52Z"
  finalizers:
  - cluster.open-cluster-management.io/manifest-work-cleanup
  generation: 3
  labels:
    managed-by.kubestellar.io/singletonstatus: "true"
    managed-by.kubestellar.io/wds1.nginx-singleton-placement: "true"
  name: appsv1-deployment-nginx-nginx-singleton-deployment
  namespace: cluster1

So the current logic finds the label managed-by.kubestellar.io/singletonstatus: "true", treats it as the label of another placement, and therefore only removes the deleted placement's label instead of deleting the manifest work:

https://github.com/kubestellar/kubestellar/blob/6b11822e04258c8eff474510c70708f93c5cadf3/pkg/placement/placement.go#L229-L249
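The failure mode described above can be reproduced with a small stdlib-only sketch. It is a hypothetical reconstruction based only on this comment, not the code at the linked lines: the names managedByPrefix, ownedByOtherPlacements and onPlacementDeleted are illustrative. The labels are the ones from the singleton ManifestWork shown above.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical: the prefix that each placement's ownership label starts with.
const managedByPrefix = "managed-by.kubestellar.io"

// ownedByOtherPlacements assumes, like the logic described above, that any
// label other than the deleted placement's own one that starts with the
// managed-by prefix was added by another placement.
func ownedByOtherPlacements(labels map[string]string, ownKey string) bool {
	for key := range labels {
		if key != ownKey && strings.HasPrefix(key, managedByPrefix) {
			return true
		}
	}
	return false
}

// onPlacementDeleted returns the cleanup action for one ManifestWork: if other
// placements seem to own it, only this placement's label is dropped;
// otherwise the whole ManifestWork is deleted.
func onPlacementDeleted(labels map[string]string, ownKey string) string {
	if ownedByOtherPlacements(labels, ownKey) {
		delete(labels, ownKey)
		return "remove label"
	}
	return "delete ManifestWork"
}

func main() {
	own := "managed-by.kubestellar.io/wds1.nginx-singleton-placement"
	labels := map[string]string{
		"managed-by.kubestellar.io/singletonstatus": "true",
		own: "true",
	}
	// The singletonstatus marker is mistaken for another placement's label.
	fmt.Println(onPlacementDeleted(labels, own)) // remove label
	fmt.Println(len(labels))                     // 1
}
```

The work is left holding only the singletonstatus label, which no placement will ever remove, so nothing triggers its deletion.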

The simplest solution is to replace the label managed-by.kubestellar.io/singletonstatus: "true" with a label that does not use the managed-by.kubestellar.io prefix, so that it cannot be mistaken for a placement label.
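A minimal sketch of the proposed fix, assuming the prefix-based ownership check stays as described in this thread. The key kubestellar.io/singleton-status is a hypothetical name, chosen only because it falls outside the managed-by.kubestellar.io prefix; it is not necessarily the key KubeStellar adopted. Once the marker is renamed, the placement's own label is the only prefixed one, so deleting the placement leads to deleting the ManifestWork.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	// Prefix of the per-placement ownership labels, as described in this thread.
	managedByPrefix = "managed-by.kubestellar.io"
	// Hypothetical replacement key for the singleton-status marker; it is
	// deliberately outside the managed-by prefix.
	singletonStatusLabel = "kubestellar.io/singleton-status"
)

// ownedByOtherPlacements is an illustrative prefix-based check: any label other
// than ownKey that starts with the managed-by prefix counts as another owner.
func ownedByOtherPlacements(labels map[string]string, ownKey string) bool {
	for key := range labels {
		if key != ownKey && strings.HasPrefix(key, managedByPrefix) {
			return true
		}
	}
	return false
}

func main() {
	own := "managed-by.kubestellar.io/wds1.nginx-singleton-placement"
	labels := map[string]string{
		singletonStatusLabel: "true",
		own:                  "true",
	}
	// Prints false: no other owner is detected, so the work can be deleted.
	fmt.Println(ownedByOtherPlacements(labels, own))
}
```

An alternative design would keep the label and instead make the check match only the exact per-placement key format, but renaming the marker avoids touching the ownership logic.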

@namasl let me know if you are interested in working on a PR for this.