senthilrch / kube-fledged

A kubernetes operator for creating and managing a cache of container images directly on the cluster worker nodes, so application pods start almost instantly
Apache License 2.0

Feature: Add support for custom serviceAccount in Jobs to support k8s environments with PSP restrictions #100

Closed. reddymh closed this issue 2 years ago.

reddymh commented 3 years ago

When I update an image cache by removing one of the images from the ImageCache CR, jobs are created but stay in a dangling state and are never deleted, and the controller pod hangs.

Logs after removing the image from the ImageCache CR:

The jobs are created, but their status is never updated afterwards, the controller stops responding (hangs), and no image is deleted from the nodes.

I0813 12:08:22.507883 1 controller.go:430] Starting to sync image cache imagecache2(update)
I0813 12:08:22.546340 1 controller.go:633] Completed sync actions for image cache imagecache2(update)
I0813 12:08:22.552289 1 image_manager.go:430] Job not created (image-already-present:- alpine:3.13.5 -->kube-worker1, runtime: docker://19.3.15)
I0813 12:08:22.573488 1 image_manager.go:415] Job imagecache2-z7kss created (delete:- alpine:3.14.1 --> kube-worker1, runtime: docker://19.3.15)
I0813 12:08:22.588487 1 image_manager.go:415] Job imagecache2-n8bqx created (delete:- alpine:3.14 --> kube-worker1, runtime: docker://19.3.15)
I0813 12:08:22.588677 1 image_manager.go:430] Job not created (image-already-present:- alpine:3.13.5 --> kube-az2-worker2, runtime: docker://19.3.15)
I0813 12:08:22.614402 1 image_manager.go:415] Job imagecache2-crtvm created (delete:- alpine:3.14.1 --> kube-az2-worker2, runtime: docker://19.3.15)
I0813 12:08:22.627991 1 image_manager.go:415] Job imagecache2-nszgz created (delete:- alpine:3.14 --> kube-az2-worker2, runtime: docker://19.3.15)
E0813 12:13:23.364327 1 image_manager.go:212] No pods matched job imagecache2-nszgz
E0813 12:13:23.364375 1 image_manager.go:286] Error from updatePendingImageWorkResults(): no pods matched job imagecache2-nszgz
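In the failing runs, deletion Jobs are logged as created but never as succeeded. Grouping the klog entries by Job name makes this visible at a glance. Below is a minimal Python sketch. The regular expressions are assumptions inferred from the log lines quoted in this thread, not taken from kube-fledged's source, and may need adjusting if the message format differs. `split_klog` also copes with logs pasted as one run-on line or wrapped mid-entry.

```python
import re

# A klog entry starts with: severity (I/W/E/F), MMDD, hh:mm:ss.micros, thread id, file:line]
KLOG_HEADER = re.compile(r"[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d{6}\s+\d+\s+[\w.]+:\d+\]")
CREATED = re.compile(r"Job (\S+) created \((pull|delete):- (\S+) --> ([^,]+),")
SUCCEEDED = re.compile(r"Job (\S+) succeeded \(")
NO_PODS = re.compile(r"no pods matched job (\S+)", re.IGNORECASE)


def split_klog(dump):
    """Split a klog dump (one entry per line, or all run together) into entries."""
    starts = [m.start() for m in KLOG_HEADER.finditer(dump)]
    ends = starts[1:] + [len(dump)]
    # Collapse whitespace so entries wrapped mid-line still parse cleanly.
    return [" ".join(dump[s:e].split()) for s, e in zip(starts, ends)]


def summarize_jobs(dump):
    """Map each created Job to its action, image, node and last known outcome."""
    jobs = {}
    for entry in split_klog(dump):
        if m := CREATED.search(entry):
            name, action, image, node = m.groups()
            jobs[name] = {"action": action, "image": image, "node": node, "outcome": "pending"}
        elif (m := SUCCEEDED.search(entry)) and m.group(1) in jobs:
            jobs[m.group(1)]["outcome"] = "succeeded"
        elif (m := NO_PODS.search(entry)) and m.group(1) in jobs:
            jobs[m.group(1)]["outcome"] = "no pods matched"
    return jobs


# Abbreviated excerpt of the log above, pasted as a single line.
SAMPLE = (
    "I0813 12:08:22.573488 1 image_manager.go:415] Job imagecache2-z7kss created "
    "(delete:- alpine:3.14.1 --> kube-worker1, runtime: docker://19.3.15) "
    "I0813 12:08:22.627991 1 image_manager.go:415] Job imagecache2-nszgz created "
    "(delete:- alpine:3.14 --> kube-az2-worker2, runtime: docker://19.3.15) "
    "E0813 12:13:23.364327 1 image_manager.go:212] No pods matched job imagecache2-nszgz"
)

if __name__ == "__main__":
    for name, info in summarize_jobs(SAMPLE).items():
        print(name, info["action"], info["node"], info["outcome"])
    # imagecache2-z7kss delete kube-worker1 pending
    # imagecache2-nszgz delete kube-az2-worker2 no pods matched
```

Feeding in a full controller log the same way lists every deletion Job that never reached "succeeded".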

After restarting the controller pod:

I0813 12:30:17.488537 1 controller.go:123] Setting up event handlers
I0813 12:30:17.490327 1 main.go:76] Starting pre-flight checks
I0813 12:30:17.559958 1 controller.go:170] Dangling Job(imagecache2-75q8k) deleted
I0813 12:30:17.591621 1 controller.go:170] Dangling Job(imagecache2-9tl22) deleted
I0813 12:30:17.626065 1 controller.go:170] Dangling Job(imagecache2-crtvm) deleted
I0813 12:30:17.648474 1 controller.go:170] Dangling Job(imagecache2-g9vt7) deleted
I0813 12:30:17.669925 1 controller.go:170] Dangling Job(imagecache2-hgwzb) deleted
I0813 12:30:17.691915 1 controller.go:170] Dangling Job(imagecache2-k2btp) deleted
I0813 12:30:17.708962 1 controller.go:170] Dangling Job(imagecache2-lrhp4) deleted
I0813 12:30:17.723610 1 controller.go:170] Dangling Job(imagecache2-lxwbs) deleted
I0813 12:30:17.740635 1 controller.go:170] Dangling Job(imagecache2-n8bqx) deleted
I0813 12:30:17.755842 1 controller.go:170] Dangling Job(imagecache2-nszgz) deleted
I0813 12:30:17.908447 1 controller.go:170] Dangling Job(imagecache2-q54g2) deleted
I0813 12:30:18.108218 1 controller.go:170] Dangling Job(imagecache2-z7kss) deleted
I0813 12:30:18.308094 1 controller.go:170] Dangling Job(imagecache2-zq48b) deleted
I0813 12:30:18.507513 1 controller.go:170] Dangling Job(imagecache2-zzz84) deleted
I0813 12:30:18.567621 1 controller.go:204] Dangling Image cache(imagecache2) status changed to 'Aborted'
I0813 12:30:18.567725 1 main.go:80] Pre-flight checks completed

senthilrch commented 3 years ago

@reddymh: thanks for reporting this issue, appreciate it.

I installed kube-fledged v0.8.1 and performed the same modify operation on an image cache (removing one image). It works fine; see the logs below...

eechens@EMB-Q6BUMD6N kube-fledged % make deploy-using-yaml
kubectl apply -f deploy/kubefledged-namespace.yaml
namespace/kube-fledged created
kubectl apply -f deploy/kubefledged-crd.yaml
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
customresourcedefinition.apiextensions.k8s.io/imagecaches.kubefledged.io configured
kubectl apply -f deploy/kubefledged-serviceaccount.yaml
serviceaccount/kubefledged-controller created
serviceaccount/kubefledged-webhook-server created
kubectl apply -f deploy/kubefledged-clusterrole.yaml
clusterrole.rbac.authorization.k8s.io/kubefledged-controller created
clusterrole.rbac.authorization.k8s.io/kubefledged-webhook-server created
kubectl apply -f deploy/kubefledged-clusterrolebinding.yaml
clusterrolebinding.rbac.authorization.k8s.io/kubefledged-controller created
clusterrolebinding.rbac.authorization.k8s.io/kubefledged-webhook-server created
kubectl delete validatingwebhookconfigurations -l app=kubefledged
No resources found
kubectl apply -f deploy/kubefledged-validatingwebhook.yaml
validatingwebhookconfiguration.admissionregistration.k8s.io/kubefledged created
kubectl apply -f deploy/kubefledged-deployment-webhook-server.yaml
deployment.apps/kubefledged-webhook-server created
kubectl apply -f deploy/kubefledged-service-webhook-server.yaml
service/kubefledged-webhook-server created
kubectl apply -f deploy/kubefledged-deployment-controller.yaml
deployment.apps/kubefledged-controller created
kubectl rollout status deployment kubefledged-webhook-server -n kube-fledged --watch
Waiting for deployment "kubefledged-webhook-server" rollout to finish: 0 of 1 updated replicas are available...
deployment "kubefledged-webhook-server" successfully rolled out
kubectl rollout status deployment kubefledged-controller -n kube-fledged --watch
Waiting for deployment "kubefledged-controller" rollout to finish: 0 of 1 updated replicas are available...
deployment "kubefledged-controller" successfully rolled out
eechens@EMB-Q6BUMD6N kube-fledged % kubectl apply -f deploy/kubefledged-imagecache.yaml
imagecache.kubefledged.io/imagecache1 created
eechens@EMB-Q6BUMD6N kube-fledged % kubectl get pods -n kube-fledged
NAME                                         READY   STATUS    RESTARTS   AGE
kubefledged-controller-f95967b7d-rvbzs       1/1     Running   0          48s
kubefledged-webhook-server-9d7f9b55f-4n2ll   1/1     Running   0          50s
eechens@EMB-Q6BUMD6N kube-fledged % kubectl logs -f kubefledged-controller-f95967b7d-rvbzs
I0813 15:27:20.124780       1 controller.go:123] Setting up event handlers
I0813 15:27:20.126120       1 main.go:76] Starting pre-flight checks
I0813 15:27:20.161014       1 controller.go:159] No dangling or stuck jobs found...
I0813 15:27:20.168926       1 controller.go:186] No dangling or stuck imagecaches found...
I0813 15:27:20.168948       1 main.go:80] Pre-flight checks completed
I0813 15:27:20.168963       1 controller.go:224] Starting fledged controller
I0813 15:27:20.168967       1 controller.go:227] Waiting for informer caches to sync
I0813 15:27:20.269900       1 controller.go:232] Starting image cache worker
I0813 15:27:20.269959       1 controller.go:239] Starting cache refresh worker
I0813 15:27:20.269969       1 controller.go:243] Started workers
I0813 15:27:20.269986       1 image_manager.go:341] Starting image manager
I0813 15:27:20.269994       1 image_manager.go:344] Waiting for informer caches to sync
I0813 15:27:20.371038       1 image_manager.go:349] Started image manager
I0813 15:27:34.970178       1 controller.go:430] Starting to sync image cache imagecache1(create)
I0813 15:27:34.998139       1 controller.go:633] Completed sync actions for image cache imagecache1(create)
I0813 15:27:35.025569       1 image_manager.go:428] Job imagecache1-t8l6k created (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:27:35.025658       1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/tomcat:10.0.8 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:27:35.033530       1 image_manager.go:428] Job imagecache1-gcc4v created (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000000, runtime: docker://19.3.14)
I0813 15:27:35.033678       1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/tomcat:10.0.8 --> aks-si03c8m32-81246184-vmss000000, runtime: docker://19.3.14)
I0813 15:27:35.041362       1 image_manager.go:428] Job imagecache1-psr5k created (pull:- quay.io/bitnami/redis:6.2.5 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:27:35.041453       1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/mariadb:10.5.11 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:27:37.674396       1 image_manager.go:179] Job imagecache1-psr5k succeeded (pull:- quay.io/bitnami/redis:6.2.5 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:27:38.167128       1 image_manager.go:179] Job imagecache1-gcc4v succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000000, runtime: docker://19.3.14)
I0813 15:27:38.696000       1 image_manager.go:179] Job imagecache1-t8l6k succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:27:39.097323       1 controller.go:430] Starting to sync image cache imagecache1(statusupdate)
I0813 15:27:39.123000       1 controller.go:633] Completed sync actions for image cache imagecache1(statusupdate)
I0813 15:27:39.123050       1 event.go:282] Event(v1.ObjectReference{Kind:"ImageCache", Namespace:"kube-fledged", Name:"imagecache1", UID:"80953422-7060-418f-a6ad-c24b403010b1", APIVersion:"kubefledged.io/v1alpha2", ResourceVersion:"131125771", FieldPath:""}): type: 'Normal' reason: 'ImageCacheCreate' All requested images pulled succesfully to respective nodes
^C
eechens@EMB-Q6BUMD6N kube-fledged % kubens kube-fledged
Context "aks-mxe" modified.
Active namespace is "kube-fledged".
eechens@EMB-Q6BUMD6N kube-fledged % kubectl get jobs
No resources found in kube-fledged namespace.
eechens@EMB-Q6BUMD6N kube-fledged % kubectl get pods
NAME                                         READY   STATUS    RESTARTS   AGE
kubefledged-controller-f95967b7d-rvbzs       1/1     Running   0          94s
kubefledged-webhook-server-9d7f9b55f-4n2ll   1/1     Running   0          96s
eechens@EMB-Q6BUMD6N kube-fledged % kubectl edit ic imagecache1
imagecache.kubefledged.io/imagecache1 edited
eechens@EMB-Q6BUMD6N kube-fledged % kubectl logs -f kubefledged-controller-f95967b7d-rvbzs
I0813 15:27:20.124780       1 controller.go:123] Setting up event handlers
I0813 15:27:20.126120       1 main.go:76] Starting pre-flight checks
I0813 15:27:20.161014       1 controller.go:159] No dangling or stuck jobs found...
I0813 15:27:20.168926       1 controller.go:186] No dangling or stuck imagecaches found...
I0813 15:27:20.168948       1 main.go:80] Pre-flight checks completed
I0813 15:27:20.168963       1 controller.go:224] Starting fledged controller
I0813 15:27:20.168967       1 controller.go:227] Waiting for informer caches to sync
I0813 15:27:20.269900       1 controller.go:232] Starting image cache worker
I0813 15:27:20.269959       1 controller.go:239] Starting cache refresh worker
I0813 15:27:20.269969       1 controller.go:243] Started workers
I0813 15:27:20.269986       1 image_manager.go:341] Starting image manager
I0813 15:27:20.269994       1 image_manager.go:344] Waiting for informer caches to sync
I0813 15:27:20.371038       1 image_manager.go:349] Started image manager
I0813 15:27:34.970178       1 controller.go:430] Starting to sync image cache imagecache1(create)
I0813 15:27:34.998139       1 controller.go:633] Completed sync actions for image cache imagecache1(create)
I0813 15:27:35.025569       1 image_manager.go:428] Job imagecache1-t8l6k created (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:27:35.025658       1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/tomcat:10.0.8 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:27:35.033530       1 image_manager.go:428] Job imagecache1-gcc4v created (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000000, runtime: docker://19.3.14)
I0813 15:27:35.033678       1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/tomcat:10.0.8 --> aks-si03c8m32-81246184-vmss000000, runtime: docker://19.3.14)
I0813 15:27:35.041362       1 image_manager.go:428] Job imagecache1-psr5k created (pull:- quay.io/bitnami/redis:6.2.5 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:27:35.041453       1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/mariadb:10.5.11 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:27:37.674396       1 image_manager.go:179] Job imagecache1-psr5k succeeded (pull:- quay.io/bitnami/redis:6.2.5 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:27:38.167128       1 image_manager.go:179] Job imagecache1-gcc4v succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000000, runtime: docker://19.3.14)
I0813 15:27:38.696000       1 image_manager.go:179] Job imagecache1-t8l6k succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:27:39.097323       1 controller.go:430] Starting to sync image cache imagecache1(statusupdate)
I0813 15:27:39.123000       1 controller.go:633] Completed sync actions for image cache imagecache1(statusupdate)
I0813 15:27:39.123050       1 event.go:282] Event(v1.ObjectReference{Kind:"ImageCache", Namespace:"kube-fledged", Name:"imagecache1", UID:"80953422-7060-418f-a6ad-c24b403010b1", APIVersion:"kubefledged.io/v1alpha2", ResourceVersion:"131125771", FieldPath:""}): type: 'Normal' reason: 'ImageCacheCreate' All requested images pulled succesfully to respective nodes
I0813 15:29:08.553617       1 controller.go:430] Starting to sync image cache imagecache1(update)
I0813 15:29:08.586703       1 controller.go:633] Completed sync actions for image cache imagecache1(update)
I0813 15:29:08.604835       1 image_manager.go:428] Job imagecache1-xnhgg created (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:29:08.614768       1 image_manager.go:415] Job imagecache1-frq87 created (delete:- quay.io/bitnami/tomcat:10.0.8 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:29:08.626652       1 image_manager.go:428] Job imagecache1-zdxlb created (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000000, runtime: docker://19.3.14)
I0813 15:29:08.634249       1 image_manager.go:415] Job imagecache1-pqhmk created (delete:- quay.io/bitnami/tomcat:10.0.8 --> aks-si03c8m32-81246184-vmss000000, runtime: docker://19.3.14)
I0813 15:29:08.646889       1 image_manager.go:428] Job imagecache1-sh96p created (pull:- quay.io/bitnami/redis:6.2.5 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:29:08.646992       1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/mariadb:10.5.11 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:29:10.737359       1 image_manager.go:177] Job imagecache1-pqhmk succeeded (delete:- quay.io/bitnami/tomcat:10.0.8 --> aks-si03c8m32-81246184-vmss000000, runtime: docker://19.3.14)
I0813 15:29:11.276983       1 image_manager.go:179] Job imagecache1-xnhgg succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:29:11.294108       1 image_manager.go:177] Job imagecache1-frq87 succeeded (delete:- quay.io/bitnami/tomcat:10.0.8 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:29:11.795615       1 image_manager.go:179] Job imagecache1-zdxlb succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000000, runtime: docker://19.3.14)
I0813 15:29:12.336372       1 image_manager.go:179] Job imagecache1-sh96p succeeded (pull:- quay.io/bitnami/redis:6.2.5 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:29:12.764334       1 controller.go:430] Starting to sync image cache imagecache1(statusupdate)
I0813 15:29:12.790182       1 controller.go:633] Completed sync actions for image cache imagecache1(statusupdate)
I0813 15:29:12.790241       1 event.go:282] Event(v1.ObjectReference{Kind:"ImageCache", Namespace:"kube-fledged", Name:"imagecache1", UID:"80953422-7060-418f-a6ad-c24b403010b1", APIVersion:"kubefledged.io/v1alpha2", ResourceVersion:"131126536", FieldPath:""}): type: 'Normal' reason: 'ImageCacheUpdate' All cached images succesfully deleted from respective nodes
^C
eechens@EMB-Q6BUMD6N kube-fledged % kubectl get nodes
NAME                                STATUS   ROLES   AGE    VERSION
aks-si03c8m32-81246184-vmss000000   Ready    agent   347d   v1.18.14
aks-si03c8m32-81246184-vmss000009   Ready    agent   135d   v1.18.14
eechens@EMB-Q6BUMD6N kube-fledged % kubectl get ic imagecache1 -o yaml
apiVersion: kubefledged.io/v1alpha2
kind: ImageCache
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"kubefledged.io/v1alpha2","kind":"ImageCache","metadata":{"annotations":{},"labels":{"app":"kubefledged","component":"imagecache"},"name":"imagecache1","namespace":"kube-fledged"},"spec":{"cacheSpec":[{"images":["quay.io/bitnami/nginx:1.21.1","quay.io/bitnami/tomcat:10.0.8"]},{"images":["quay.io/bitnami/redis:6.2.5","quay.io/bitnami/mariadb:10.5.11"],"nodeSelector":{"tier":"backend"}}],"imagePullSecrets":[{"name":"myregistrykey"}]}}
  creationTimestamp: "2021-08-13T15:27:34Z"
  generation: 6
  labels:
    app: kubefledged
    component: imagecache
  managedFields:
  - apiVersion: kubefledged.io/v1alpha2
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
        f:labels:
          .: {}
          f:app: {}
          f:component: {}
      f:spec:
        .: {}
        f:imagePullSecrets: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: "2021-08-13T15:27:34Z"
  - apiVersion: kubefledged.io/v1alpha2
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:cacheSpec: {}
    manager: kubectl-edit
    operation: Update
    time: "2021-08-13T15:29:08Z"
  - apiVersion: kubefledged.io/v1alpha2
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        .: {}
        f:completionTime: {}
        f:message: {}
        f:reason: {}
        f:startTime: {}
        f:status: {}
    manager: kubefledged-controller
    operation: Update
    time: "2021-08-13T15:29:12Z"
  name: imagecache1
  namespace: kube-fledged
  resourceVersion: "131126650"
  selfLink: /apis/kubefledged.io/v1alpha2/namespaces/kube-fledged/imagecaches/imagecache1
  uid: 80953422-7060-418f-a6ad-c24b403010b1
spec:
  cacheSpec:
  - images:
    - quay.io/bitnami/nginx:1.21.1
  - images:
    - quay.io/bitnami/redis:6.2.5
    - quay.io/bitnami/mariadb:10.5.11
    nodeSelector:
      tier: backend
  imagePullSecrets:
  - name: myregistrykey
status:
  completionTime: "2021-08-13T15:29:12Z"
  message: All cached images succesfully deleted from respective nodes
  reason: ImageCacheUpdate
  startTime: "2021-08-13T15:29:08Z"
  status: Succeeded
eechens@EMB-Q6BUMD6N kube-fledged %

From the logs you pasted, I can see that the image deletion job did not complete and expired after the default timeout of 5 minutes. When that happens, the image manager in kube-fledged fetches the Job's Pod to read the error message and the reason for the failure, but here that lookup failed. This can happen when the informer cache used by the image manager is out of sync with the state persisted in etcd...
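The "default 5 minutes" reading can be checked against the logs by subtracting the klog timestamp of a deletion Job's creation from the timestamp of its "No pods matched" error. Below is a minimal Python sketch. It assumes the standard klog header layout (severity letter, MMDD, wall-clock time). Because klog omits the year, one is supplied, so the arithmetic is only valid for entries within the same year. The log entries are abbreviated copies of the lines quoted in this thread.

```python
from datetime import datetime


def klog_time(entry, year=2000):
    """Parse the timestamp of a klog entry such as 'I0813 12:08:22.627991 ...'.

    klog omits the year, so one is supplied; any year works for gaps within
    the same year (2000 is a leap year, so a 0229 date still parses).
    """
    severity_and_date, clock = entry.split()[:2]
    # Drop the severity letter (I/W/E/F) and keep MMDD.
    return datetime.strptime(f"{year}{severity_and_date[1:]} {clock}", "%Y%m%d %H:%M:%S.%f")


def gap_seconds(first, second):
    """Seconds elapsed between two klog entries."""
    return (klog_time(second) - klog_time(first)).total_seconds()


# Creation and failure entries (abbreviated) for the two jobs reported in this thread.
RUNS = {
    "imagecache2-nszgz": (
        "I0813 12:08:22.627991 1 image_manager.go:415] Job imagecache2-nszgz created",
        "E0813 12:13:23.364327 1 image_manager.go:212] No pods matched job imagecache2-nszgz",
    ),
    "imagecache1-lgcrt": (
        "I0813 17:17:58.049588 1 image_manager.go:415] Job imagecache1-lgcrt created",
        "E0813 17:22:58.081364 1 image_manager.go:212] No pods matched job imagecache1-lgcrt",
    ),
}

if __name__ == "__main__":
    for job, (created, failed) in RUNS.items():
        print(f"{job}: {gap_seconds(created, failed):.3f} s")
```

Both gaps come out at roughly 300 s (300.736 s and 300.032 s), which is consistent with a 5-minute timeout followed by a failed Pod lookup.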

Restarting the controller recreates the informer cache. Please repeat the modify operation and share the status and logs.

reddymh commented 3 years ago

@senthilrch: I tried again and even re-installed the controller, but no luck.

Entire Controller Log File:

I0813 17:15:18.805048 1 controller.go:123] Setting up event handlers
I0813 17:15:18.808132 1 main.go:76] Starting pre-flight checks
I0813 17:15:18.839420 1 controller.go:159] No dangling or stuck jobs found...
I0813 17:15:18.904537 1 controller.go:186] No dangling or stuck imagecaches found...
I0813 17:15:18.904592 1 main.go:80] Pre-flight checks completed
I0813 17:15:18.904624 1 controller.go:224] Starting fledged controller
I0813 17:15:18.904635 1 controller.go:227] Waiting for informer caches to sync
I0813 17:15:19.005611 1 controller.go:232] Starting image cache worker
I0813 17:15:19.005663 1 controller.go:239] Starting cache refresh worker
I0813 17:15:19.005672 1 controller.go:243] Started workers
I0813 17:15:19.005691 1 image_manager.go:341] Starting image manager
I0813 17:15:19.005698 1 image_manager.go:344] Waiting for informer caches to sync
I0813 17:15:19.106103 1 image_manager.go:349] Started image manager
I0813 17:16:29.495586 1 controller.go:430] Starting to sync image cache imagecache1(create)
I0813 17:16:29.580177 1 controller.go:633] Completed sync actions for image cache imagecache1(create)
I0813 17:16:29.642794 1 image_manager.go:428] Job imagecache1-mgldl created (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-master1, runtime: docker://19.3.15)
I0813 17:16:29.656380 1 image_manager.go:428] Job imagecache1-nckdf created (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-master1, runtime: docker://19.3.15)
I0813 17:16:29.674758 1 image_manager.go:428] Job imagecache1-vxg87 created (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az2-master1, runtime: docker://19.3.15)
I0813 17:16:29.688813 1 image_manager.go:428] Job imagecache1-xd69g created (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az2-master1, runtime: docker://19.3.15)
I0813 17:16:29.729301 1 image_manager.go:428] Job imagecache1-xmkpb created (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-master2, runtime: docker://19.3.15)
I0813 17:16:29.770681 1 image_manager.go:428] Job imagecache1-m4ng4 created (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-master2, runtime: docker://19.3.15)
I0813 17:16:29.795186 1 image_manager.go:428] Job imagecache1-x2n24 created (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-worker1, runtime: docker://19.3.15)
I0813 17:16:29.845655 1 image_manager.go:428] Job imagecache1-nssts created (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-worker1, runtime: docker://19.3.15)
I0813 17:16:29.895609 1 image_manager.go:428] Job imagecache1-54dlf created (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az2-worker2, runtime: docker://19.3.15)
I0813 17:16:29.920791 1 image_manager.go:428] Job imagecache1-qqjzc created (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az2-worker2, runtime: docker://19.3.15)
I0813 17:16:29.941216 1 image_manager.go:428] Job imagecache1-lbtcz created (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az2-worker1, runtime: docker://19.3.15)
I0813 17:16:30.012574 1 image_manager.go:428] Job imagecache1-xhsp4 created (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az2-worker1, runtime: docker://19.3.15)
I0813 17:16:30.200777 1 image_manager.go:428] Job imagecache1-stn65 created (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-worker2, runtime: docker://19.3.15)
I0813 17:16:30.437273 1 image_manager.go:428] Job imagecache1-z6tdq created (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-worker2, runtime: docker://19.3.15)
I0813 17:16:41.351850 1 image_manager.go:179] Job imagecache1-vxg87 succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az2-master1, runtime: docker://19.3.15)
I0813 17:16:41.528928 1 image_manager.go:179] Job imagecache1-54dlf succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az2-worker2, runtime: docker://19.3.15)
I0813 17:16:41.631971 1 image_manager.go:179] Job imagecache1-xmkpb succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-master2, runtime: docker://19.3.15)
I0813 17:16:41.673425 1 image_manager.go:179] Job imagecache1-x2n24 succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-worker1, runtime: docker://19.3.15)
I0813 17:16:42.376309 1 image_manager.go:179] Job imagecache1-stn65 succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-worker2, runtime: docker://19.3.15)
I0813 17:16:58.164376 1 image_manager.go:179] Job imagecache1-nckdf succeeded (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-master1, runtime: docker://19.3.15)
I0813 17:17:01.343181 1 image_manager.go:179] Job imagecache1-mgldl succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-master1, runtime: docker://19.3.15)
I0813 17:17:03.024952 1 image_manager.go:179] Job imagecache1-qqjzc succeeded (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az2-worker2, runtime: docker://19.3.15)
I0813 17:17:04.153229 1 image_manager.go:179] Job imagecache1-m4ng4 succeeded (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-master2, runtime: docker://19.3.15)
I0813 17:17:04.747771 1 image_manager.go:179] Job imagecache1-xd69g succeeded (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-master1, runtime: docker://19.3.15)
I0813 17:17:07.258774 1 image_manager.go:179] Job imagecache1-z6tdq succeeded (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-worker2, runtime: docker://19.3.15)
I0813 17:17:07.441780 1 image_manager.go:179] Job imagecache1-nssts succeeded (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-worker1, runtime: docker://19.3.15)
I0813 17:17:07.522864 1 image_manager.go:179] Job imagecache1-xhsp4 succeeded (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az2-worker1, runtime: docker://19.3.15)
I0813 17:17:10.648796 1 image_manager.go:179] Job imagecache1-lbtcz succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az2-worker1, runtime: docker://19.3.15)
I0813 17:17:12.260489 1 controller.go:430] Starting to sync image cache imagecache1(statusupdate)
I0813 17:17:12.292936 1 controller.go:633] Completed sync actions for image cache imagecache1(statusupdate)
I0813 17:17:12.293071 1 event.go:282] Event(v1.ObjectReference{Kind:"ImageCache", Namespace:"kube-fledged", Name:"imagecache1", UID:"3de5fc51-760d-43b9-8c1f-5054657e8429", APIVersion:"kubefledged.io/v1alpha2", ResourceVersion:"16118630", FieldPath:""}): type: 'Normal' reason: 'ImageCacheCreate' All requested images pulled succesfully to respective nodes
I0813 17:17:57.924301 1 controller.go:430] Starting to sync image cache imagecache1(update)
I0813 17:17:57.958120 1 controller.go:633] Completed sync actions for image cache imagecache1(update)
I0813 17:17:57.963851 1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-worker1, runtime: docker://19.3.15)
I0813 17:17:57.982594 1 image_manager.go:415] Job imagecache1-zwb77 created (delete:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-worker1, runtime: docker://19.3.15)
I0813 17:17:57.982717 1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/nginx:1.21.1 --> kube-az2--worker2, runtime: docker://19.3.15)
I0813 17:17:57.996390 1 image_manager.go:415] Job imagecache1-b48p6 created (delete:- quay.io/bitnami/tomcat:10.0.8 --> kube-az2-worker2, runtime: docker://19.3.15)
I0813 17:17:57.996531 1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/nginx:1.21.1 --> kube-az2-worker1, runtime: docker://19.3.15)
I0813 17:17:58.016100 1 image_manager.go:415] Job imagecache1-q4pjk created (delete:- quay.io/bitnami/tomcat:10.0.8 --> kube-az2-worker1, runtime: docker://19.3.15)
I0813 17:17:58.016238 1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-worker2, runtime: docker://19.3.15)
I0813 17:17:58.033812 1 image_manager.go:415] Job imagecache1-ssqwp created (delete:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-worker2, runtime: docker://19.3.15)
I0813 17:17:58.033884 1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-master1, runtime: docker://19.3.15)
I0813 17:17:58.049588 1 image_manager.go:415] Job imagecache1-lgcrt created (delete:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-master1, runtime: docker://19.3.15)
I0813 17:17:58.049688 1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/nginx:1.21.1 --> kube-az2-master1, runtime: docker://19.3.15)
I0813 17:17:58.068555 1 image_manager.go:415] Job imagecache1-mv965 created (delete:- quay.io/bitnami/tomcat:10.0.8 --> kube-az2-master1, runtime: docker://19.3.15)
I0813 17:17:58.068681 1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-master2, runtime: docker://19.3.15)
I0813 17:17:58.080138 1 image_manager.go:415] Job imagecache1-sd8xx created (delete:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-master2, runtime: docker://19.3.15)
E0813 17:22:58.081364 1 image_manager.go:212] No pods matched job imagecache1-lgcrt
E0813 17:22:58.084083 1 image_manager.go:286] Error from updatePendingImageWorkResults(): no pods matched job imagecache1-lgcrt

ImageCache CR file:

apiVersion: kubefledged.io/v1alpha2
kind: ImageCache
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"kubefledged.io/v1alpha2","kind":"ImageCache","metadata":{"annotations":{},"labels":{"app":"kubefledged","component":"imagecache"},"name":"imagecache1","namespace":"kube-fledged"},"spec":{"cacheSpec":[{"images":["quay.io/bitnami/nginx:1.21.1"]}],"imagePullSecrets":[{"name":"myregistrykey"}]}}
  creationTimestamp: "2021-08-13T17:16:29Z"
  generation: 5
  labels:
    app: kubefledged
    component: imagecache
  managedFields:

reddymh commented 3 years ago

@senthilrch: I removed one of the images from the ImageCache CR. The controller log shows that delete jobs were created, but I cannot see them in the namespace. By contrast, I can see the Kubernetes batch jobs that are created when I add a new image. Could you share the manifest of the image deletion job (action: delete), so that I can run it as a regular Kubernetes batch job and check whether the image gets deleted?
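The issue title asks for a configurable serviceAccount on these Jobs, so that they can run in clusters with PodSecurityPolicy restrictions. The Python sketch below illustrates how a deletion Job could be run by hand under such an account. It builds a Job object from the fields visible in the Job manifest shared in this thread: the labels, activeDeadlineSeconds 3600, backoffLimit 0, the docker-cri-client container, the runtime-sock volume mounted at /var/run/docker.sock, and the kubernetes.io/hostname node selector. `deletion_job_manifest` is a hypothetical helper, not kube-fledged code. The container image and command are not shown in the excerpt, so the caller must supply them. `restartPolicy: Never` and the hostPath value are assumptions.

```python
import json


def deletion_job_manifest(imagecache, namespace, node, image, command,
                          pull_secret=None, service_account=None):
    """Hypothetical helper: a Job dict shaped like the deletion Job shared in this thread."""
    labels = {"app": "imagecache", "controller": "fledged", "imagecache": imagecache}
    pod_spec = {
        "containers": [{
            "name": "docker-cri-client",
            "image": image,      # caller-supplied: the real value is not shown in this thread
            "command": command,  # caller-supplied: the real value is not shown in this thread
            "volumeMounts": [{"name": "runtime-sock", "mountPath": "/var/run/docker.sock"}],
        }],
        "nodeSelector": {"kubernetes.io/hostname": node},  # pin the pod to one node
        "restartPolicy": "Never",  # assumption: the excerpt lists the field but not its value
        "volumes": [{
            "name": "runtime-sock",
            # assumption: the host path mirrors the mount path
            "hostPath": {"path": "/var/run/docker.sock"},
        }],
    }
    if pull_secret:
        pod_spec["imagePullSecrets"] = [{"name": pull_secret}]
    if service_account:
        # The knob this issue asks for: run the pod under a PSP-authorized ServiceAccount.
        pod_spec["serviceAccountName"] = service_account
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"generateName": f"{imagecache}-", "namespace": namespace, "labels": labels},
        "spec": {
            "activeDeadlineSeconds": 3600,
            "backoffLimit": 0,
            "completions": 1,
            "parallelism": 1,
            "template": {"metadata": {"labels": labels}, "spec": pod_spec},
        },
    }


if __name__ == "__main__":
    job = deletion_job_manifest(
        "imagecache1", "kube-fledged", "kube-az1-master1",
        image="<cri-client-image>", command=["<delete-command>"],  # placeholders
        pull_secret="myregistrykey", service_account="<psp-allowed-sa>",
    )
    print(json.dumps(job, indent=2))
```

Save the printed JSON to a file and submit it with `kubectl create -f`. `kubectl apply` rejects objects that set only `generateName`.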

senthilrch commented 3 years ago

@reddymh:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: imagecache1-
  namespace: kube-fledged
  labels:
    app: imagecache
    controller: fledged
    imagecache: imagecache1
spec:
  activeDeadlineSeconds: 3600
  backoffLimit: 0
  completions: 1
  parallelism: 1
  template:
    metadata:
      labels:
        app: imagecache
        controller: fledged
        imagecache: imagecache1
    spec:
      containers:
      - args:
        - -c
        - exec /usr/bin/docker image rm -f quay.io/non-existent-job22:latest > /dev/termination-log 2>&1
        command:
        - /bin/bash
        image: senthilrch/kubefledged-cri-client:v0.8.1
        imagePullPolicy: IfNotPresent
        name: docker-cri-client
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/run/docker.sock
          name: runtime-sock
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: myregistrykey
      nodeSelector:
        kubernetes.io/hostname: aks-si03c8m32-81246184-vmss000009
      restartPolicy: Never
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      tolerations:
      - operator: Exists
      volumes:
      - hostPath:
          path: /var/run/docker.sock
          type: Socket
        name: runtime-sock
```
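For the Docker runtime, the delete Job in the manifest above boils down to a `/bin/bash -c` command plus a hostPath mount of the node's runtime socket. A minimal Go sketch of how those fields fit together follows; the type `deleteSpec` and the function `newDeleteSpec` are hypothetical names for illustration, not kube-fledged's actual code, and only the Docker case shown in the manifest is covered.

```go
package main

import "fmt"

// deleteSpec is a hypothetical, trimmed-down view of the container and volume
// fields seen in the delete Job manifest; it is not a kube-fledged type.
type deleteSpec struct {
	Command   []string
	Args      []string
	HostPath  string // node path exposed to the pod through a hostPath volume
	MountPath string // where that socket appears inside the container
}

// newDeleteSpec mirrors the Docker variant of the manifest: the docker CLI in
// the cri-client image talks to the node's daemon through the mounted socket,
// and all output is redirected to the termination log so that it ends up in
// the container's termination message. The image reference is interpolated
// unquoted, exactly as in the manifest.
func newDeleteSpec(image string) deleteSpec {
	const sock = "/var/run/docker.sock"
	const termLog = "/dev/termination-log"
	return deleteSpec{
		Command:   []string{"/bin/bash"},
		Args:      []string{"-c", fmt.Sprintf("exec /usr/bin/docker image rm -f %s > %s 2>&1", image, termLog)},
		HostPath:  sock,
		MountPath: sock,
	}
}

func main() {
	s := newDeleteSpec("quay.io/non-existent-job22:latest")
	fmt.Println(s.Command, s.Args)
	fmt.Println("hostPath:", s.HostPath, "mountPath:", s.MountPath)
}
```

The hostPath volume is the part that matters for admission: any policy that forbids hostPath volumes, or requires them to be mounted read-only, will reject this pod.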
reddymh commented 3 years ago

@senthilrch after debugging, I found the cause: the problem is with the PSP (PodSecurityPolicy) policies.

Error log from the delete batch job:

Error creating: pods "imagecache1-cgws2-" is forbidden: PodSecurityPolicy: unable to admit pod: [spec.containers[0].volumeMounts[0].readOnly: Invalid value: false: must be read-only spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used]

The Job is not bound to any specific service account, so we need to add a PSP that allows hostPath and bind it to a service account used by the Job.
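The two violations in the error above (a hostPath volume that is not allowed, and a mount that must be read-only) can be reproduced with a small Go sketch. It is a simplified, single-policy model, assuming only the PSP `volumes` and `allowedHostPaths` (`pathPrefix`, `readOnly`) fields matter; the types, function names and message texts are hypothetical and this is not the actual Kubernetes admission code.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// HostPathRule is a hypothetical stand-in for one allowedHostPaths entry of a PSP.
type HostPathRule struct {
	PathPrefix string
	ReadOnly   bool
}

// Policy keeps only the two PSP fields relevant to this issue (hypothetical type).
type Policy struct {
	Volumes          []string       // allowed volume types, e.g. "secret", "hostPath", "*"
	AllowedHostPaths []HostPathRule // empty means any host path is allowed
}

// Volume and Mount model just enough of a single-container pod spec.
type Volume struct {
	Name     string
	HostPath string // empty: not a hostPath volume
}

type Mount struct {
	Volume   string
	ReadOnly bool
}

// underPrefix compares whole path components, so /var/run matches
// /var/run/docker.sock but not /var/runtime.
func underPrefix(p, prefix string) bool {
	p, prefix = path.Clean(p), path.Clean(prefix)
	return prefix == "/" || p == prefix || strings.HasPrefix(p, prefix+"/")
}

// hostPathAllowed reports whether p is permitted and whether mounts of it must
// be read-only. A matching writable rule wins over a matching read-only rule.
func hostPathAllowed(pol Policy, p string) (allowed, mustBeReadOnly bool) {
	if len(pol.AllowedHostPaths) == 0 {
		return true, false
	}
	for _, r := range pol.AllowedHostPaths {
		if !underPrefix(p, r.PathPrefix) {
			continue
		}
		if !r.ReadOnly {
			return true, false
		}
		allowed, mustBeReadOnly = true, true
	}
	return allowed, mustBeReadOnly
}

// Check returns the violations this simplified model would report.
func Check(pol Policy, vols []Volume, mounts []Mount) []string {
	var errs []string
	readOnly := map[string]bool{}
	hostPathOK := false
	for _, t := range pol.Volumes {
		if t == "hostPath" || t == "*" {
			hostPathOK = true
		}
	}
	for i, v := range vols {
		if v.HostPath == "" {
			continue
		}
		if !hostPathOK {
			errs = append(errs, fmt.Sprintf("spec.volumes[%d]: hostPath volumes are not allowed to be used", i))
			continue
		}
		ok, ro := hostPathAllowed(pol, v.HostPath)
		if !ok {
			errs = append(errs, fmt.Sprintf("spec.volumes[%d]: host path %s is not allowed", i, v.HostPath))
			continue
		}
		readOnly[v.Name] = ro
	}
	for i, m := range mounts {
		if readOnly[m.Volume] && !m.ReadOnly {
			errs = append(errs, fmt.Sprintf("spec.containers[0].volumeMounts[%d].readOnly: must be read-only", i))
		}
	}
	return errs
}

func main() {
	vols := []Volume{{Name: "runtime-sock", HostPath: "/var/run/docker.sock"}}
	mounts := []Mount{{Volume: "runtime-sock", ReadOnly: false}}
	restricted := Policy{Volumes: []string{"configMap", "secret"}}
	readOnlyRun := Policy{Volumes: []string{"hostPath"}, AllowedHostPaths: []HostPathRule{{PathPrefix: "/var/run", ReadOnly: true}}}
	fmt.Println(Check(restricted, vols, mounts))
	fmt.Println(Check(readOnlyRun, vols, mounts))
}
```

In the real cluster both messages appear together because the PSP admission controller tries every policy available to the pod and reports the combined failures; this sketch evaluates one policy at a time.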

senthilrch commented 3 years ago

@reddymh: thanks for debugging the issue. The image-delete pod needs to mount the socket file of the node's container runtime so that it can issue the delete command to the runtime and remove the image from the node.

In kube-fledged we don't set the service account in the Job's pod spec template, so the pod runs under the default service account of the namespace. If that service account is linked to a PSP that doesn't allow hostPath mounts, deleting the image is not possible.

Any thoughts on how we should adapt kube-fledged so it can work in situations like this? Should we ask users to create a specific service account for kube-fledged's delete job and specify it in ImageCache?

reddymh commented 3 years ago

@senthilrch it would be good to have a dedicated service account mapped to a PSP (if PSP is enabled) that allows the hostPath "/var/run" as read-only, or alternatively to use the existing service account "kubefledged-controller".

reddymh commented 3 years ago

@senthilrch I am testing image deletion with a service account added. If it works, can I submit the change via a PR?

senthilrch commented 3 years ago

@reddymh, yes, go ahead and raise a PR. Many thanks for your support and time!

On Sun, 15 Aug 2021, 22:39 Rajshekar Reddy, @.***> wrote:


senthilrch commented 2 years ago

A new command-line flag, --service-account-name, will be added to kubefledged-controller. The flag is optional. When it is present, the service account name given on the command line will be set in Job.spec.template.spec.serviceAccountName for every Job the controller creates, so the image-puller and image-deleter pods will be created with this service account.
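A minimal Go sketch of the behaviour described for `--service-account-name` follows. Only the flag name comes from the comment; `PodSpec`, `Job` and `applyServiceAccount` are hypothetical stand-ins (the real controller would work with the `k8s.io/api` Job type), kept tiny so the example runs on its own with the standard `flag` package, which accepts both `-name` and `--name` forms.

```go
package main

import (
	"flag"
	"fmt"
)

// PodSpec and Job are hypothetical, trimmed-down stand-ins for the
// k8s.io/api types; only the field discussed in the comment is modelled.
type PodSpec struct {
	ServiceAccountName string
}

type Job struct {
	Name     string
	Template PodSpec // stands in for Job.spec.template.spec
}

// applyServiceAccount sets the service account on a Job's pod template when
// one was configured. When name is empty the field stays unset, and
// Kubernetes runs the pod under the namespace's "default" service account,
// which is the situation that triggered this issue.
func applyServiceAccount(job *Job, name string) {
	if name != "" {
		job.Template.ServiceAccountName = name
	}
}

func main() {
	// Optional flag: the empty default keeps the current behaviour.
	saName := flag.String("service-account-name", "", "service account for image-puller and image-deleter Jobs (optional)")
	flag.Parse()

	for _, j := range []*Job{{Name: "image-puller"}, {Name: "image-deleter"}} {
		applyServiceAccount(j, *saName)
		fmt.Printf("%s: serviceAccountName=%q\n", j.Name, j.Template.ServiceAccountName)
	}
}
```

Run with `--service-account-name=kubefledged-controller` (the existing service account suggested earlier in the thread), both Jobs would report that name; run without the flag, both report an empty name and would fall back to the namespace's default service account.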