Closed: reddymh closed this issue 2 years ago.
@reddymh: thanks for reporting this issue, I appreciate it.
I installed kube-fledged v0.8.1 and performed the same modify operation on the ImageCache (removing one image). It works fine; see the logs below.
eechens@EMB-Q6BUMD6N kube-fledged % make deploy-using-yaml
kubectl apply -f deploy/kubefledged-namespace.yaml
namespace/kube-fledged created
kubectl apply -f deploy/kubefledged-crd.yaml
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
customresourcedefinition.apiextensions.k8s.io/imagecaches.kubefledged.io configured
kubectl apply -f deploy/kubefledged-serviceaccount.yaml
serviceaccount/kubefledged-controller created
serviceaccount/kubefledged-webhook-server created
kubectl apply -f deploy/kubefledged-clusterrole.yaml
clusterrole.rbac.authorization.k8s.io/kubefledged-controller created
clusterrole.rbac.authorization.k8s.io/kubefledged-webhook-server created
kubectl apply -f deploy/kubefledged-clusterrolebinding.yaml
clusterrolebinding.rbac.authorization.k8s.io/kubefledged-controller created
clusterrolebinding.rbac.authorization.k8s.io/kubefledged-webhook-server created
kubectl delete validatingwebhookconfigurations -l app=kubefledged
No resources found
kubectl apply -f deploy/kubefledged-validatingwebhook.yaml
validatingwebhookconfiguration.admissionregistration.k8s.io/kubefledged created
kubectl apply -f deploy/kubefledged-deployment-webhook-server.yaml
deployment.apps/kubefledged-webhook-server created
kubectl apply -f deploy/kubefledged-service-webhook-server.yaml
service/kubefledged-webhook-server created
kubectl apply -f deploy/kubefledged-deployment-controller.yaml
deployment.apps/kubefledged-controller created
kubectl rollout status deployment kubefledged-webhook-server -n kube-fledged --watch
Waiting for deployment "kubefledged-webhook-server" rollout to finish: 0 of 1 updated replicas are available...
deployment "kubefledged-webhook-server" successfully rolled out
kubectl rollout status deployment kubefledged-controller -n kube-fledged --watch
Waiting for deployment "kubefledged-controller" rollout to finish: 0 of 1 updated replicas are available...
deployment "kubefledged-controller" successfully rolled out
eechens@EMB-Q6BUMD6N kube-fledged % kubectl apply -f deploy/kubefledged-imagecache.yaml
imagecache.kubefledged.io/imagecache1 created
eechens@EMB-Q6BUMD6N kube-fledged % kubectl get pods -n kube-fledged
NAME                                         READY   STATUS    RESTARTS   AGE
kubefledged-controller-f95967b7d-rvbzs       1/1     Running   0          48s
kubefledged-webhook-server-9d7f9b55f-4n2ll   1/1     Running   0          50s
eechens@EMB-Q6BUMD6N kube-fledged % kubectl logs -f kubefledged-controller-f95967b7d-rvbzs
I0813 15:27:20.124780 1 controller.go:123] Setting up event handlers
I0813 15:27:20.126120 1 main.go:76] Starting pre-flight checks
I0813 15:27:20.161014 1 controller.go:159] No dangling or stuck jobs found...
I0813 15:27:20.168926 1 controller.go:186] No dangling or stuck imagecaches found...
I0813 15:27:20.168948 1 main.go:80] Pre-flight checks completed
I0813 15:27:20.168963 1 controller.go:224] Starting fledged controller
I0813 15:27:20.168967 1 controller.go:227] Waiting for informer caches to sync
I0813 15:27:20.269900 1 controller.go:232] Starting image cache worker
I0813 15:27:20.269959 1 controller.go:239] Starting cache refresh worker
I0813 15:27:20.269969 1 controller.go:243] Started workers
I0813 15:27:20.269986 1 image_manager.go:341] Starting image manager
I0813 15:27:20.269994 1 image_manager.go:344] Waiting for informer caches to sync
I0813 15:27:20.371038 1 image_manager.go:349] Started image manager
I0813 15:27:34.970178 1 controller.go:430] Starting to sync image cache imagecache1(create)
I0813 15:27:34.998139 1 controller.go:633] Completed sync actions for image cache imagecache1(create)
I0813 15:27:35.025569 1 image_manager.go:428] Job imagecache1-t8l6k created (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:27:35.025658 1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/tomcat:10.0.8 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:27:35.033530 1 image_manager.go:428] Job imagecache1-gcc4v created (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000000, runtime: docker://19.3.14)
I0813 15:27:35.033678 1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/tomcat:10.0.8 --> aks-si03c8m32-81246184-vmss000000, runtime: docker://19.3.14)
I0813 15:27:35.041362 1 image_manager.go:428] Job imagecache1-psr5k created (pull:- quay.io/bitnami/redis:6.2.5 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:27:35.041453 1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/mariadb:10.5.11 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:27:37.674396 1 image_manager.go:179] Job imagecache1-psr5k succeeded (pull:- quay.io/bitnami/redis:6.2.5 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:27:38.167128 1 image_manager.go:179] Job imagecache1-gcc4v succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000000, runtime: docker://19.3.14)
I0813 15:27:38.696000 1 image_manager.go:179] Job imagecache1-t8l6k succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:27:39.097323 1 controller.go:430] Starting to sync image cache imagecache1(statusupdate)
I0813 15:27:39.123000 1 controller.go:633] Completed sync actions for image cache imagecache1(statusupdate)
I0813 15:27:39.123050 1 event.go:282] Event(v1.ObjectReference{Kind:"ImageCache", Namespace:"kube-fledged", Name:"imagecache1", UID:"80953422-7060-418f-a6ad-c24b403010b1", APIVersion:"kubefledged.io/v1alpha2", ResourceVersion:"131125771", FieldPath:""}): type: 'Normal' reason: 'ImageCacheCreate' All requested images pulled succesfully to respective nodes
^C
eechens@EMB-Q6BUMD6N kube-fledged % kubens kube-fledged
Context "aks-mxe" modified.
Active namespace is "kube-fledged".
eechens@EMB-Q6BUMD6N kube-fledged % kubectl get jobs
No resources found in kube-fledged namespace.
eechens@EMB-Q6BUMD6N kube-fledged % kubectl get pods
NAME                                         READY   STATUS    RESTARTS   AGE
kubefledged-controller-f95967b7d-rvbzs       1/1     Running   0          94s
kubefledged-webhook-server-9d7f9b55f-4n2ll   1/1     Running   0          96s
eechens@EMB-Q6BUMD6N kube-fledged % kubectl edit ic imagecache1
imagecache.kubefledged.io/imagecache1 edited
eechens@EMB-Q6BUMD6N kube-fledged % kubectl logs -f kubefledged-controller-f95967b7d-rvbzs
I0813 15:29:08.553617 1 controller.go:430] Starting to sync image cache imagecache1(update)
I0813 15:29:08.586703 1 controller.go:633] Completed sync actions for image cache imagecache1(update)
I0813 15:29:08.604835 1 image_manager.go:428] Job imagecache1-xnhgg created (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:29:08.614768 1 image_manager.go:415] Job imagecache1-frq87 created (delete:- quay.io/bitnami/tomcat:10.0.8 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:29:08.626652 1 image_manager.go:428] Job imagecache1-zdxlb created (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000000, runtime: docker://19.3.14)
I0813 15:29:08.634249 1 image_manager.go:415] Job imagecache1-pqhmk created (delete:- quay.io/bitnami/tomcat:10.0.8 --> aks-si03c8m32-81246184-vmss000000, runtime: docker://19.3.14)
I0813 15:29:08.646889 1 image_manager.go:428] Job imagecache1-sh96p created (pull:- quay.io/bitnami/redis:6.2.5 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:29:08.646992 1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/mariadb:10.5.11 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:29:10.737359 1 image_manager.go:177] Job imagecache1-pqhmk succeeded (delete:- quay.io/bitnami/tomcat:10.0.8 --> aks-si03c8m32-81246184-vmss000000, runtime: docker://19.3.14)
I0813 15:29:11.276983 1 image_manager.go:179] Job imagecache1-xnhgg succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:29:11.294108 1 image_manager.go:177] Job imagecache1-frq87 succeeded (delete:- quay.io/bitnami/tomcat:10.0.8 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:29:11.795615 1 image_manager.go:179] Job imagecache1-zdxlb succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000000, runtime: docker://19.3.14)
I0813 15:29:12.336372 1 image_manager.go:179] Job imagecache1-sh96p succeeded (pull:- quay.io/bitnami/redis:6.2.5 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0813 15:29:12.764334 1 controller.go:430] Starting to sync image cache imagecache1(statusupdate)
I0813 15:29:12.790182 1 controller.go:633] Completed sync actions for image cache imagecache1(statusupdate)
I0813 15:29:12.790241 1 event.go:282] Event(v1.ObjectReference{Kind:"ImageCache", Namespace:"kube-fledged", Name:"imagecache1", UID:"80953422-7060-418f-a6ad-c24b403010b1", APIVersion:"kubefledged.io/v1alpha2", ResourceVersion:"131126536", FieldPath:""}): type: 'Normal' reason: 'ImageCacheUpdate' All cached images succesfully deleted from respective nodes
^C
eechens@EMB-Q6BUMD6N kube-fledged % kubectl get nodes
NAME                                STATUS   ROLES   AGE    VERSION
aks-si03c8m32-81246184-vmss000000   Ready    agent   347d   v1.18.14
aks-si03c8m32-81246184-vmss000009   Ready    agent   135d   v1.18.14
eechens@EMB-Q6BUMD6N kube-fledged % kubectl get ic imagecache1 -o yaml
apiVersion: kubefledged.io/v1alpha2
kind: ImageCache
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"kubefledged.io/v1alpha2","kind":"ImageCache","metadata":{"annotations":{},"labels":{"app":"kubefledged","component":"imagecache"},"name":"imagecache1","namespace":"kube-fledged"},"spec":{"cacheSpec":[{"images":["quay.io/bitnami/nginx:1.21.1","quay.io/bitnami/tomcat:10.0.8"]},{"images":["quay.io/bitnami/redis:6.2.5","quay.io/bitnami/mariadb:10.5.11"],"nodeSelector":{"tier":"backend"}}],"imagePullSecrets":[{"name":"myregistrykey"}]}}
  creationTimestamp: "2021-08-13T15:27:34Z"
  generation: 6
  labels:
    app: kubefledged
    component: imagecache
  managedFields:
  - apiVersion: kubefledged.io/v1alpha2
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
        f:labels:
          .: {}
          f:app: {}
          f:component: {}
      f:spec:
        .: {}
        f:imagePullSecrets: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: "2021-08-13T15:27:34Z"
  - apiVersion: kubefledged.io/v1alpha2
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:cacheSpec: {}
    manager: kubectl-edit
    operation: Update
    time: "2021-08-13T15:29:08Z"
  - apiVersion: kubefledged.io/v1alpha2
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        .: {}
        f:completionTime: {}
        f:message: {}
        f:reason: {}
        f:startTime: {}
        f:status: {}
    manager: kubefledged-controller
    operation: Update
    time: "2021-08-13T15:29:12Z"
  name: imagecache1
  namespace: kube-fledged
  resourceVersion: "131126650"
  selfLink: /apis/kubefledged.io/v1alpha2/namespaces/kube-fledged/imagecaches/imagecache1
  uid: 80953422-7060-418f-a6ad-c24b403010b1
spec:
  cacheSpec:
  - images:
    - quay.io/bitnami/nginx:1.21.1
  - images:
    - quay.io/bitnami/redis:6.2.5
    - quay.io/bitnami/mariadb:10.5.11
    nodeSelector:
      tier: backend
  imagePullSecrets:
  - name: myregistrykey
status:
  completionTime: "2021-08-13T15:29:12Z"
  message: All cached images succesfully deleted from respective nodes
  reason: ImageCacheUpdate
  startTime: "2021-08-13T15:29:08Z"
  status: Succeeded
eechens@EMB-Q6BUMD6N kube-fledged %
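The edit removed quay.io/bitnami/tomcat:10.0.8 from the first cacheSpec entry, which matches the delete jobs in the log. In the output above, the last-applied-configuration annotation still lists tomcat, because the change was made with kubectl edit rather than kubectl apply. A minimal Python sketch of that comparison, using the two image lists from this output. `removed_images` is a hypothetical helper that pairs cacheSpec entries by index; it is not kube-fledged's own reconciliation logic, which also takes each node's current images into account (hence the "image-already-present" lines).

```python
import json

# Verbatim kubectl.kubernetes.io/last-applied-configuration annotation from the
# output above; after the edit it still lists the pre-edit images.
LAST_APPLIED = json.loads(
    '{"apiVersion":"kubefledged.io/v1alpha2","kind":"ImageCache",'
    '"metadata":{"annotations":{},"labels":{"app":"kubefledged","component":"imagecache"},'
    '"name":"imagecache1","namespace":"kube-fledged"},'
    '"spec":{"cacheSpec":[{"images":["quay.io/bitnami/nginx:1.21.1","quay.io/bitnami/tomcat:10.0.8"]},'
    '{"images":["quay.io/bitnami/redis:6.2.5","quay.io/bitnami/mariadb:10.5.11"],"nodeSelector":{"tier":"backend"}}],'
    '"imagePullSecrets":[{"name":"myregistrykey"}]}}'
)

# spec.cacheSpec as it is after `kubectl edit`.
CURRENT = {"spec": {"cacheSpec": [
    {"images": ["quay.io/bitnami/nginx:1.21.1"]},
    {"images": ["quay.io/bitnami/redis:6.2.5", "quay.io/bitnami/mariadb:10.5.11"],
     "nodeSelector": {"tier": "backend"}},
]}}


def removed_images(old: dict, new: dict) -> list[set[str]]:
    """Hypothetical helper: for each cacheSpec entry (matched by index),
    return the images present in `old` but missing from `new`."""
    new_specs = new["spec"]["cacheSpec"]
    result = []
    for i, old_entry in enumerate(old["spec"]["cacheSpec"]):
        new_images = set(new_specs[i]["images"]) if i < len(new_specs) else set()
        result.append(set(old_entry["images"]) - new_images)
    return result


if __name__ == "__main__":
    for i, images in enumerate(removed_images(LAST_APPLIED, CURRENT)):
        print(f"cacheSpec[{i}]: delete {sorted(images)}")
```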
From the logs you pasted, I can make out that the image deletion job did not complete and expired after the default timeout of 5 minutes. When a job expires, the image manager in kube-fledged fetches the corresponding Pod to read the error message and the reason for the failure, but that fetch failed. This can happen when the informer cache used by the image manager is out of sync with the state persisted in etcd.
Restarting the controller recreates the informer cache. Please repeat the modify operation and share the status and logs.
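The timing can be checked against the issue's log: job imagecache2-nszgz was created at 12:08:22 and the "No pods matched job" error was logged at 12:13:23, just over 5 minutes later. Below is a rough Python model of that check. The helper names are hypothetical and this is not kube-fledged's Go code. It assumes the 5-minute timeout stated above and that pods are looked up through the `job-name` label, which Kubernetes adds to a Job's pod template (it appears in the Job manifest posted later in this thread). An empty pod list stands in for the stale informer cache.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: the default 5-minute expiry described in the comment above.
JOB_TIMEOUT = timedelta(minutes=5)


def klog_time(line: str) -> datetime:
    """Parse the timestamp of a klog line such as 'I0813 12:08:22.627991 1 ...'.

    klog omits the year, so datetime's default (1900) is used; only
    differences between two timestamps are meaningful.
    """
    month_day, clock = line[1:5], line.split()[1]
    return datetime.strptime(f"{month_day} {clock}", "%m%d %H:%M:%S.%f")


def pods_for_job(pods: list[dict], job_name: str) -> list[dict]:
    """Select pods (modeled as {'labels': {...}} dicts) by their job-name label."""
    return [p for p in pods if p.get("labels", {}).get("job-name") == job_name]


def expired_job_error(job_name: str, pods: list[dict], elapsed: timedelta) -> Optional[str]:
    """Hypothetical check for a pending job.

    Returns an error string when the job has expired and no pod is found.
    Returns None while the job is within the timeout, or when a pod exists
    (a caller would then read that pod's termination message instead).
    """
    if elapsed < JOB_TIMEOUT:
        return None
    if not pods_for_job(pods, job_name):
        return f"no pods matched job {job_name}"
    return None


created = "I0813 12:08:22.627991 1 image_manager.go:415] Job imagecache2-nszgz created"
failed = "E0813 12:13:23.364327 1 image_manager.go:212] No pods matched job imagecache2-nszgz"
elapsed = klog_time(failed) - klog_time(created)

if __name__ == "__main__":
    print(elapsed.total_seconds())  # seconds between the two log lines
    # An empty pod list models the out-of-sync informer cache.
    print(expired_job_error("imagecache2-nszgz", [], elapsed))
```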
@senthilrch: I tried again and even re-installed the controller, but no luck.
Entire controller log:
I0813 17:15:18.805048 1 controller.go:123] Setting up event handlers
I0813 17:15:18.808132 1 main.go:76] Starting pre-flight checks
I0813 17:15:18.839420 1 controller.go:159] No dangling or stuck jobs found...
I0813 17:15:18.904537 1 controller.go:186] No dangling or stuck imagecaches found...
I0813 17:15:18.904592 1 main.go:80] Pre-flight checks completed
I0813 17:15:18.904624 1 controller.go:224] Starting fledged controller
I0813 17:15:18.904635 1 controller.go:227] Waiting for informer caches to sync
I0813 17:15:19.005611 1 controller.go:232] Starting image cache worker
I0813 17:15:19.005663 1 controller.go:239] Starting cache refresh worker
I0813 17:15:19.005672 1 controller.go:243] Started workers
I0813 17:15:19.005691 1 image_manager.go:341] Starting image manager
I0813 17:15:19.005698 1 image_manager.go:344] Waiting for informer caches to sync
I0813 17:15:19.106103 1 image_manager.go:349] Started image manager
I0813 17:16:29.495586 1 controller.go:430] Starting to sync image cache imagecache1(create)
I0813 17:16:29.580177 1 controller.go:633] Completed sync actions for image cache imagecache1(create)
I0813 17:16:29.642794 1 image_manager.go:428] Job imagecache1-mgldl created (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-master1, runtime: docker://19.3.15)
I0813 17:16:29.656380 1 image_manager.go:428] Job imagecache1-nckdf created (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-master1, runtime: docker://19.3.15)
I0813 17:16:29.674758 1 image_manager.go:428] Job imagecache1-vxg87 created (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az2-master1, runtime: docker://19.3.15)
I0813 17:16:29.688813 1 image_manager.go:428] Job imagecache1-xd69g created (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az2-master1, runtime: docker://19.3.15)
I0813 17:16:29.729301 1 image_manager.go:428] Job imagecache1-xmkpb created (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-master2, runtime: docker://19.3.15)
I0813 17:16:29.770681 1 image_manager.go:428] Job imagecache1-m4ng4 created (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-master2, runtime: docker://19.3.15)
I0813 17:16:29.795186 1 image_manager.go:428] Job imagecache1-x2n24 created (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-worker1, runtime: docker://19.3.15)
I0813 17:16:29.845655 1 image_manager.go:428] Job imagecache1-nssts created (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-worker1, runtime: docker://19.3.15)
I0813 17:16:29.895609 1 image_manager.go:428] Job imagecache1-54dlf created (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az2-worker2, runtime: docker://19.3.15)
I0813 17:16:29.920791 1 image_manager.go:428] Job imagecache1-qqjzc created (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az2-worker2, runtime: docker://19.3.15)
I0813 17:16:29.941216 1 image_manager.go:428] Job imagecache1-lbtcz created (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az2-worker1, runtime: docker://19.3.15)
I0813 17:16:30.012574 1 image_manager.go:428] Job imagecache1-xhsp4 created (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az2-worker1, runtime: docker://19.3.15)
I0813 17:16:30.200777 1 image_manager.go:428] Job imagecache1-stn65 created (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-worker2, runtime: docker://19.3.15)
I0813 17:16:30.437273 1 image_manager.go:428] Job imagecache1-z6tdq created (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-worker2, runtime: docker://19.3.15)
I0813 17:16:41.351850 1 image_manager.go:179] Job imagecache1-vxg87 succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az2-master1, runtime: docker://19.3.15)
I0813 17:16:41.528928 1 image_manager.go:179] Job imagecache1-54dlf succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az2-worker2, runtime: docker://19.3.15)
I0813 17:16:41.631971 1 image_manager.go:179] Job imagecache1-xmkpb succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-master2, runtime: docker://19.3.15)
I0813 17:16:41.673425 1 image_manager.go:179] Job imagecache1-x2n24 succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-worker1, runtime: docker://19.3.15)
I0813 17:16:42.376309 1 image_manager.go:179] Job imagecache1-stn65 succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-worker2, runtime: docker://19.3.15)
I0813 17:16:58.164376 1 image_manager.go:179] Job imagecache1-nckdf succeeded (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-master1, runtime: docker://19.3.15)
I0813 17:17:01.343181 1 image_manager.go:179] Job imagecache1-mgldl succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-master1, runtime: docker://19.3.15)
I0813 17:17:03.024952 1 image_manager.go:179] Job imagecache1-qqjzc succeeded (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az2-worker2, runtime: docker://19.3.15)
I0813 17:17:04.153229 1 image_manager.go:179] Job imagecache1-m4ng4 succeeded (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-master2, runtime: docker://19.3.15)
I0813 17:17:04.747771 1 image_manager.go:179] Job imagecache1-xd69g succeeded (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-master1, runtime: docker://19.3.15)
I0813 17:17:07.258774 1 image_manager.go:179] Job imagecache1-z6tdq succeeded (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-worker2, runtime: docker://19.3.15)
I0813 17:17:07.441780 1 image_manager.go:179] Job imagecache1-nssts succeeded (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-worker1, runtime: docker://19.3.15)
I0813 17:17:07.522864 1 image_manager.go:179] Job imagecache1-xhsp4 succeeded (pull:- quay.io/bitnami/tomcat:10.0.8 --> kube-az2-worker1, runtime: docker://19.3.15)
I0813 17:17:10.648796 1 image_manager.go:179] Job imagecache1-lbtcz succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> kube-az2-worker1, runtime: docker://19.3.15)
I0813 17:17:12.260489 1 controller.go:430] Starting to sync image cache imagecache1(statusupdate)
I0813 17:17:12.292936 1 controller.go:633] Completed sync actions for image cache imagecache1(statusupdate)
I0813 17:17:12.293071 1 event.go:282] Event(v1.ObjectReference{Kind:"ImageCache", Namespace:"kube-fledged", Name:"imagecache1", UID:"3de5fc51-760d-43b9-8c1f-5054657e8429", APIVersion:"kubefledged.io/v1alpha2", ResourceVersion:"16118630", FieldPath:""}): type: 'Normal' reason: 'ImageCacheCreate' All requested images pulled succesfully to respective nodes
I0813 17:17:57.924301 1 controller.go:430] Starting to sync image cache imagecache1(update)
I0813 17:17:57.958120 1 controller.go:633] Completed sync actions for image cache imagecache1(update)
I0813 17:17:57.963851 1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-worker1, runtime: docker://19.3.15)
I0813 17:17:57.982594 1 image_manager.go:415] Job imagecache1-zwb77 created (delete:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-worker1, runtime: docker://19.3.15)
I0813 17:17:57.982717 1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/nginx:1.21.1 --> kube-az2--worker2, runtime: docker://19.3.15)
I0813 17:17:57.996390 1 image_manager.go:415] Job imagecache1-b48p6 created (delete:- quay.io/bitnami/tomcat:10.0.8 --> kube-az2-worker2, runtime: docker://19.3.15)
I0813 17:17:57.996531 1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/nginx:1.21.1 --> kube-az2-worker1, runtime: docker://19.3.15)
I0813 17:17:58.016100 1 image_manager.go:415] Job imagecache1-q4pjk created (delete:- quay.io/bitnami/tomcat:10.0.8 --> kube-az2-worker1, runtime: docker://19.3.15)
I0813 17:17:58.016238 1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-worker2, runtime: docker://19.3.15)
I0813 17:17:58.033812 1 image_manager.go:415] Job imagecache1-ssqwp created (delete:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-worker2, runtime: docker://19.3.15)
I0813 17:17:58.033884 1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-master1, runtime: docker://19.3.15)
I0813 17:17:58.049588 1 image_manager.go:415] Job imagecache1-lgcrt created (delete:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-master1, runtime: docker://19.3.15)
I0813 17:17:58.049688 1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/nginx:1.21.1 --> kube-az2-master1, runtime: docker://19.3.15)
I0813 17:17:58.068555 1 image_manager.go:415] Job imagecache1-mv965 created (delete:- quay.io/bitnami/tomcat:10.0.8 --> kube-az2-master1, runtime: docker://19.3.15)
I0813 17:17:58.068681 1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/nginx:1.21.1 --> kube-az1-master2, runtime: docker://19.3.15)
I0813 17:17:58.080138 1 image_manager.go:415] Job imagecache1-sd8xx created (delete:- quay.io/bitnami/tomcat:10.0.8 --> kube-az1-master2, runtime: docker://19.3.15)
E0813 17:22:58.081364 1 image_manager.go:212] No pods matched job imagecache1-lgcrt
E0813 17:22:58.084083 1 image_manager.go:286] Error from updatePendingImageWorkResults(): no pods matched job imagecache1-lgcrt
ImageCache CR file:
apiVersion: kubefledged.io/v1alpha2
kind: ImageCache
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"kubefledged.io/v1alpha2","kind":"ImageCache","metadata":{"annotations":{},"labels":{"app":"kubefledged","component":"imagecache"},"name":"imagecache1","namespace":"kube-fledged"},"spec":{"cacheSpec":[{"images":["quay.io/bitnami/nginx:1.21.1"]}],"imagePullSecrets":[{"name":"myregistrykey"}]}}
  creationTimestamp: "2021-08-13T17:16:29Z"
  generation: 5
  labels:
    app: kubefledged
    component: imagecache
  managedFields:
@senthilrch: I removed one of the images from the ImageCache CR. The controller log shows jobs created for the delete, but I can't see those jobs in the namespace, whereas I can see the Kubernetes batch jobs created for additions (adding a new image). Can I get the manifest of the delete-image job (action: delete), so that I can run it as a Kubernetes batch job myself and check whether the image gets deleted?
@reddymh:
apiVersion: batch/v1
kind: Job
metadata:
  creationTimestamp: "2021-08-14T06:18:22Z"
  generateName: imagecache1-
  labels:
    app: imagecache
    controller: fledged
    imagecache: imagecache1
  managedFields:
  - apiVersion: batch/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:generateName: {}
        f:labels:
          .: {}
          f:app: {}
          f:controller: {}
          f:imagecache: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"80953422-7060-418f-a6ad-c24b403010b1"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        f:activeDeadlineSeconds: {}
        f:backoffLimit: {}
        f:completions: {}
        f:parallelism: {}
        f:template:
          f:metadata:
            f:labels:
              .: {}
              f:app: {}
              f:controller: {}
              f:imagecache: {}
            f:namespace: {}
          f:spec:
            f:containers:
              k:{"name":"docker-cri-client"}:
                .: {}
                f:args: {}
                f:command: {}
                f:image: {}
                f:imagePullPolicy: {}
                f:name: {}
                f:resources: {}
                f:terminationMessagePath: {}
                f:terminationMessagePolicy: {}
                f:volumeMounts:
                  .: {}
                  k:{"mountPath":"/var/run/docker.sock"}:
                    .: {}
                    f:mountPath: {}
                    f:name: {}
            f:dnsPolicy: {}
            f:imagePullSecrets:
              .: {}
              k:{"name":"myregistrykey"}:
                .: {}
                f:name: {}
            f:nodeSelector:
              .: {}
              f:kubernetes.io/hostname: {}
            f:restartPolicy: {}
            f:schedulerName: {}
            f:securityContext: {}
            f:terminationGracePeriodSeconds: {}
            f:tolerations: {}
            f:volumes:
              .: {}
              k:{"name":"runtime-sock"}:
                .: {}
                f:hostPath:
                  .: {}
                  f:path: {}
                  f:type: {}
                f:name: {}
    manager: kubefledged-controller
    operation: Update
    time: "2021-08-14T06:18:22Z"
  - apiVersion: batch/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:completionTime: {}
        f:conditions:
          .: {}
          k:{"type":"Complete"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:status: {}
            f:type: {}
        f:startTime: {}
        f:succeeded: {}
    manager: kube-controller-manager
    operation: Update
    time: "2021-08-14T06:18:24Z"
  name: imagecache1-5cg7q
  namespace: kube-fledged
  ownerReferences:
  - apiVersion: kubefledged.io/v1alpha2
    blockOwnerDeletion: true
    controller: true
    kind: ImageCache
    name: imagecache1
    uid: 80953422-7060-418f-a6ad-c24b403010b1
  resourceVersion: "131532506"
  selfLink: /apis/batch/v1/namespaces/kube-fledged/jobs/imagecache1-5cg7q
  uid: 451ab5b9-fa5f-4fdd-b457-55c952d274a8
spec:
  activeDeadlineSeconds: 3600
  backoffLimit: 0
  completions: 1
  parallelism: 1
  selector:
    matchLabels:
      controller-uid: 451ab5b9-fa5f-4fdd-b457-55c952d274a8
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: imagecache
        controller: fledged
        controller-uid: 451ab5b9-fa5f-4fdd-b457-55c952d274a8
        imagecache: imagecache1
        job-name: imagecache1-5cg7q
      namespace: kube-fledged
    spec:
      containers:
      - args:
        - -c
        - exec /usr/bin/docker image rm -f quay.io/non-existent-job22:latest > /dev/termination-log
          2>&1
        command:
        - /bin/bash
        image: senthilrch/kubefledged-cri-client:v0.8.1
        imagePullPolicy: IfNotPresent
        name: docker-cri-client
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/run/docker.sock
          name: runtime-sock
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: myregistrykey
      nodeSelector:
        kubernetes.io/hostname: aks-si03c8m32-81246184-vmss000009
      restartPolicy: Never
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      tolerations:
      - operator: Exists
      volumes:
      - hostPath:
          path: /var/run/docker.sock
          type: Socket
        name: runtime-sock
status:
  completionTime: "2021-08-14T06:18:24Z"
  conditions:
  - lastProbeTime: "2021-08-14T06:18:24Z"
    lastTransitionTime: "2021-08-14T06:18:24Z"
    status: "True"
    type: Complete
  startTime: "2021-08-14T06:18:22Z"
  succeeded: 1
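To re-run a dumped manifest like this by hand, the fields filled in by the API server and the Job controller have to be removed first. In particular, the old selector and template labels pin controller-uid to the original Job's UID, which will not match the UID of a newly created Job. A minimal Python sketch, assuming the manifest is loaded as a dict (for example from `kubectl get job imagecache1-5cg7q -n kube-fledged -o json`). `to_creatable_job` is a hypothetical helper, and dropping `ownerReferences` is a choice for a standalone test run, so the job is not owned by the ImageCache.

```python
import copy
import json

# Metadata assigned by the API server; it must not be sent when creating a new object.
SERVER_SET_METADATA = ("creationTimestamp", "managedFields", "resourceVersion", "selfLink", "uid")
# Labels the Job controller derives from the original Job's UID and name.
JOB_CONTROLLER_LABELS = ("controller-uid", "job-name")


def to_creatable_job(dumped: dict) -> dict:
    """Return a copy of a dumped Job manifest that can be submitted as a new Job."""
    job = copy.deepcopy(dumped)  # leave the caller's dict untouched
    job.pop("status", None)

    meta = job["metadata"]
    for key in SERVER_SET_METADATA:
        meta.pop(key, None)
    # Test-run choice: do not make the ImageCache the owner of this job.
    meta.pop("ownerReferences", None)
    # Keep generateName so the API server assigns a fresh name on every run.
    if "generateName" in meta:
        meta.pop("name", None)

    spec = job["spec"]
    # The old selector matches only the original Job's controller-uid.
    spec.pop("selector", None)
    template_meta = spec["template"]["metadata"]
    template_meta.pop("creationTimestamp", None)
    labels = template_meta.get("labels", {})
    for key in JOB_CONTROLLER_LABELS:
        labels.pop(key, None)
    return job


if __name__ == "__main__":
    # Trimmed-down excerpt of the manifest above.
    uid = "451ab5b9-fa5f-4fdd-b457-55c952d274a8"
    dumped = {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": "imagecache1-5cg7q", "generateName": "imagecache1-",
                     "namespace": "kube-fledged", "uid": uid, "resourceVersion": "131532506"},
        "spec": {
            "selector": {"matchLabels": {"controller-uid": uid}},
            "template": {
                "metadata": {"labels": {"app": "imagecache", "controller-uid": uid,
                                        "job-name": "imagecache1-5cg7q"}},
                "spec": {"restartPolicy": "Never"},
            },
        },
        "status": {"succeeded": 1},
    }
    print(json.dumps(to_creatable_job(dumped), indent=2))
```

The printed JSON can be piped into `kubectl create -f -`; kubectl apply does not accept objects that rely on generateName.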
@senthilrch: after debugging, I found the issue. It is caused by our PSP (PodSecurityPolicy) policies.
Error log from the delete batch job:
Error creating: pods "imagecache1-cgws2-" is forbidden: PodSecurityPolicy: unable to admit pod: [spec.containers[0].volumeMounts[0].readOnly: Invalid value: false: must be read-only spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used]
The Job's pod is not bound to any specific service account, so we need to add a policy that allows hostPath and bind it to a service account.
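The two errors in the message correspond to two PodSecurityPolicy fields: `volumes` (which volume types a pod may use) and `allowedHostPaths` (which host paths may be mounted, optionally only read-only). The PSP admission plugin reports the errors from every policy it tried, so the two errors likely come from two different policies. A simplified Python model of those two checks applied to the delete job's pod spec; `psp_violations` and both example policies are hypothetical, and the model ignores everything else PSP checks (privileged mode, capabilities, users, and so on).

```python
def _under(path: str, prefix: str) -> bool:
    """Prefix match on whole path components: /var/run matches
    /var/run/docker.sock but not /var/running."""
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def psp_violations(pod_spec: dict, policy: dict) -> list[str]:
    """Hypothetical, simplified check of the PSP `volumes` and `allowedHostPaths` rules."""
    errors = []
    allowed_types = policy.get("volumes", [])
    allowed_paths = policy.get("allowedHostPaths", [])
    must_be_read_only = {}  # volume name -> every mount of it must be read-only

    for i, volume in enumerate(pod_spec.get("volumes", [])):
        if "hostPath" not in volume:
            continue
        if "hostPath" not in allowed_types and "*" not in allowed_types:
            errors.append(f"spec.volumes[{i}]: hostPath volumes are not allowed to be used")
            continue
        path = volume["hostPath"]["path"]
        rules = [r for r in allowed_paths if _under(path, r["pathPrefix"])]
        if allowed_paths and not rules:
            errors.append(f"spec.volumes[{i}].hostPath.path: {path} is not allowed")
            continue
        # Read-only is enforced only if no matching rule permits read-write.
        must_be_read_only[volume["name"]] = bool(rules) and all(r.get("readOnly", False) for r in rules)

    for c, container in enumerate(pod_spec.get("containers", [])):
        for m, mount in enumerate(container.get("volumeMounts", [])):
            if must_be_read_only.get(mount["name"]) and not mount.get("readOnly", False):
                errors.append(f"spec.containers[{c}].volumeMounts[{m}].readOnly: must be read-only")
    return errors


# Delete job pod spec, trimmed to the relevant fields of the manifest posted above.
DELETE_POD_SPEC = {
    "containers": [{
        "name": "docker-cri-client",
        "volumeMounts": [{"mountPath": "/var/run/docker.sock", "name": "runtime-sock"}],
    }],
    "volumes": [{"name": "runtime-sock",
                 "hostPath": {"path": "/var/run/docker.sock", "type": "Socket"}}],
}

# Hypothetical example policies, not taken from the reporter's cluster.
NO_HOSTPATH = {"volumes": ["configMap", "secret", "emptyDir", "projected"]}
VAR_RUN_READ_ONLY = {"volumes": ["hostPath"],
                     "allowedHostPaths": [{"pathPrefix": "/var/run", "readOnly": True}]}

if __name__ == "__main__":
    for name, policy in [("NO_HOSTPATH", NO_HOSTPATH), ("VAR_RUN_READ_ONLY", VAR_RUN_READ_ONLY)]:
        print(name, psp_violations(DELETE_POD_SPEC, policy))
```

Whether the docker CLI inside the pod can still reach the daemon through a read-only mount of the socket is not established in this thread, so test the read-only variant before relying on it.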
@reddymh: thanks for debugging the issue. The image delete pod needs to mount the socket file of the node's container runtime so that it can issue the delete command to the container runtime and delete the image on the node.
In kube-fledged we don't set the service account in the Job's pod spec template, so the pod runs as the namespace's default service account. If that service account is linked to a PSP policy that doesn't allow hostPath mounts, the image cannot be deleted.
Any thoughts on how we should adapt kube-fledged so it works in situations like this? Should we ask users to create a specific service account for kube-fledged's delete job and specify it in the ImageCache?
@senthilrch: it would be good to have a service account mapped to a PSP policy (if PSP is enabled) that allows the hostPath "/var/run" as read-only, or else to use the existing service account "kubefledged-controller".
@senthilrch: I am testing image deletion with the service account added. If it works, can I submit the change via a PR?
@reddymh: yes, go ahead and raise a PR. Many thanks for your support and time!
A new command-line flag, --service-account-name, will be added to kubefledged-controller. The flag is optional: when it is present, the service account name given on the command line is set in Job.spec.template.spec.serviceAccountName for every job the controller creates, so the image-puller and image-deleter Pods are created with that service account.
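A minimal Python sketch of the behavior described for the flag, modeled on the delete job manifest posted earlier in this thread. `build_delete_job` and `parse_args` are hypothetical; this is not the controller's Go code. When the flag is absent, serviceAccountName stays unset and the pod runs as the namespace's default service account, which is the situation that triggered the PSP rejection.

```python
import argparse
import json
from typing import Optional


def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="sketch of the proposed controller flag")
    # Optional: an empty value means "leave serviceAccountName unset".
    parser.add_argument("--service-account-name", default="",
                        help="service account for image puller/deleter pods")
    return parser.parse_args(argv)


def build_delete_job(image: str, node: str, service_account_name: str = "") -> dict:
    """Trimmed-down delete Job modeled on the manifest dumped earlier in the thread."""
    pod_spec = {
        "containers": [{
            "name": "docker-cri-client",
            "image": "senthilrch/kubefledged-cri-client:v0.8.1",
            "command": ["/bin/bash"],
            "args": ["-c", f"exec /usr/bin/docker image rm -f {image} > /dev/termination-log 2>&1"],
            "volumeMounts": [{"mountPath": "/var/run/docker.sock", "name": "runtime-sock"}],
        }],
        "nodeSelector": {"kubernetes.io/hostname": node},
        "restartPolicy": "Never",
        "volumes": [{"name": "runtime-sock",
                     "hostPath": {"path": "/var/run/docker.sock", "type": "Socket"}}],
    }
    if service_account_name:
        # Proposed behavior: set Job.spec.template.spec.serviceAccountName.
        pod_spec["serviceAccountName"] = service_account_name
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"generateName": "imagecache1-", "namespace": "kube-fledged"},
        "spec": {"backoffLimit": 0, "template": {"spec": pod_spec}},
    }


if __name__ == "__main__":
    args = parse_args()
    job = build_delete_job("quay.io/bitnami/tomcat:10.0.8", "kube-az1-master1",
                           args.service_account_name)
    print(json.dumps(job, indent=2))
```

For example, running it with `--service-account-name kubefledged-controller` produces a pod spec bound to the existing controller service account suggested above.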
When I update the ImageCache by removing one of the images from the CR, the delete jobs are created but remain in a dangling state and are never deleted, and the controller pod hangs.
Logs after removing the image from the ImageCache CR:
The jobs are created, but there is no status update afterwards, the controller stops responding (hangs), and no image is deleted from the nodes.
I0813 12:08:22.507883 1 controller.go:430] Starting to sync image cache imagecache2(update)
I0813 12:08:22.546340 1 controller.go:633] Completed sync actions for image cache imagecache2(update)
I0813 12:08:22.552289 1 image_manager.go:430] Job not created (image-already-present:- alpine:3.13.5 -->kube-worker1, runtime: docker://19.3.15)
I0813 12:08:22.573488 1 image_manager.go:415] Job imagecache2-z7kss created (delete:- alpine:3.14.1 --> kube-worker1, runtime: docker://19.3.15)
I0813 12:08:22.588487 1 image_manager.go:415] Job imagecache2-n8bqx created (delete:- alpine:3.14 --> kube-worker1, runtime: docker://19.3.15)
I0813 12:08:22.588677 1 image_manager.go:430] Job not created (image-already-present:- alpine:3.13.5 --> kube-az2-worker2, runtime: docker://19.3.15)
I0813 12:08:22.614402 1 image_manager.go:415] Job imagecache2-crtvm created (delete:- alpine:3.14.1 --> kube-az2-worker2, runtime: docker://19.3.15)
I0813 12:08:22.627991 1 image_manager.go:415] Job imagecache2-nszgz created (delete:- alpine:3.14 --> kube-az2-worker2, runtime: docker://19.3.15)
E0813 12:13:23.364327 1 image_manager.go:212] No pods matched job imagecache2-nszgz
E0813 12:13:23.364375 1 image_manager.go:286] Error from updatePendingImageWorkResults(): no pods matched job imagecache2-nszgz
After restarting the controller pod:
I0813 12:30:17.488537 1 controller.go:123] Setting up event handlers
I0813 12:30:17.490327 1 main.go:76] Starting pre-flight checks
I0813 12:30:17.559958 1 controller.go:170] Dangling Job(imagecache2-75q8k) deleted
I0813 12:30:17.591621 1 controller.go:170] Dangling Job(imagecache2-9tl22) deleted
I0813 12:30:17.626065 1 controller.go:170] Dangling Job(imagecache2-crtvm) deleted
I0813 12:30:17.648474 1 controller.go:170] Dangling Job(imagecache2-g9vt7) deleted
I0813 12:30:17.669925 1 controller.go:170] Dangling Job(imagecache2-hgwzb) deleted
I0813 12:30:17.691915 1 controller.go:170] Dangling Job(imagecache2-k2btp) deleted
I0813 12:30:17.708962 1 controller.go:170] Dangling Job(imagecache2-lrhp4) deleted
I0813 12:30:17.723610 1 controller.go:170] Dangling Job(imagecache2-lxwbs) deleted
I0813 12:30:17.740635 1 controller.go:170] Dangling Job(imagecache2-n8bqx) deleted
I0813 12:30:17.755842 1 controller.go:170] Dangling Job(imagecache2-nszgz) deleted
I0813 12:30:17.908447 1 controller.go:170] Dangling Job(imagecache2-q54g2) deleted
I0813 12:30:18.108218 1 controller.go:170] Dangling Job(imagecache2-z7kss) deleted
I0813 12:30:18.308094 1 controller.go:170] Dangling Job(imagecache2-zq48b) deleted
I0813 12:30:18.507513 1 controller.go:170] Dangling Job(imagecache2-zzz84) deleted
I0813 12:30:18.567621 1 controller.go:204] Dangling Image cache(imagecache2) status changed to 'Aborted'
I0813 12:30:18.567725 1 main.go:80] Pre-flight checks completed
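On restart, the pre-flight checks clean up what the hung run left behind: every leftover job is deleted and the ImageCache that was still being processed is marked 'Aborted'. A rough Python model of that cleanup, inferred only from these log lines. It assumes leftover jobs are recognized by the app=imagecache and controller=fledged labels seen in the Job manifest posted in this thread, and that an in-progress ImageCache has the status "Processing"; both are assumptions, not taken from kube-fledged's source.

```python
from dataclasses import dataclass, field

# Labels carried by kube-fledged's jobs in the manifest dumped in this thread.
FLEDGED_JOB_LABELS = {"app": "imagecache", "controller": "fledged"}
IN_PROGRESS = "Processing"  # assumed name of the in-progress status


@dataclass
class Job:
    name: str
    labels: dict = field(default_factory=dict)


@dataclass
class ImageCache:
    name: str
    status: str


def pre_flight(jobs: list[Job], caches: list[ImageCache]) -> tuple[list[Job], list[str]]:
    """Hypothetical cleanup: returns the jobs that survive and log-style messages."""
    messages, survivors = [], []
    for job in jobs:
        if all(job.labels.get(k) == v for k, v in FLEDGED_JOB_LABELS.items()):
            messages.append(f"Dangling Job({job.name}) deleted")
        else:
            survivors.append(job)
    for cache in caches:
        if cache.status == IN_PROGRESS:
            cache.status = "Aborted"
            messages.append(f"Dangling Image cache({cache.name}) status changed to 'Aborted'")
    return survivors, messages


if __name__ == "__main__":
    jobs = [Job("imagecache2-z7kss", dict(FLEDGED_JOB_LABELS)),
            Job("imagecache2-nszgz", dict(FLEDGED_JOB_LABELS)),
            Job("unrelated-job", {"app": "other"})]
    caches = [ImageCache("imagecache2", IN_PROGRESS)]
    survivors, messages = pre_flight(jobs, caches)
    print("\n".join(messages))
```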