senthilrch / kube-fledged

A Kubernetes operator for creating and managing a cache of container images directly on the cluster worker nodes, so application pods start almost instantly
Apache License 2.0
1.26k stars 119 forks

Failing to pull image #205

Closed by dgodfrey206 1 year ago

dgodfrey206 commented 1 year ago

kube-fledged is failing to pull an image into the image cache.

I have an image at docker.io/dgodfrey206/palatialsps:demo that I want cached on all nodes in the cluster. I followed the README and added the image under "images" with no nodeSelector. When I create the image cache, it takes about 5 minutes and then reports that the image pull failed.

This is the output I got:

"status": {
    "completionTime": "2023-02-23T05:55:08Z",
    "failures": {
        "docker.io/dgodfrey206/palatialsps:demo": [
            {
                "message": "Check if node is ready",
                "node": "fake-geda720",
                "reason": "Pending"
            },
            {
                "message": "Check if node is ready",
                "node": "fake-g7adc10",
                "reason": "Pending"
            },
            {
                "message": "Check if node is ready",
                "node": "fake-gb36f0a",
                "reason": "Pending"
            },
            {
                "message": "Check if node is ready",
                "node": "fake-gedb470",
                "reason": "Pending"
            }
        ]
    }
}
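For anyone reading a longer status than this one, here is a minimal Python sketch that groups the failing nodes by image and reason. It assumes the status has been saved as JSON; the fragment above is wrapped in outer braces so it parses. `summarize_failures` is a hypothetical helper written for this post, not part of kube-fledged.

```python
import json
from collections import defaultdict

# The status fragment from the post, wrapped in outer braces so it is valid JSON.
RAW_STATUS = """
{
  "status": {
    "completionTime": "2023-02-23T05:55:08Z",
    "failures": {
      "docker.io/dgodfrey206/palatialsps:demo": [
        {"message": "Check if node is ready", "node": "fake-geda720", "reason": "Pending"},
        {"message": "Check if node is ready", "node": "fake-g7adc10", "reason": "Pending"},
        {"message": "Check if node is ready", "node": "fake-gb36f0a", "reason": "Pending"},
        {"message": "Check if node is ready", "node": "fake-gedb470", "reason": "Pending"}
      ]
    }
  }
}
"""


def summarize_failures(status):
    """Group failing nodes by (image, reason).

    Hypothetical helper for illustration; it only relies on the
    "failures" layout visible in the output above.
    """
    grouped = defaultdict(list)
    for image, entries in status.get("failures", {}).items():
        for entry in entries:
            grouped[(image, entry["reason"])].append(entry["node"])
    return dict(grouped)


summary = summarize_failures(json.loads(RAW_STATUS)["status"])
for (image, reason), nodes in summary.items():
    print(f"{image}: {reason} on {len(nodes)} node(s): {', '.join(nodes)}")
```

Every entry here has the same reason ("Pending"), which suggests the puller pods were never scheduled rather than that the registry pull itself failed.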

I ran kubectl get pods -n kube-fledged to get the pods:

NAME                                          READY   STATUS    RESTARTS   AGE
kubefledged-controller-7d546dc6d8-xk2fz       1/1     Running   0          7m
kubefledged-webhook-server-7f5cc6c796-8zvwk   1/1     Running   0          6m53s
imagecache1-2djv4-mnr5l                       0/1     Pending   0          102s
imagecache1-psb5s-5xdm5                       0/1     Pending   0          102s
imagecache1-x5vbd-flqjv                       0/1     Pending   0          102s

When I run kubectl describe pod imagecache1-2djv4-mnr5l -n kube-fledged for one of the imagecache pods, the "Events" section of the output shows this:

Events:
  Type     Reason     Age                From        Message
  ----     ------     ----               ----        -------
  Warning  SyncError  40s (x5 over 60s)  pod-syncer  Error syncing to physical cluster: admission webhook "validate.kyverno.svc-ignore" denied the request:

resource Pod/tenant-palatial-platform/imagecache1-sbmt5-z2kv7-x-kube-fledged-x-vcluster-app was blocked due to the following policies

restrict-tolerations:
  restrict-tolerations: 'validation error: Pods may not use restricted toleration
    of "node.kubernetes.io/unschedulable". Rule restrict-tolerations failed at path
    /spec/tolerations/0/key/'
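The denial message points at the first toleration in the pod spec. As a rough illustration of what a rule like this checks, the Python sketch below scans a pod's tolerations for restricted keys and reports the matching paths. This is not how Kyverno evaluates policies, and the example pod is made up: only the key "node.kubernetes.io/unschedulable" at index 0 comes from the error above. The operator, effect, and second toleration are assumptions.

```python
# Only this key is taken from the error message; a real policy may restrict more.
RESTRICTED_TOLERATION_KEYS = {"node.kubernetes.io/unschedulable"}


def restricted_toleration_paths(pod, restricted_keys=RESTRICTED_TOLERATION_KEYS):
    """Return the JSON-pointer-style paths of tolerations whose key is restricted.

    Hypothetical helper: a simplified stand-in for the policy check,
    not Kyverno's actual evaluation logic.
    """
    tolerations = (pod.get("spec") or {}).get("tolerations") or []
    return [
        f"/spec/tolerations/{index}/key"
        for index, toleration in enumerate(tolerations)
        if toleration.get("key") in restricted_keys
    ]


# Illustrative pod spec. Everything except the first key is an assumption.
example_pod = {
    "spec": {
        "tolerations": [
            {"key": "node.kubernetes.io/unschedulable", "operator": "Exists", "effect": "NoSchedule"},
            {"key": "node.kubernetes.io/not-ready", "operator": "Exists", "effect": "NoExecute"},
        ]
    }
}

print(restricted_toleration_paths(example_pod))
```

A pod with no tolerations, or with only unrestricted keys, yields an empty list and would pass this simplified check.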

I should mention I'm using a vcluster control plane.

dgodfrey206 commented 1 year ago

This appears to be specific to CoreWeave Kubernetes, the cloud platform I'm using. kube-fledged seems to require a level of access to the nodes that CoreWeave does not allow: judging by the event above, its image-puller pods use a toleration for node.kubernetes.io/unschedulable, which the platform's Kyverno policy rejects. An alternative to kube-fledged would be Nydus. I'll close this issue.