senthilrch / kube-fledged

A Kubernetes operator for creating and managing a cache of container images directly on the cluster worker nodes, so application pods start almost instantly.
Apache License 2.0
1.24k stars, 118 forks

Image pull jobs failing: "standard_init_linux.go:228: exec user process caused: no such file or directory" #176

Closed: dstndstn closed this issue 1 year ago

dstndstn commented 2 years ago

Hi,

I'm not sure whether I'm supposed to ask support questions here, and I'm very new to Kubernetes and kube-fledged, so I'd appreciate your patience. Apologies if this is a foolish question.

I'm running a Google Kubernetes Engine cluster and trying to pre-pull some large (~5 GB) images onto the nodes at startup, so that subsequent jobs launch quickly.

All of the image-pull jobs are failing. For a failed job, the only thing I can find in the logs (in the Google Cloud dashboard) is:

Job imagecache1-272c4:

2022-06-03 11:31:14.910 EDT
standard_init_linux.go:228: exec user process caused: no such file or directory

In the kubefledged-controller logs I see:

I0603 15:31:11.820453       1 image_manager.go:469] Job imagecache1-272c4 created (pull:- docker.io/einsteintoolkit/et-workshop --> gke-cluster-gpu-pool-5f741b22-u479, runtime: containerd://1.5.4)
I0603 15:31:17.928191       1 image_manager.go:217] Job imagecache1-272c4 failed (pull: docker.io/einsteintoolkit/et-workshop --> gke-cluster-gpu-pool-5f741b22-u479)

I'm pretty lost about what's going on, so any pointers would be appreciated. Thanks!

senthilrch commented 1 year ago

This error is caused by kube-fledged using the busybox image from gcr.io. That image's echo binary is dynamically linked, whereas kube-fledged requires a statically linked one.
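The diagnosis above turns on whether `echo` is statically or dynamically linked. A dynamically linked ELF executable carries a `PT_INTERP` program header naming its loader (for example `/lib64/ld-linux-x86-64.so.2`); if that loader is missing from the filesystem the binary runs in, `exec` fails with ENOENT, which is why the message reads "no such file or directory" even though the binary itself exists. As a minimal sketch, assuming you have already copied the `echo` binary out of the busybox image to a local file, the stdlib-only Python below checks for that header. The function name `needs_dynamic_loader` is hypothetical and not part of kube-fledged; `file` or `readelf -l` on the same binary report the same thing.

```python
import struct
import sys

PT_INTERP = 3  # ELF program-header type that names the dynamic loader

# Fields e_type .. e_shstrndx, which follow the 16-byte e_ident array.
_HEADER_FORMATS = {1: "HHIIIIIHHHHHH", 2: "HHIQQQIHHHHHH"}  # ELFCLASS32, ELFCLASS64


def needs_dynamic_loader(data: bytes) -> bool:
    """Return True if the ELF image in `data` has a PT_INTERP program header."""
    if len(data) < 16 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in _HEADER_FORMATS:
        raise ValueError("unknown ELF class")
    if ei_data not in (1, 2):
        raise ValueError("unknown ELF byte order")
    endian = "<" if ei_data == 1 else ">"  # ELFDATA2LSB / ELFDATA2MSB
    fields = struct.unpack_from(endian + _HEADER_FORMATS[ei_class], data, 16)
    phoff, phentsize, phnum = fields[4], fields[8], fields[9]
    for i in range(phnum):
        # p_type is the first 32-bit word of every program header, in ELF32 and ELF64.
        (p_type,) = struct.unpack_from(endian + "I", data, phoff + i * phentsize)
        if p_type == PT_INTERP:
            return True
    return False


if __name__ == "__main__":
    # Usage: python check_elf.py ./echo
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            kind = "dynamically" if needs_dynamic_loader(f.read()) else "statically"
        print(f"{path}: {kind} linked")
```

A `True` result for the gcr.io busybox `echo` would be consistent with the failure reported here. The absence of `PT_INTERP` is a practical proxy for "statically linked" on executables, not a complete linker analysis.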

This issue will be resolved in v0.10.0, which switches to the busybox image from Docker Hub.