Open sonyafenge opened 4 months ago
Could you update to the latest version of Spegel and see if the problem persists?
I will test with the latest version and report back.
It seems v0.0.23 also has this issue: several jobs always fail to pull. I observed the following error in the containerd log:
time="2024-08-20T17:27:12.187740008Z" level=error msg="cancel pulling image mcr.microsoft.com/azureml/runtime/boot/installed:0.0.1.20240813.2 because of no progress in 5m0s"
It looks like a network issue, but interestingly, after disabling Spegel's registries the error no longer occurred. @phillebaba Do you have any suggestions for investigating this issue?
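In case it helps anyone compare nodes, here is a rough sketch for counting these stalled pulls per image in a containerd log. The regular expression is inferred only from the single log line quoted above, so it may need adjusting for other log formats. The names `find_stalled_pulls` and `count_by_image` are made up for illustration and are not part of Spegel or containerd.

```python
import re
from collections import Counter

# Pattern inferred from the one containerd log line quoted above (an assumption,
# not an official log format specification).
STALL_RE = re.compile(
    r'time="(?P<time>[^"]+)"\s+level=error\s+'
    r'msg="cancel pulling image (?P<image>\S+) '
    r'because of no progress in (?P<timeout>[^"]+)"'
)


def find_stalled_pulls(lines):
    """Return a (time, image, timeout) tuple for each stalled-pull error line."""
    hits = []
    for line in lines:
        match = STALL_RE.search(line)
        if match:
            hits.append(
                (match.group("time"), match.group("image"), match.group("timeout"))
            )
    return hits


def count_by_image(hits):
    """Count how many stalled pulls were logged for each image reference."""
    return Counter(image for _, image, _ in hits)
```

Passing an open log file (or `sys.stdin`) to `find_stalled_pulls` and then printing `count_by_image(hits).most_common()` should show whether the stalls cluster on particular images or nodes.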
Spegel version
v0.0.18
Kubernetes distribution
kubeadm
Kubernetes version
v1.30
CNI
Calico
Describe the bug
We are running a Kubernetes cluster on bare-metal machines using CAPI. Spegel is deployed and running successfully, but several nodes (seemingly at random) are always slow to pull images. We didn't see any error logs from Spegel other than 500 errors.