spegel-org / spegel

Stateless cluster local OCI registry mirror.
MIT License

Several pods are always slow to pull images #530

Open sonyafenge opened 3 months ago

sonyafenge commented 3 months ago

Spegel version

v0.0.18

Kubernetes distribution

kubeadm

Kubernetes version

v1.30

CNI

Calico

Describe the bug

We are running a Kubernetes cluster on bare-metal machines using CAPI. Spegel is deployed and running successfully, but there are always several nodes (seemingly at random) that pull images slowly. We didn't see any errors in the Spegel logs except for 500 errors:

{"level":"info","ts":1719612847.9669137,"caller":"registry/registry.go:218","msg":"handling mirror request from external node","key":"sha256:0e83cb3d308a5b8f448567868a41ae0f406289fc0ed323334ff0cc7ac4d8734b","path":"/v2/ise/ccppday1/thorium-operator/blobs/sha256:0e83cb3d308a5b8f448567868a41ae0f406289fc0ed323334ff0cc7ac4d8734b","ip":"10.245.55.192"}
{"level":"error","ts":1719612847.9737165,"caller":"gin@v0.0.9/logger.go:62","msg":"","path":"/v2/ise/ccppday1/thorium-operator/blobs/sha256:0e83cb3d308a5b8f448567868a41ae0f406289fc0ed323334ff0cc7ac4d8734b","status":500,"method":"GET","latency":0.006881913,"ip":"10.245.55.192","handler":"mirror","error":"mirror resolve retries exhausted for key: sha256:0e83cb3d308a5b8f448567868a41ae0f406289fc0ed323334ff0cc7ac4d8734b","stacktrace":"github.com/xenitab/pkg/gin.NewEngine.Logger.func1\n\t/go/pkg/mod/github.com/xenitab/pkg/gin@v0.0.9/logger.go:62\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:620\ngithub.com/gin-gonic/gin.(*Engine).ServeHTTP\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:576\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2938\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2009"}
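
To check whether these 500 responses cluster on particular nodes, the Spegel log could be tallied per requesting IP. The following is only a minimal sketch, assuming the log is one JSON object per line with the `handler`, `status`, and `ip` fields seen in the entries above; `count_mirror_failures` is a hypothetical helper name and not part of Spegel.

```python
import json
from collections import Counter


def count_mirror_failures(lines):
    """Count status-500 responses from the mirror handler, grouped by client IP.

    Assumes each line is a JSON object shaped like the Spegel log entries
    quoted in this issue. Lines that are not valid JSON objects are skipped.
    """
    failures = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore non-JSON output mixed into the log
        if not isinstance(entry, dict):
            continue
        if entry.get("handler") == "mirror" and entry.get("status") == 500:
            failures[entry.get("ip", "unknown")] += 1
    return failures
```

Feeding it the saved pod logs, for example `count_mirror_failures(open("spegel.log"))` with a hypothetical file name, would show which requesting nodes hit the "mirror resolve retries exhausted" path most often.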
phillebaba commented 3 months ago

Could you update to the latest version of Spegel and see if the problem persists?

sonyafenge commented 2 months ago

I will test with the latest version and report back.

Calotte commented 1 month ago

It seems v0.0.23 also has this issue; there are always several jobs that fail to pull. I observed the following error in the containerd log:

time="2024-08-20T17:27:12.187740008Z" level=error msg="cancel pulling image mcr.microsoft.com/azureml/runtime/boot/installed:0.0.1.20240813.2 because of no progress in 5m0s"

It looks like a network issue, but after disabling Spegel's registries this error no longer occurred. Interesting. @phillebaba, do you have any suggestions for investigating this issue?
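
To compare nodes with and without the mirror enabled, the stalled pulls could be extracted from the containerd log. This is a rough sketch, assuming the message format matches the containerd line quoted above exactly (unescaped quotes, same wording); `stalled_pulls` and `STALL_RE` are hypothetical names introduced only for illustration.

```python
import re

# Matches the "no progress" cancellation message quoted in this issue.
# Assumption: quotes in the log are not escaped by the log collector.
STALL_RE = re.compile(
    r'msg="cancel pulling image (?P<image>\S+) '
    r'because of no progress in (?P<timeout>[^"]+)"'
)


def stalled_pulls(lines):
    """Return (image, timeout) pairs for every cancelled pull found in lines."""
    found = []
    for line in lines:
        match = STALL_RE.search(line)
        if match:
            found.append((match.group("image"), match.group("timeout")))
    return found
```

Running this over the containerd log of each node would give a list of images whose pulls were cancelled, which could then be lined up against the Spegel 500 responses for the same time window.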