spegel-org / spegel

Stateless cluster local OCI registry mirror.
MIT License

Newly joined nodes have error: "run/containerd/containerd.sock: connect: connection refused" #528

Open sonyafenge opened 2 months ago

sonyafenge commented 2 months ago

Spegel version

v0.0.18

Kubernetes distribution

kubeadm

Kubernetes version

v1.30

CNI

calico

Describe the bug

We are running a Kubernetes cluster on bare-metal machines using CAPI. I found that any node that joins after Spegel is installed shows the error below, and Spegel does not work as a mirror on it.

{"level":"info","ts":1719427282.4200914,"caller":"state/state.go:30","msg":"running scheduled image state update"}
{"level":"error","ts":1719427282.4204097,"caller":"state/state.go:32","msg":"received errors when updating all images","error":"connection error: desc = \"transport: error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable","stacktrace":"github.com/xenitab/spegel/pkg/state.Track\n\t/build/pkg/state/state.go:32\nmain.registryCommand.func5\n\t/build/main.go:172\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.6.0/errgroup/errgroup.go:78"}
sonyafenge commented 2 months ago

This looks like the same issue as https://github.com/spegel-org/spegel/issues/333; I am not sure whether it is related to the leader election refresh.

phillebaba commented 2 months ago

This has nothing to do with #333. The error you are seeing comes from the Containerd client not being able to communicate with the Containerd socket. Are you sure the socket is located at the path that is configured?

sonyafenge commented 2 months ago

Yes, I am sure the socket is located at that path. Every time I restart the Spegel DaemonSet, the issue is fixed.

phillebaba commented 1 month ago

This seems like a peculiar issue as restarting the pod should have no effect. And you are seeing the same issue with the latest Spegel version?

sonyafenge commented 1 month ago

I haven't had a chance to test with the latest Spegel version yet; hopefully I can get it tested next week.

freym commented 1 month ago

We have SELinux enabled on our nodes and had the same error. The solution was the following setting:

securityContext:
  seLinuxOptions:
    type: spc_t
jurim76 commented 3 weeks ago

I have the same issue on control-plane nodes, regardless of restarts.

{"time":"2024-08-16T10:56:12.574167036Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/state.Track","file":"/build/pkg/state/state.go","line":36},"msg":"received errors when updating all images","err":"connection error: desc = \"transport: error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable"}
phillebaba commented 1 week ago

The error that you are seeing means that either the Containerd socket does not exist at that path or it can't be reached. This check is run immediately on start and will exit Spegel if an error occurs as there would be no use continuing. Are you sure that this is the correct path?

jurim76 commented 1 week ago

I could not reproduce this issue anymore; please ignore.