rancher / rke2

https://docs.rke2.io/
Apache License 2.0

RKE2 tries to start `containerd` but it may already be started, causing rke2 to fail to start #6853

Closed bendemott closed 1 month ago

bendemott commented 1 month ago

Environmental Info: RKE2 Version: 1.27 -> 1.31 (tested)

Node(s) CPU architecture, OS, and Version:

Cluster Configuration:

Describe the bug: Some installations of containerd register a Windows service (visible in services.msc). If that containerd service is started before rke2, rke2 fails to start with the error:

time="2024-09-13T16:58:17-07:00" level=error msg="containerd exited: exit status 1"

The cause of the error can be found in the containerd log at: C:\var\lib\rancher\rke2\agent\containerd\containerd.log

containerd: failed to get listener for main ttrpc endpoint: open //./pipe/containerd-containerd.ttrpc: Access is denied.

Disabling the containerd service will prevent this conflict from happening.
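As a rough illustration of spotting this conflict automatically, the sketch below scans containerd log text for the "Access is denied" listener failure quoted above. The pattern is an assumption built from that single log line, and `find_pipe_conflict` is a hypothetical helper name, not part of rke2 or containerd.

```python
import re

# Signature of the failure quoted above. The pattern is an assumption based on
# that one log line; other containerd versions may word the error differently.
PIPE_DENIED = re.compile(
    r"failed to get listener for main ttrpc endpoint: "
    r"open (?P<pipe>\S+): Access is denied"
)


def find_pipe_conflict(log_text):
    """Return the pipe path from the first matching log line, or None."""
    for line in log_text.splitlines():
        match = PIPE_DENIED.search(line)
        if match:
            return match.group("pipe")
    return None
```

A non-None result suggests that another containerd process already owns the named pipe, which is the situation described in this report.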

Steps To Reproduce:

Expected behavior: It can be difficult on Windows to detect whether a named pipe is in use.

Using ctr.exe will time out if containerd is NOT running.

PS C:\Users\Administrator> ctr --connect-timeout=5s images ls
ctr: failed to dial "\\\\.\\pipe\\containerd-containerd": context deadline exceeded: connection error: desc = "transport: error while dialing: dial \\\\.\\pipe\\containerd-containerd: timeout"
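Since probing the pipe with ctr only reveals a timeout, another way to check for the conflict might be to ask the Windows service manager directly. The sketch below parses the output of `sc.exe query` for the service state. It assumes the service is registered under the name `containerd` and that `sc.exe` prints English-locale output; both function names are hypothetical.

```python
import re
import subprocess

# Matches a line such as "        STATE              : 4  RUNNING".
# Assumes the English-locale layout printed by sc.exe.
STATE_LINE = re.compile(r"^\s*STATE\s*:\s*\d+\s+(\w+)", re.MULTILINE)


def parse_service_state(sc_output):
    """Return the state word (e.g. "RUNNING") from sc.exe output, or None."""
    match = STATE_LINE.search(sc_output)
    return match.group(1) if match else None


def containerd_service_state(service_name="containerd"):
    """Query the service manager; Windows only. The service name is an assumption."""
    result = subprocess.run(
        ["sc.exe", "query", service_name],
        capture_output=True,
        text=True,
        check=False,  # sc.exe exits non-zero when the service is not installed
    )
    return parse_service_state(result.stdout)
```

If `containerd_service_state()` returns `"RUNNING"` before rke2 starts, the pipe is likely already taken; `None` means no state line was found, for example because no such service is installed.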

Actual behavior: rke2 fails to start with an unhelpful message.

Additional context / logs: Happy to add full logs if this description is not enough.

Workaround

Unregister the containerd service, or disable it:

    containerd.exe --unregister-service

brandond commented 1 month ago

Don't do that. Don't run rke2 alongside an existing installation of containerd, or if you must, point rke2 at that existing service's socket via the --container-runtime-endpoint option so that it does not try to start the bundled containerd.

bendemott commented 1 month ago

@brandond

There are many reasons RKE2 may fail to start containerd. In any case, a better message than the following would be useful (for example, a hint pointing to the containerd log file location):

time="2024-09-13T16:58:17-07:00" level=error msg="containerd exited: exit status 1"
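To make the request concrete, here is a minimal sketch of what a more helpful message could contain: the exit status, the log path, and the last few log lines. It is written in Python purely for illustration and is not how RKE2 reports errors; `describe_containerd_exit` and its parameters are hypothetical.

```python
from pathlib import Path


def describe_containerd_exit(exit_code, log_path, tail_lines=5):
    """Build an error message that points at the containerd log.

    Hypothetical helper: appends the last few log lines when the file exists.
    """
    message = f"containerd exited: exit status {exit_code}; see {log_path} for details"
    path = Path(log_path)
    if not path.is_file():
        return message
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    tail = lines[-tail_lines:]
    if tail:
        message += "\nlast log lines:\n" + "\n".join(tail)
    return message
```

With the log tail included, the "Access is denied" line would appear directly in the rke2 output instead of requiring a separate search for the log file.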

Also, my error was not caused by having a "separate" installation of containerd from RKE2.

It's that I unknowingly called containerd.exe --register-service, which registered the RKE2 containerd as a service.

brandond commented 1 month ago

It's that I unknowingly called containerd.exe --register-service

Why did you do that? You are not intended to manually run RKE2's bundled containerd at all.