Reproduced on 7.8 using the following steps:
1. curl https://releases.rancher.com/install-docker/19.03.sh | sh
2. systemctl enable docker (and disable firewalld if that's enabled)
3. sudo setenforce 0
4. sudo curl -sfL https://get.k3s.io | INSTALL_K3S_SKIP_START=true INSTALL_K3S_EXEC="server --disable traefik --docker" sh
5. Add "Wants=docker.service" and "After=docker.service" to /etc/systemd/system/k3s.service (see the unit snippet after these steps)
6. sudo systemctl daemon-reload
7. sudo systemctl start k3s
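With step 5 applied, the [Unit] section of /etc/systemd/system/k3s.service gains an ordering and wants dependency on the docker unit. Only the added lines are shown here; the rest of the unit generated by the install script is left untouched:
[Unit]
Wants=docker.service
After=docker.service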
Environment Info:
[root@test-centos-k3s-docker ~]# rpm -q centos-release
centos-release-7-8.2003.0.el7.centos.x86_64
[root@test-centos-k3s-docker ~]# uname -a
Linux test-centos-k3s-docker 3.10.0-1127.el7.x86_64
Log:
Nov 18 19:27:14 test-centos-k3s-docker k3s[4588]: I1118 19:27:14.806232 4588 plugin_manager.go:114] Starting Kubelet Plugin Manager
Nov 18 19:27:14 test-centos-k3s-docker k3s[4588]: E1118 19:27:14.808811 4588 eviction_manager.go:260] eviction manager: failed to get summary stats: failed to get node info: node "test-centos-k3s-docker" not found
Nov 18 19:27:14 test-centos-k3s-docker k3s[4588]: time="2020-11-18T19:27:14.831356438Z" level=info msg="Proxy done" err="context canceled" url="wss://127.0.0.1:6443/v1-k3s/connect"
Nov 18 19:27:14 test-centos-k3s-docker k3s[4588]: time="2020-11-18T19:27:14.832251208Z" level=info msg="Kube API server is now running"
Nov 18 19:27:14 test-centos-k3s-docker k3s[4588]: time="2020-11-18T19:27:14.832299870Z" level=info msg="k3s is up and running"
Nov 18 19:27:14 test-centos-k3s-docker k3s[4588]: Flag --address has been deprecated, see --bind-address instead.
Nov 18 19:27:14 test-centos-k3s-docker k3s[4588]: time="2020-11-18T19:27:14.837703225Z" level=info msg="error in remotedialer server [400]: websocket: close 1006 (abnormal closure): unexpected EOF"
Nov 18 19:27:14 test-centos-k3s-docker k3s[4588]: time="2020-11-18T19:27:14.838636951Z" level=fatal msg="server stopped: http: Server closed"
Nov 18 19:27:14 test-centos-k3s-docker systemd[1]: k3s.service: main process exited, code=exited, status=1/FAILURE
Nov 18 19:27:14 test-centos-k3s-docker systemd[1]: Failed to start Lightweight Kubernetes.
Nov 18 19:27:14 test-centos-k3s-docker systemd[1]: Unit k3s.service entered failed state.
Nov 18 19:27:14 test-centos-k3s-docker systemd[1]: k3s.service failed.
Nov 18 19:27:16 test-centos-k3s-docker systemd[1]: Stopped Lightweight Kubernetes.
Changing the systemd unit from type=notify to type=exec allows it to start properly. I think there may be some issue with the notification socket. Server also runs an agent by default, and they both attempt to send a notification:
https://github.com/rancher/k3s/blob/master/pkg/agent/run.go#L168 https://github.com/rancher/k3s/blob/master/pkg/cli/server/server.go#L251
If I start k3s with --disable-agent, it starts successfully.
I suspect there is some sort of timing issue, where k3s starts up faster when using docker (since it doesn't have to start containerd) and the duplicate notifications are confusing systemd. This is a hunch; I would have to test a build with the agent SdNotify call disabled in order to confirm this.
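For context, both call sites linked above boil down to something like the following sketch (using the coreos go-systemd daemon package; the exact import alias in k3s may differ). Note that with unsetEnvironment=true, the first successful call clears NOTIFY_SOCKET, so a second call becomes a no-op, which is consistent with the sysdig trace further down showing only a single notify write:
package main

import (
	"log"

	systemd "github.com/coreos/go-systemd/daemon"
)

// notifyReady tells systemd that a Type=notify unit is ready by writing
// READY=1 to the socket named in $NOTIFY_SOCKET.
func notifyReady(who string) {
	sent, err := systemd.SdNotify(true, "READY=1\n")
	if err != nil {
		log.Printf("%s: failed to notify systemd: %v", who, err)
	} else if !sent {
		// NOTIFY_SOCKET was unset (or already cleared by an earlier call).
		log.Printf("%s: notification skipped", who)
	}
}

func main() {
	notifyReady("server") // pkg/cli/server/server.go
	notifyReady("agent")  // pkg/agent/run.go
}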
Hitting the issue with --docker, CentOS 7.9, k3s 1.19.3-k3s3. Switching to type=exec seems to have mitigated it. (had an issue earlier due to error between keyboard and chair)
The issue does not appear to affect the 1.18 releases (tested both 1.18.9+k3s1 and 1.18.12+k3s1), while it affects both 1.19.4-rc1+k3s1 and 1.19.3-k3s3.
I am honestly not sure why server needs to notify separately - it feels like it should only need to do that if we're running with --disable-agent, since in that case the agent would not notify.
The notify race seems to be a red herring: It's only called once during a failed run:
[root@test-centos-k3s-docker ~]# sudo sysdig -s812 "(fd.type in (unix) and proc.name in (k3s-server))"|grep "notify" -A1 -B1
264402 15:45:30.014031936 1 k3s-server (18483) > connect fd=7(<u>)
264403 15:45:30.014553118 1 k3s-server (18483) < connect res=0 tuple=ffff8df799600440->ffff8df797ddd100 /run/systemd/notify
264410 15:45:30.014576714 1 k3s-server (18483) > write fd=7(<u>ffff8df799600440->ffff8df797ddd100 /run/systemd/notify) size=8
264411 15:45:30.014583567 1 k3s-server (18483) < write res=8 data=READY=1.
264414 15:45:30.014588274 1 k3s-server (18483) > close fd=7(<u>ffff8df799600440->ffff8df797ddd100 /run/systemd/notify)
264415 15:45:30.014588997 1 k3s-server (18483) < close res=0
(SERVICE EXITS)
I wonder if this is a side-effect of using TimeoutStartSec=0 in the systemd unit file.
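One way to test that hypothesis (a sketch, assuming the unit currently sets TimeoutStartSec=0, which systemd treats as "no start timeout"; the 300s override value is arbitrary):
systemctl show -p TimeoutStartUSec k3s   # show the effective start timeout (0/infinity means disabled)
sudo systemctl edit k3s                  # add: [Service] newline TimeoutStartSec=300
sudo systemctl restart k3s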
Something is cancelling the tunnel server context, I was assuming that systemd was sending a signal to the k3s process in response to the duplicate READY messages that was causing it to start shutting down. Can you check for that?
This appears to be a systemd issue. On a system running systemd-219-42.el7_4.4, I can reliably get k3s v1.19.3+k3s3 to start with docker. As soon as I updated to systemd-219-78.el7_9.2, k3s was no longer able to start.
I walked through all of the various systemd packages available for CentOS 7 and it started failing once I hit systemd-219-73.el7.1.x86_64, which is the first systemd-219-73 package available.
There is nothing wrong with k3s until systemd decides to kill its main process immediately after determining that the service's "cgroup is empty".
Setting KillMode=none in the unit file will still result in the service being marked failed, but the k3s process will continue to run just fine.
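A minimal way to try that without editing the unit itself (a sketch; the drop-in file name is my own choice):
sudo mkdir -p /etc/systemd/system/k3s.service.d
printf '[Service]\nKillMode=none\n' | sudo tee /etc/systemd/system/k3s.service.d/killmode.conf
sudo systemctl daemon-reload
sudo systemctl restart k3s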
@janeczku Yeah, as far as I can tell systemd >= 219-73 is killing the process for some reason. Still investigating this. I also tested by changing the kill signal to SIGABRT, which promptly made k3s create a stack trace.
I'm not so sure that the cgroup empty notification message is meaningful -- on 1.18.12+k3s2 (which works) it still shows up:
Nov 20 17:42:02 ip-172-31-34-64.us-west-2.compute.internal k3s[20388]: I1120 17:42:02.938030 20388 plugin_manager.go:114] Starting Kubelet Plugin Manager
Nov 20 17:42:02 ip-172-31-34-64.us-west-2.compute.internal systemd[1]: Got cgroup empty notification for: /system.slice/k3s.service
Nov 20 17:42:02 ip-172-31-34-64.us-west-2.compute.internal systemd[1]: k3s.service: cgroup is empty
Nov 20 17:42:02 ip-172-31-34-64.us-west-2.compute.internal systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1/agent interface=org.freedesktop.systemd1.Agent member=Released cookie=299579 reply_cookie=0 error=n/a
Nov 20 17:42:02 ip-172-31-34-64.us-west-2.compute.internal systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1/unit/k3s_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=299580 reply_cookie=0 error=n/a
Nov 20 17:42:02 ip-172-31-34-64.us-west-2.compute.internal systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1/unit/k3s_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=299581 reply_cookie=0 error=n/a
Nov 20 17:42:02 ip-172-31-34-64.us-west-2.compute.internal k3s[20388]: I1120 17:42:02.975511 20388 node_ipam_controller.go:94] Sending events to api server.
Nov 20 17:42:03 ip-172-31-34-64.us-west-2.compute.internal k3s[20388]: I1120 17:42:03.423677 20388 reconciler.go:157] Reconciler: start to sync state
Nov 20 17:42:03 ip-172-31-34-64.us-west-2.compute.internal k3s[20388]: I1120 17:42:03.478050 20388 node.go:136] Successfully retrieved node IP: 172.31.34.64
Nov 20 17:42:03 ip-172-31-34-64.us-west-2.compute.internal k3s[20388]: I1120 17:42:03.478083 20388 server_others.go:187] Using iptables Proxier.
Nov 20 17:42:03 ip-172-31-34-64.us-west-2.compute.internal k3s[20388]: I1120 17:42:03.478381 20388 server.go:583] Version: v1.18.12+k3s1
Nov 20 17:42:03 ip-172-31-34-64.us-west-2.compute.internal k3s[20388]: I1120 17:42:03.478712 20388 conntrack.go:103] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
Nov 20 17:42:03 ip-172-31-34-64.us-west-2.compute.internal k3s[20388]: I1120 17:42:03.478735 20388 conntrack.go:52] Setting nf_conntrack_max to 131072
Nov 20 17:42:03 ip-172-31-34-64.us-west-2.compute.internal k3s[20388]: I1120 17:42:03.478778 20388 conntrack.go:103] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
Nov 20 17:42:03 ip-172-31-34-64.us-west-2.compute.internal k3s[20388]: I1120 17:42:03.478804 20388 conntrack.go:103] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
Nov 20 17:42:03 ip-172-31-34-64.us-west-2.compute.internal k3s[20388]: I1120 17:42:03.479051 20388 config.go:315] Starting service config controller
Nov 20 17:42:03 ip-172-31-34-64.us-west-2.compute.internal k3s[20388]: I1120 17:42:03.479059 20388 shared_informer.go:223] Waiting for caches to sync for service config
Nov 20 17:42:03 ip-172-31-34-64.us-west-2.compute.internal k3s[20388]: I1120 17:42:03.479611 20388 config.go:133] Starting endpoints config controller
Nov 20 17:42:03 ip-172-31-34-64.us-west-2.compute.internal k3s[20388]: I1120 17:42:03.479624 20388 shared_informer.go:223] Waiting for caches to sync for endpoints config
Nov 20 17:42:03 ip-172-31-34-64.us-west-2.compute.internal k3s[20388]: I1120 17:42:03.579288 20388 shared_informer.go:230] Caches are synced for service config
Nov 20 17:42:03 ip-172-31-34-64.us-west-2.compute.internal k3s[20388]: I1120 17:42:03.579826 20388 shared_informer.go:230] Caches are synced for endpoints config
Any luck with TimeoutStartSec @Oats87?
@erikwilson unfortunately not, I tried a bunch of different things like infinity (which doesn't work on RHEL/CentOS 7 systemd) and stupidly high values.
Changing the systemd unit file KillMode to none shows that systemd is thinking that the process has failed for some reason, but the service is actually not dead.
Nov 20 11:53:27 ck-centos7-2 k3s[10599]: I1120 11:53:27.402436 10599 policy_none.go:43] [cpumanager] none policy: Start
Nov 20 11:53:27 ck-centos7-2 k3s[10599]: W1120 11:53:27.405480 10599 manager.go:596] Failed to retrieve checkpoint for "kubelet_internal_checkpoint": checkpoint is not found
Nov 20 11:53:27 ck-centos7-2 k3s[10599]: I1120 11:53:27.406903 10599 plugin_manager.go:114] Starting Kubelet Plugin Manager
Nov 20 11:53:27 ck-centos7-2 k3s[10599]: E1120 11:53:27.407562 10599 eviction_manager.go:260] eviction manager: failed to get summary stats: failed to get node info: node "ck-centos7-2" not found
Nov 20 11:53:27 ck-centos7-2 systemd[1]: Got cgroup empty notification for: /system.slice/k3s.service
Nov 20 11:53:27 ck-centos7-2 systemd[1]: k3s.service: cgroup is empty
Nov 20 11:53:27 ck-centos7-2 systemd[1]: Unwatching 10599.
Nov 20 11:53:27 ck-centos7-2 systemd[1]: k3s.service changed start -> failed
Nov 20 11:53:27 ck-centos7-2 systemd[1]: Job k3s.service/start finished, result=failed
Nov 20 11:53:27 ck-centos7-2 systemd[1]: Failed to start Lightweight Kubernetes.
-- Subject: Unit k3s.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit k3s.service has failed.
--
-- The result is failed.
Nov 20 11:53:27 ck-centos7-2 systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=18 reply_cookie=0 error=n/a
Nov 20 11:53:27 ck-centos7-2 systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=10070 reply_cookie=0 error=n/a
Nov 20 11:53:27 ck-centos7-2 systemd[1]: Unit k3s.service entered failed state.
Nov 20 11:53:27 ck-centos7-2 systemd[1]: k3s.service failed.
Nov 20 11:53:27 ck-centos7-2 systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1/agent interface=org.freedesktop.systemd1.Agent member=Released cookie=10071 reply_cookie=0 error=n/a
Nov 20 11:53:27 ck-centos7-2 systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1/unit/k3s_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=19 reply_cookie=0 error=n/a
Nov 20 11:53:27 ck-centos7-2 systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1/unit/k3s_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=20 reply_cookie=0 error=n/a
Nov 20 11:53:27 ck-centos7-2 systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1/unit/k3s_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=10072 reply_cookie=0 error=n/a
Nov 20 11:53:27 ck-centos7-2 systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1/unit/k3s_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=10073 reply_cookie=0 error=n/a
Nov 20 11:53:27 ck-centos7-2 systemd[1]: Got unexpected auxiliary data with level=1 and type=2
Nov 20 11:53:27 ck-centos7-2 systemd[1]: Got unexpected auxiliary data with level=1 and type=2
Nov 20 11:53:27 ck-centos7-2 polkitd[795]: Unregistered Authentication Agent for unix-process:10588:254223 (system bus name :1.43, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)
Nov 20 11:53:27 ck-centos7-2 systemd[1]: Got message type=method_call sender=n/a destination=org.freedesktop.systemd1 object=/org/freedesktop/systemd1/unit/k3s_2eservice interface=org.freedesktop.DBus.Properties member=Get cookie=4 reply_cookie=0 error=n/a
Nov 20 11:53:27 ck-centos7-2 systemd[1]: SELinux access check scon=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcon=unconfined_u:object_r:systemd_unit_file_t:s0 tclass=service perm=status path=/etc/systemd/system/k3s.service cmdline=systemctl start k3s: 0
Nov 20 11:53:27 ck-centos7-2 systemd[1]: Sent message type=method_return sender=n/a destination=n/a object=n/a interface=n/a member=n/a cookie=21 reply_cookie=4 error=n/a
Nov 20 11:53:27 ck-centos7-2 systemd[1]: Got disconnect on private connection.
Nov 20 11:53:27 ck-centos7-2 systemd[1]: Got message type=signal sender=org.freedesktop.DBus destination=n/a object=/org/freedesktop/DBus interface=org.freedesktop.DBus member=NameOwnerChanged cookie=119 reply_cookie=0 error=n/a
Nov 20 11:53:27 ck-centos7-2 k3s[10599]: E1120 11:53:27.461544 10599 kubelet.go:2183] node "ck-centos7-2" not found
Nov 20 11:53:27 ck-centos7-2 k3s[10599]: E1120 11:53:27.562659 10599 kubelet.go:2183] node "ck-centos7-2" not found
Nov 20 11:53:27 ck-centos7-2 k3s[10599]: E1120 11:53:27.627788 10599 node.go:125] Failed to retrieve node info: nodes "ck-centos7-2" is forbidden: User "system:kube-proxy" cannot get resource "nodes" in API group "" at the cluster scope
Nov 20 11:53:27 ck-centos7-2 k3s[10599]: E1120 11:53:27.662826 10599 kubelet.go:2183] node "ck-centos7-2" not found
Nov 20 11:53:27 ck-centos7-2 k3s[10599]: E1120 11:53:27.763034 10599 kubelet.go:2183] node "ck-centos7-2" not found
Nov 20 11:53:27 ck-centos7-2 k3s[10599]: I1120 11:53:27.813637 10599 reconciler.go:157] Reconciler: start to sync state
Nov 20 11:53:27 ck-centos7-2 k3s[10599]: E1120 11:53:27.863496 10599 kubelet.go:2183] node "ck-centos7-2" not found
OK. As far as I can deduce, systemd appears to have an issue where it is not respecting TimeoutStartSec values for Type=notify services. I made a one-line change on a custom build of K3s:
[root@cannon01 k3s]# git diff pkg/cli/server/server.go
diff --git a/pkg/cli/server/server.go b/pkg/cli/server/server.go
index 2d80ca222f..e7aea5d9a7 100644
--- a/pkg/cli/server/server.go
+++ b/pkg/cli/server/server.go
@@ -39,6 +39,7 @@ func Run(app *cli.Context) error {
 }
 
 func run(app *cli.Context, cfg *cmds.Server) error {
+	systemd.SdNotify(true, "READY=1\n")
 	var (
 		err error
 	)
[root@cannon01 k3s]#
and now K3s starts reliably on multiple CentOS 7.8/7.9 test systems with stock settings. I can only assume that the reason we are hitting this bug now is due to changes in K8s 1.19 affecting startup time.
It appears that this is related to a commit backported by RHEL: https://github.com/systemd-rhel/rhel-7/commit/273d69011bf2a8abfcef71e33e6b6ae3323dfc34
This change appears to make it so that if any child processes exit (generating SIGCHLD) before we send our startup notification to systemd, systemd will mark the unit as failed and send termination signals to all the remaining child processes. There is also something else going on relating to the service_notify_cgroup_empty_event that's triggered by the sigchld handler, but I haven't bothered to track that down.
I think the --docker flag comes into play because we start forking things sooner when we don't have to wait for containerd to start up. Since we do a fair bit of forking out to subshells during various parts of our startup, we will either need to move the startup notification earlier in startup (which reduces its usefulness) or just switch to type=exec.
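A minimal sketch of the ordering described above (not the actual k3s startup code; it just shows a child process exiting before the READY=1 notification, which is the sequence the backported change appears to penalize):
package main

import (
	"log"
	"os/exec"

	systemd "github.com/coreos/go-systemd/daemon"
)

func main() {
	// Startup shells out and the child exits, so PID 1 receives a SIGCHLD
	// attributed to the unit before it has been told the service is ready.
	if err := exec.Command("sh", "-c", "true").Run(); err != nil {
		log.Fatalf("subshell failed: %v", err)
	}

	// The readiness notification only goes out afterwards. Per the comment
	// above, the backported systemd change can already have marked the unit
	// as failed by this point.
	if _, err := systemd.SdNotify(true, "READY=1\n"); err != nil {
		log.Printf("sd_notify failed: %v", err)
	}
}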
Summary of findings:
We believe that the change at https://github.com/systemd-rhel/rhel-7/commit/273d69011bf2a8abfcef71e33e6b6ae3323dfc34#diff-88a01402a6feb6c8a13f32a718325c7fdbdd4ca47de52b147ff75eaace8e1903R2568 is causing us problems, due to the way we handle where we live under the various cgroups.
When we initially start, the k3s process lives under /system.slice/k3s.service. We eventually end up living under /systemd/system.slice (i.e. we move). In the case of --docker, this leaves the k3s.service slice empty, hence triggering the cgroup is empty messages. When we start our own containerd, the various containers continue to live under the k3s.service cgroup, hence systemd sees that it is not empty and doesn't nix the main process. Note that we only move when we start the agent -- this is why --disable-agent allows a successful startup.
This seems to only matter when we are calling SdNotify, i.e. if we call SdNotify early enough that we haven't yet moved, systemd will be happy and transition us to the Running state. If we are too late and have already moved, systemd says cgroup is empty and kills our process.
https://github.com/rancher/k3s/blob/master/pkg/daemons/agent/agent.go#L197 is of note
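One quick way to observe the move described above on a running node (not from the original report; systemctl show -p MainPID is available on EL7's systemd 219):
MAIN_PID=$(systemctl show -p MainPID k3s | cut -d= -f2)
grep name=systemd /proc/${MAIN_PID}/cgroup
# before the kubelet starts this prints something like
#   ...:name=systemd:/system.slice/k3s.service
# and after the move it shows the /systemd/system.slice path instead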
Instead of changing the way we are operating under cgroups (i.e. changing existing behavior), I will open a PR to change Type=notify to Type=exec and we can revisit using systemd notifications again later on.
To be clear - the cgroup move has something to do with the kubelet, which is why running with --disable-agent also works - it prevents the server starting the kubelet and in turn vacating the cgroup. We only ever used type=notify for server, probably because we ran into similar problems on agents.
Possibly related: https://github.com/rancher/k3s/issues/2502
More context: systemd for EL7 appears to be missing the commit/PR that came out of https://github.com/systemd-rhel/rhel-8/commit/3c751b1bfaf734db09256a5631f1f9ce75cf0d35 (which makes sense because it was introduced after systemd 233).
EL8 is running systemd 239, which includes this change: https://github.com/systemd-rhel/rhel-8/commit/3c751b1bfaf734db09256a5631f1f9ce75cf0d35
Filed an upstream issue with Red Hat on this here: https://bugzilla.redhat.com/show_bug.cgi?id=1900877
I believe that https://github.com/systemd-rhel/rhel-8/commit/3c751b1bfaf734db09256a5631f1f9ce75cf0d35 just coincidentally mitigated an otherwise unsupported behavior of k3s with regards to how it manipulates the cgroup hierarchy for a service managed by systemd.
k3s basically escapes the systemd-managed cgroup by moving the control process from /system.slice/k3s.service to /systemd/system.slice/k3s.service. It's not too surprising that this would confuse systemd and cause weird behaviour such as nixing the control PID or not terminating all processes on shutdown.
This seems to be in conflict with upstream requirements:
If [services] create and manipulate cgroups outside of units that have Delegate=yes set, they violate the access contract for control groups.
As @Oats87 noted above, probably the only reason this didn't affect k3s installations using containerd was that, when running with containerd, the unit's cgroup would not be completely empty, as only the k3s process is moved.
The fact that we need to set KillMode=process in the k3s.service configuration also actually seems to work around the problem that the default kill mode (control-group) doesn't work because we are removing the main process from the service's control group.
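For reference, the directives being discussed live in the [Service] section of a unit file; an illustrative fragment only, not a proposed change (KillMode=process is what the comment above says k3s.service already sets, Delegate=yes is the knob the upstream quote refers to):
[Service]
KillMode=process
# Delegate=yes would be what the upstream contract requires of a service
# that manages its own cgroups beneath the unit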
I wonder why we are setting up kubelet to move our PID to a different cgroup in the first place (by passing the parameter kubelet-cgroups=/systemd/system.slice). I have not seen this being done in any other distro; it seems wrong and causes inconsistency between the cgroup hierarchy of k3s running with and without the agent.
Related issues:
https://github.com/rancher/k3s/issues/2587 https://github.com/rancher/k3s/issues/2583 https://github.com/rancher/k3s/issues/2502
Thanks @janeczku
Looking through the prior changes related to this, I don't think we should be setting kubelet-cgroups and runtime-cgroups the way we are.
https://github.com/rancher/k3s/pull/133/commits/602f0d70b446abe3161cbe603dbda0089a8f877e
Setting --kubelet-arg runtime-cgroups=/system.slice/docker.service --kubelet-arg kubelet-cgroups=/system.slice/k3s.service accordingly allows K3s to start with no problem on CentOS 7.8 and above.
Workaround:
1. curl https://releases.rancher.com/install-docker/19.03.sh | sh
2. systemctl enable docker (and disable firewalld if that's enabled)
3. sudo setenforce 0
4. sudo curl -sfL https://get.k3s.io | INSTALL_K3S_SKIP_START=true INSTALL_K3S_EXEC="server --disable traefik --docker --kubelet-arg runtime-cgroups=/system.slice/docker.service --kubelet-arg kubelet-cgroups=/system.slice/k3s.service" sh
5. Add "Wants=docker.service", "After=docker.service" to /etc/systemd/system/k3s.service
6. sudo systemctl daemon-reload
7. sudo systemctl start k3s
I agree that it seems weird. It might be worth asking @ibuildthecloud what he was doing there and if it's still needed.
Update:
We are configuring the kubelet with --cgroup-driver=cgroupfs. Our logic checks whether we're operating under the systemd hierarchy and then sets the corresponding --kubelet-cgroups and --runtime-cgroups flags for the kubelet, with the assumption that we should be prepending /systemd to the front of the full path to the cgroup. The issue is that when we prepend /systemd, we end up with a redundant path, i.e. /sys/fs/cgroup/systemd/systemd
On an Ubuntu 18.04 system, K3s ends up living under:
ubuntu@ip-172-31-32-112:/sys/fs/cgroup/systemd/systemd/system.slice$ cat cgroup.procs
1265
ubuntu@ip-172-31-32-112:/sys/fs/cgroup/systemd/systemd/system.slice$ ps 1265
PID TTY STAT TIME COMMAND
1265 ? Ssl 0:11 /usr/local/bin/k3s server
ubuntu@ip-172-31-32-112:/sys/fs/cgroup/systemd/systemd/system.slice$
On an EL7 system, K3s ends up in a similar case, living under the redundant systemd path, but systemd does not "merge" the duplicate systemd into the correct hierarchy like the systemd on Ubuntu does.
[root@ck-centos7-0 system.slice]# pwd
/sys/fs/cgroup/systemd/systemd/system.slice
[root@ck-centos7-0 system.slice]# cat cgroup.procs
15776
[root@ck-centos7-0 system.slice]# ps 15776
PID TTY STAT TIME COMMAND
15776 ? Ssl 0:24 /usr/local/bin/k3s server
[root@ck-centos7-0 system.slice]#
Note that I ran both of these tests with a stock K3s configuration which is why you don't see the docker process.
I'm looking into how/why systemd on Ubuntu is treating the redundant systemd differently.
Talking to Darren, it turns out that the reason why we are trying to prepend /systemd (and/or even set --kubelet-cgroups and --runtime-cgroups in the first place) is due to the case when we run in docker, i.e. with k3d. This is because a docker container by default lives under the /docker cgroup path, and when you are within the container, that /docker is not visible to the kubelet and it's constantly trying to check /docker/<cgroup>, which doesn't exist.
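A quick way to see what the comment above describes (not from the thread; any small image works and the container id is abbreviated): inside a container the host-side cgroup path is reported in /proc, but there is no matching /docker/... path visible under the container's /sys/fs/cgroup, which is what the kubelet keeps failing to find:
docker run --rm alpine cat /proc/self/cgroup | grep name=systemd
# prints something like: 1:name=systemd:/docker/<container-id>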
Validated on k3s v1.19.5-rc1+k3s1. Backporting to earlier k3s versions will be tracked separately in issue #2687.
Stopping k3s/docker and bringing them back up works as expected.
[centos@ip-172-31-25-35 ~]$ sudo curl -sfL https://get.k3s.io | INSTALL_K3S_SKIP_START=true INSTALL_K3S_VERSION=v1.19.5-rc1+k3s1 INSTALL_K3S_EXEC="server --disable traefik --docker" sh
[centos@ip-172-31-25-35 ~]$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-172-31-25-35.us-east-2.compute.internal Ready master 18s v1.19.5-rc1+k3s1
[centos@ip-172-31-25-35 ~]$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system local-path-provisioner-7ff9579c6-7mq7p 1/1 Running 0 5s
kube-system metrics-server-7b4f8b595-wxjkq 1/1 Running 0 5s
kube-system coredns-88dbd9b97-g5v9k 1/1 Running 0 5s
[centos@ip-172-31-25-35 ~]$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
60b2db61466f 9dd718864ce6 "/metrics-server" 2 minutes ago Up 2 minutes k8s_metrics-server_metrics-server-7b4f8b595-wxjkq_kube-system_1de460de-f6c9-451e-9c97-e054f9cc3248_0
971eb3b5f3ae e422121c9c5f "local-path-provisio…" 2 minutes ago Up 2 minutes k8s_local-path-provisioner_local-path-provisioner-7ff9579c6-7mq7p_kube-system_81f54d91-8f93-47bc-8bcf-e3cf7a150a30_0
22bd39a166ec 0a6cfbf7b0b6 "/coredns -conf /etc…" 2 minutes ago Up 2 minutes k8s_coredns_coredns-88dbd9b97-g5v9k_kube-system_99848c93-906c-4444-ab76-0d09813b4f51_0
9a7a7fb1b5fe rancher/pause:3.1 "/pause" 2 minutes ago Up 2 minutes k8s_POD_local-path-provisioner-7ff9579c6-7mq7p_kube-system_81f54d91-8f93-47bc-8bcf-e3cf7a150a30_1
847e17224abc rancher/pause:3.1 "/pause" 2 minutes ago Up 2 minutes k8s_POD_metrics-server-7b4f8b595-wxjkq_kube-system_1de460de-f6c9-451e-9c97-e054f9cc3248_1
cfde06f1b53d rancher/pause:3.1 "/pause" 2 minutes ago Up 2 minutes k8s_POD_coredns-88dbd9b97-g5v9k_kube-system_99848c93-906c-4444-ab76-0d09813b4f51_1
─system.slice
├─k3s.service
│ └─3497 /usr/local/bin/k3s server
├─docker.service
│ └─3349 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
[centos@ip-172-31-25-35 ~]$ cat /etc/redhat-release
CentOS Linux release 7.8.2003 (Core)
This problem still seems to exist when I use the newest script downloaded from https://get.k3s.io/
@discdisk can you open a new issue and provide the information requested by the template?
Environmental Info:
K3s Version: v1.19.3+k3s3 (0e4fbfef)
Node(s) CPU architecture, OS, and Version: amd64, CentOS 7.8 (using the same steps, it works fine on CentOS 7.6)
Linux iZ6weix7w7e0sy67ak2vt0Z 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration: 1 master
Describe the bug:
Cannot start K3s with the docker container runtime on CentOS 7.8
Steps To Reproduce:
Expected behavior:
Expect k3s to start successfully
Actual behavior:
k3s failed to start
Additional context / logs: