Hey @dsteinberg111, what steps did you take to migrate your nodes from docker to containerd?
This is my work-in-progress documentation of what I did on all nodes. I kept the Docker repository because it has up-to-date containerd packages:
kubectl get nodes -o wide
export NODE={x}{n}
kubectl cordon ${NODE}
kubectl drain ${NODE} --ignore-daemonsets
kubectl edit node ${NODE}
apiVersion: v1
kind: Node
metadata:
  annotations:
    kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/containerd/containerd.sock
The Docker repo has a fresh containerd package; the stock containerd is way too old in Debian 11.
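If you want to double-check that before purging anything, comparing the candidate versions from both sources is enough (a small optional sketch):
# Compare what Debian 11 ships vs. what the Docker repo offers
apt-cache policy containerd containerd.io
# After installing, confirm the running version
containerd --version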
sudo systemctl stop kubelet
sudo systemctl status kubelet
sudo apt purge docker-ce docker-ce-cli docker docker-engine docker.io containerd runc
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install containerd.io
sudo su -
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
vi /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
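If you prefer not to edit the file by hand, the same change can be scripted (this is essentially the sed substitution used in the updated walkthrough further down):
# Non-interactive alternative: flip SystemdCgroup from false to true
sed -i -E 's/(SystemdCgroup =) false/\1 true/' /etc/containerd/config.toml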
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
kvm-intel
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv6.conf.default.forwarding = 1
EOF
cat << EOF | sudo tee /etc/systemd/system/kubelet.service.d/containerd.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--runtime-request-timeout=15m --image-service-endpoint=unix:///run/containerd/containerd.sock --cgroup-driver=systemd"
EOF
cat << EOF | sudo tee /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--cgroup-driver=systemd --pod-infra-container-image=k8s.gcr.io/pause:3.1 --container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock"
EOF
rm /etc/sysctl.d/99-kubernetes-net.conf
sysctl --system
echo "runtime-endpoint: unix:///run/containerd/containerd.sock" > /etc/crictl.yaml
rm /etc/systemd/system/kubelet.service.d/20-hetzner-cloud.conf
sudo cat << EOF | sudo tee /etc/cni/net.d/10-containerd-net.conflist
{
  "cniVersion": "1.0.0",
  "name": "containerd-net",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "promiscMode": true,
      "ipam": {
        "type": "host-local",
        "ranges": [
          [{
            "subnet": "10.88.0.0/16"
          }],
          [{
            "subnet": "2001:4860:4860::/64"
          }]
        ],
        "routes": [
          { "dst": "0.0.0.0/0" },
          { "dst": "::/0" }
        ]
      }
    },
    {
      "type": "portmap",
      "capabilities": {"portMappings": true}
    }
  ]
}
EOF
systemctl daemon-reload
systemctl enable --now containerd
systemctl restart containerd
crictl ps
systemctl start kubelet
apt install -y ca-certificates curl gnupg lsb-release ntp apparmor apparmor-utils
kubectl uncordon ${NODE}
Could you post the output from these two commands? I currently suspect that the contents of /etc/systemd/system/kubelet.service.d/containerd.conf override our setting of the --cloud-provider=external flag (see step 1 in the readme). The output from these two commands will confirm my suspicion.
systemctl cat kubelet.service
systemctl show kubelet.service
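A quick way to see which drop-in wins is to compare DropInPaths against the resulting Environment (a sketch; systemd applies drop-ins in lexical order, and a later Environment= line for the same variable replaces the earlier value):
# Show which drop-ins are applied and the effective environment
systemctl show kubelet.service -p DropInPaths -p Environment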
I removed the cloud-provider flag because 1.24 won't start with that flag. I totally forgot about it... But what is the "new" way?
cat /etc/systemd/system/kubelet.service.d/containerd.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--runtime-request-timeout=15m --image-service-endpoint=unix:///run/containerd/containerd.sock --cgroup-driver=systemd"
# /lib/systemd/system/kubelet.service
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/home/
Wants=network-online.target
After=network-online.target
[Service]
ExecStart=/usr/bin/kubelet
Restart=always
StartLimitInterval=0
RestartSec=10
[Install]
WantedBy=multi-user.target
# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
# /etc/systemd/system/kubelet.service.d/containerd.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--runtime-request-timeout=15m --image-service-endpoint=unix:///run/containerd/containerd.sock --cgroup-driver=systemd"
Type=simple
Restart=always
NotifyAccess=none
RestartUSec=10s
TimeoutStartUSec=1min 30s
TimeoutStopUSec=1min 30s
TimeoutAbortUSec=1min 30s
TimeoutStartFailureMode=terminate
TimeoutStopFailureMode=terminate
RuntimeMaxUSec=infinity
WatchdogUSec=0
WatchdogTimestampMonotonic=0
RootDirectoryStartOnly=no
RemainAfterExit=no
GuessMainPID=yes
MainPID=748
ControlPID=0
FileDescriptorStoreMax=0
NFileDescriptorStore=0
StatusErrno=0
Result=success
ReloadResult=success
CleanResult=success
UID=[not set]
GID=[not set]
NRestarts=1
OOMPolicy=stop
ExecMainStartTimestamp=Thu 2023-02-09 11:30:04 CET
ExecMainStartTimestampMonotonic=18267569
ExecMainExitTimestampMonotonic=0
ExecMainPID=748
ExecMainCode=0
ExecMainStatus=0
ExecStart={ path=/usr/bin/kubelet ; argv[]=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS ; ignore_errors=no ; start_time=[Thu 2023-02-09 11:30:04 CET] ; stop_time=[n/a] ; pid=748 ; code=(null) ; status>
ExecStartEx={ path=/usr/bin/kubelet ; argv[]=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS ; flags= ; start_time=[Thu 2023-02-09 11:30:04 CET] ; stop_time=[n/a] ; pid=748 ; code=(null) ; status=0/0 }
Slice=system.slice
ControlGroup=/system.slice/kubelet.service
MemoryCurrent=73760768
CPUUsageNSec=37726662930000
EffectiveCPUs=0-1
EffectiveMemoryNodes=0
TasksCurrent=16
IPIngressBytes=[no data]
IPIngressPackets=[no data]
IPEgressBytes=[no data]
IPEgressPackets=[no data]
IOReadBytes=18446744073709551615
IOReadOperations=18446744073709551615
IOWriteBytes=18446744073709551615
IOWriteOperations=18446744073709551615
Delegate=no
CPUAccounting=yes
CPUWeight=[not set]
StartupCPUWeight=[not set]
CPUShares=[not set]
StartupCPUShares=[not set]
CPUQuotaPerSecUSec=infinity
CPUQuotaPeriodUSec=infinity
AllowedCPUs=
AllowedMemoryNodes=
IOAccounting=no
IOWeight=[not set]
StartupIOWeight=[not set]
BlockIOAccounting=no
BlockIOWeight=[not set]
StartupBlockIOWeight=[not set]
MemoryAccounting=yes
DefaultMemoryLow=0
DefaultMemoryMin=0
MemoryMin=0
MemoryLow=0
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
MemoryLimit=infinity
DevicePolicy=auto
TasksAccounting=yes
TasksMax=4559
IPAccounting=no
ManagedOOMSwap=auto
ManagedOOMMemoryPressure=auto
ManagedOOMMemoryPressureLimitPercent=0%
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf" KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml "KUBELET_EXTRA_ARGS=--runtime-request-timeout=15m --image-se>
EnvironmentFiles=/var/lib/kubelet/kubeadm-flags.env (ignore_errors=yes)
EnvironmentFiles=/etc/default/kubelet (ignore_errors=yes)
UMask=0022
LimitCPU=infinity
LimitCPUSoft=infinity
LimitFSIZE=infinity
LimitFSIZESoft=infinity
LimitDATA=infinity
LimitDATASoft=infinity
LimitSTACK=infinity
LimitSTACKSoft=8388608
LimitCORE=infinity
LimitCORESoft=0
LimitRSS=infinity
LimitRSSSoft=infinity
LimitNOFILE=524288
LimitNOFILESoft=1024
LimitAS=infinity
LimitASSoft=infinity
LimitNPROC=15199
LimitNPROCSoft=15199
LimitMEMLOCK=65536
LimitMEMLOCKSoft=65536
LimitLOCKS=infinity
LimitLOCKSSoft=infinity
LimitSIGPENDING=15199
LimitSIGPENDINGSoft=15199
LimitMSGQUEUE=819200
LimitMSGQUEUESoft=819200
LimitNICE=0
LimitNICESoft=0
LimitRTPRIO=0
LimitRTPRIOSoft=0
LimitRTTIME=infinity
LimitRTTIMESoft=infinity
RootHashSignature=
OOMScoreAdjust=0
CoredumpFilter=0x33
Nice=0
IOSchedulingClass=0
IOSchedulingPriority=0
CPUSchedulingPolicy=0
CPUSchedulingPriority=0
CPUAffinity=
CPUAffinityFromNUMA=no
NUMAPolicy=n/a
NUMAMask=
TimerSlackNSec=50000
CPUSchedulingResetOnFork=no
NonBlocking=no
StandardInput=null
StandardInputData=
StandardOutput=journal
StandardError=inherit
TTYReset=no
TTYVHangup=no
TTYVTDisallocate=no
SyslogPriority=30
SyslogLevelPrefix=yes
SyslogLevel=6
SyslogFacility=3
LogLevelMax=-1
LogRateLimitIntervalUSec=0
LogRateLimitBurst=0
SecureBits=0
CapabilityBoundingSet=cap_chown cap_dac_override cap_dac_read_search cap_fowner cap_fsetid cap_kill cap_setgid cap_setuid cap_setpcap cap_linux_immutable cap_net_bind_service cap_net_broadcast cap_net_admin cap_net_raw cap_ipc_lock cap_ipc_owner cap_sys_module>
AmbientCapabilities=
DynamicUser=no
RemoveIPC=no
MountFlags=
PrivateTmp=no
PrivateDevices=no
ProtectClock=no
ProtectKernelTunables=no
ProtectKernelModules=no
ProtectKernelLogs=no
ProtectControlGroups=no
PrivateNetwork=no
PrivateUsers=no
PrivateMounts=no
ProtectHome=no
ProtectSystem=no
SameProcessGroup=no
UtmpMode=init
IgnoreSIGPIPE=yes
NoNewPrivileges=no
SystemCallErrorNumber=2147483646
LockPersonality=no
RuntimeDirectoryPreserve=no
RuntimeDirectoryMode=0755
StateDirectoryMode=0755
CacheDirectoryMode=0755
LogsDirectoryMode=0755
ConfigurationDirectoryMode=0755
TimeoutCleanUSec=infinity
MemoryDenyWriteExecute=no
RestrictRealtime=no
RestrictSUIDSGID=no
RestrictNamespaces=no
MountAPIVFS=no
KeyringMode=private
ProtectProc=default
ProcSubset=all
ProtectHostname=no
KillMode=control-group
KillSignal=15
RestartKillSignal=15
FinalKillSignal=9
SendSIGKILL=yes
SendSIGHUP=no
WatchdogSignal=6
Id=kubelet.service
Names=kubelet.service
Requires=sysinit.target system.slice
Wants=network-online.target
WantedBy=multi-user.target
Conflicts=shutdown.target
Before=multi-user.target shutdown.target
After=system.slice basic.target sysinit.target network-online.target systemd-journald.socket
Documentation=https://kubernetes.io/docs/home/
Description=kubelet: The Kubernetes Node Agent
LoadState=loaded
ActiveState=active
FreezerState=running
SubState=running
FragmentPath=/lib/systemd/system/kubelet.service
DropInPaths=/etc/systemd/system/kubelet.service.d/10-kubeadm.conf /etc/systemd/system/kubelet.service.d/containerd.conf
UnitFileState=enabled
UnitFilePreset=enabled
StateChangeTimestamp=Thu 2023-02-09 11:30:04 CET
StateChangeTimestampMonotonic=18268430
InactiveExitTimestamp=Thu 2023-02-09 11:30:04 CET
InactiveExitTimestampMonotonic=18268430
ActiveEnterTimestamp=Thu 2023-02-09 11:30:04 CET
ActiveEnterTimestampMonotonic=18268430
ActiveExitTimestamp=Thu 2023-02-09 11:29:55 CET
ActiveExitTimestampMonotonic=8068477
InactiveEnterTimestamp=Thu 2023-02-09 11:30:04 CET
InactiveEnterTimestampMonotonic=18250995
CanStart=yes
CanStop=yes
CanReload=no
CanIsolate=no
CanFreeze=yes
StopWhenUnneeded=no
RefuseManualStart=no
RefuseManualStop=no
AllowIsolate=no
DefaultDependencies=yes
OnFailureJobMode=replace
IgnoreOnIsolate=no
NeedDaemonReload=no
JobTimeoutUSec=infinity
JobRunningTimeoutUSec=infinity
JobTimeoutAction=none
ConditionResult=yes
AssertResult=yes
ConditionTimestamp=Thu 2023-02-09 11:30:04 CET
ConditionTimestampMonotonic=18251194
AssertTimestamp=Thu 2023-02-09 11:30:04 CET
AssertTimestampMonotonic=18251198
Transient=no
Perpetual=no
StartLimitIntervalUSec=0
StartLimitBurst=5
StartLimitAction=none
FailureAction=none
SuccessAction=none
InvocationID=37ac66ba77d44e09a017939a31e51c3f
CollectMode=inactive
I removed the cloud-provider flag because 1.24 won't start with that flag. I totally forgot about it... But what is the "new" way?
AFAIK there is no new way, you still need to set the --cloud-provider=external flag even in 1.26. Please try to add it to your KUBELET_EXTRA_ARGS.
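For example (a sketch; this is essentially what the updated walkthrough below ends up doing), the Hetzner drop-in can combine both flags:
cat << EOF | sudo tee /etc/systemd/system/kubelet.service.d/20-hetzner-cloud.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--cloud-provider=external --image-service-endpoint=unix:///run/containerd/containerd.sock"
EOF
sudo systemctl daemon-reload
sudo systemctl restart kubelet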
Thx - that did the job. I had the impression from the docs that it's obsolete in 1.24. I updated my walkthrough:
kubectl get nodes -o wide
export NODE={x}{n}
kubectl cordon ${NODE}
kubectl drain ${NODE} --ignore-daemonsets
kubectl edit node ${NODE}
metadata:
  annotations:
    kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/containerd/containerd.sock
or
cat << EOF >> /tmp/${NODE}-patch.yaml
metadata:
  annotations:
    kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/containerd/containerd.sock
EOF
kubectl patch node ${NODE} --patch-file /tmp/${NODE}-patch.yaml
sudo systemctl stop kubelet
sudo systemctl status kubelet
sudo apt purge docker-ce docker-ce-cli docker docker-engine docker.io containerd runc
# Mainly from https://docs.docker.com/engine/install/debian/ but without docker
# Debian 11 has only containerd 1.4 ( https://containerd.io/releases/#kubernetes-support ).
# And this was the only well maintained Repo that I found. Feel free to do better
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install containerd.io ca-certificates curl gnupg lsb-release ntp apparmor apparmor-utils
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
# Set SystemdCgroup to true
sudo sed -i -E 's/(.*SystemdCgroup =) false/\1 true/g' /etc/containerd/config.toml
# Ensure Kernel Modules for Containerd
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
kvm-intel
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv6.conf.default.forwarding = 1
EOF
sudo sed -i -E 's/(runtimeRequestTimeout: ).*/\115m/g' /var/lib/kubelet/config.yaml
cat << EOF | sudo tee /etc/systemd/system/kubelet.service.d/20-hetzner-cloud.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--cloud-provider=external --image-service-endpoint=unix:///run/containerd/containerd.sock"
EOF
cat << EOF | sudo tee /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--cgroup-driver=systemd --pod-infra-container-image=k8s.gcr.io/pause:3.1 --container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock"
EOF
sudo rm /etc/sysctl.d/99-kubernetes-net.conf
sudo sysctl --system
echo "runtime-endpoint: unix:///run/containerd/containerd.sock" | sudo tee /etc/crictl.yaml
# https://kubernetes.io/docs/tasks/administer-cluster/migrating-from-dockershim/troubleshooting-cni-plugin-related-errors/
sudo cat << EOF | sudo tee /etc/cni/net.d/10-containerd-net.conflist
{
  "cniVersion": "1.0.0",
  "name": "containerd-net",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "promiscMode": true,
      "ipam": {
        "type": "host-local",
        "ranges": [
          [{
            "subnet": "10.88.0.0/16"
          }],
          [{
            "subnet": "2001:4860:4860::/64"
          }]
        ],
        "routes": [
          { "dst": "0.0.0.0/0" },
          { "dst": "::/0" }
        ]
      }
    },
    {
      "type": "portmap",
      "capabilities": {"portMappings": true}
    }
  ]
}
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now containerd
sudo systemctl restart containerd
# Test if Containerd works (should give an empty list of containers - only headers)
sudo crictl ps
sudo systemctl daemon-reload
sudo systemctl restart kubelet.service
kubectl uncordon ${NODE}
Thx - that did the job. I had the impression from the docs that it's obsolete in 1.24.
I just realized that we actually do state that in our README. I updated the README to match reality: #347
As some of you know, the upgrade from Kubernetes 1.23 to 1.24 needs some preparation. At the very least you have to change the container runtime, and the default choice is containerd. So I switched all masters and workers to containerd while still on 1.23. This works OK, but since that moment the External IP is replaced with the Internal IP, and the Internal IP is gone. Upgrading to 1.24 doesn't change this. Labels seem to work.
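For anyone hitting the same symptom, the addresses a node currently reports can be checked directly (a sketch; replace ${NODE} with the node name):
kubectl get nodes -o wide
kubectl get node ${NODE} -o jsonpath='{.status.addresses}{"\n"}'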