kubernetes / kubeadm

Aggregator for issues filed against kubeadm
Apache License 2.0

kubelet defaults to 'unix:///var/run/docker.sock' even if cri-socket/container-runtime-endpoint is specified on 'kubeadm init' #1328

Closed: fhemberger closed this issue 5 years ago

fhemberger commented 5 years ago

Is this a BUG REPORT or FEATURE REQUEST?

Choose one: BUG REPORT

Versions

kubeadm version (use kubeadm version): kubeadm version: &version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:36:44Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}

Environment:

What happened?

kubelet tries to connect to unix:///var/run/docker.sock.

Dec 19 18:43:12 k8s kubelet[10174]: I1219 18:43:12.014733   10174 server.go:407] Version: v1.13.1
Dec 19 18:43:12 k8s kubelet[10174]: I1219 18:43:12.015999   10174 plugins.go:103] No cloud provider specified.
Dec 19 18:43:12 k8s kubelet[10174]: W1219 18:43:12.016105   10174 server.go:552] standalone mode, no API client
Dec 19 18:43:12 k8s kubelet[10174]: W1219 18:43:12.035998   10174 server.go:464] No api server defined - no events will be sent to API server.
Dec 19 18:43:12 k8s kubelet[10174]: I1219 18:43:12.036079   10174 server.go:666] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /
Dec 19 18:43:12 k8s kubelet[10174]: I1219 18:43:12.037241   10174 container_manager_linux.go:248] container manager verified user specified cgroup-root exists: []
Dec 19 18:43:12 k8s kubelet[10174]: I1219 18:43:12.037317   10174 container_manager_linux.go:253] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocata
Dec 19 18:43:12 k8s kubelet[10174]: I1219 18:43:12.037935   10174 container_manager_linux.go:272] Creating device plugin manager: true
Dec 19 18:43:12 k8s kubelet[10174]: I1219 18:43:12.038043   10174 state_mem.go:36] [cpumanager] initializing new in-memory state store
Dec 19 18:43:12 k8s kubelet[10174]: I1219 18:43:12.038618   10174 state_mem.go:84] [cpumanager] updated default cpuset: ""
Dec 19 18:43:12 k8s kubelet[10174]: I1219 18:43:12.038681   10174 state_mem.go:92] [cpumanager] updated cpuset assignments: "map[]"
Dec 19 18:43:12 k8s kubelet[10174]: I1219 18:43:12.045855   10174 client.go:75] Connecting to docker on unix:///var/run/docker.sock
Dec 19 18:43:12 k8s kubelet[10174]: I1219 18:43:12.045954   10174 client.go:104] Start docker client with request timeout=2m0s
Dec 19 18:43:12 k8s kubelet[10174]: E1219 18:43:12.046424   10174 kube_docker_client.go:91] failed to retrieve docker version: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Dec 19 18:43:12 k8s kubelet[10174]: W1219 18:43:12.046605   10174 kube_docker_client.go:92] Using empty version for docker client, this may sometimes cause compatibility issue.
Dec 19 18:43:12 k8s kubelet[10174]: F1219 18:43:12.047190   10174 server.go:261] failed to run Kubelet: failed to create kubelet: failed to get docker version: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

What you expected to happen?

kubelet should connect to unix:///var/run/crio/crio.sock.

How to reproduce it (as minimally and precisely as possible)?

  1. Install cri-o as described here: https://kubernetes.io/docs/setup/cri/

  2. Create config files and run kubeadm init:

# See documentation:
# https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/#use-kubeadm-with-cri-runtimes
mkdir -p /etc/systemd/system/kubelet.service.d/
cat > /etc/systemd/system/kubelet.service.d/20-cri.conf <<EOF
[Unit]
Wants=crio.service

[Service]
Environment="KUBELET_EXTRA_ARGS=--container-runtime=remote --cgroup-driver=systemd --container-runtime-endpoint=unix:///var/run/crio/crio.sock --image-service-endpoint=unix:///var/run/crio/crio.sock --runtime-request-timeout=10m"
EOF

# Also tried setting args in `/etc/default/kubelet`, see:
# https://kubernetes.io/docs/setup/independent/kubelet-integration/#the-kubelet-drop-in-file-for-systemd
mkdir -p /etc/default
cat > /etc/default/kubelet <<EOF
KUBELET_EXTRA_ARGS=--container-runtime=remote --cgroup-driver=systemd --container-runtime-endpoint=unix:///var/run/crio/crio.sock --image-service-endpoint=unix:///var/run/crio/crio.sock --runtime-request-timeout=10m
EOF

systemctl daemon-reload
systemctl enable kubelet.service
systemctl start kubelet.service

kubeadm init --pod-network-cidr=192.168.0.0/16 --cri-socket=unix:///var/run/crio/crio.sock

Anything else we need to know?

Might be related to #1322?

neolit123 commented 5 years ago

thanks for the report. instead, have you tried passing something like the following as a --config file to init?

apiVersion: kubeadm.k8s.io/v1beta1
kind: InitConfiguration
nodeRegistration:
  criSocket: "unix:///var/run/crio/crio.sock"
---
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: "v1.13.1"
networking:
  podSubnet: "192.168.0.0/16"

/priority critical-urgent
/kind bug
/priority awaiting-more-evidence

/assign @rosti @bart0sh
PTAL, there seem to be some cri-socket related bugs lurking around, as reported here and in #1322

thanks.

fhemberger commented 5 years ago

/var/lib/kubelet/kubeadm-flags.env looks okay:

KUBELET_KUBEADM_ARGS=--container-runtime=remote --container-runtime-endpoint=unix:///var/run/crio/crio.sock --resolv-conf=/run/systemd/resolve/resolv.conf

but still the same result.

kubelet still seems to use dockerEndpoint. Did I understand correctly that Docker is no longer required on the host when you use an alternative CRI-based runtime?

neolit123 commented 5 years ago

kubelet still seems to use dockerEndpoint

if you are seeing --container-runtime-endpoint=unix:///var/run/crio/crio.sock passed to the kubelet in the systemd logs, then we might be facing a kubelet problem.

try adding --v=2 (1 might be enough) and look at the list of flags logged near kubelet startup. also note that kubelet CLI flags override the kubelet config file.

Did I understand correctly that Docker is no longer required on the host when you use an alternative CRI-based runtime?

yes, that should be the case.
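As a sanity check (a sketch, assuming crictl from the cri-tools package is installed), the CRI-O socket can be probed directly, with no Docker daemon present:

```shell
# query the runtime's version over the CRI socket;
# fails fast if CRI-O is not listening on that path
crictl --runtime-endpoint unix:///var/run/crio/crio.sock version
```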

fhemberger commented 5 years ago

Looking at /lib/systemd/system/kubelet.service, there are no options passed to the kubelet.

[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/home/

[Service]
ExecStart=/usr/bin/kubelet
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target

Output of kubelet -v=2 and explicitly passing kubelet --config=/var/lib/kubelet/config.yaml -v=2 is the same (still using the default --container-runtime="docker"). Only passing the values of KUBELET_KUBEADM_ARGS directly does the trick (manually copied from /var/lib/kubelet/kubeadm-flags.env):

kubelet -v=2 \
  --container-runtime=remote \
  --container-runtime-endpoint=/var/run/crio/crio.sock \
  --resolv-conf=/run/systemd/resolve/resolv.conf

I get a deprecation warning:

util_unix.go:77] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".

and afterwards, I'm running into https://github.com/kubernetes/kubernetes/issues/56850, which seems to be resolved by https://github.com/kubernetes/kubernetes/issues/56850#issuecomment-406241077, and a whole bunch of other issues, but that's a different matter.

neolit123 commented 5 years ago

Looking at /lib/systemd/system/kubelet.service, there are no options passed to the kubelet.

and that should be the case.

/var/lib/kubelet/kubeadm-flags.env

something is not right here. this is a file that is auto-generated by kubeadm on each init and passed to the kubelet from the unit file.
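For reference, the kubeadm packages ship a systemd drop-in that wires this env file into the kubelet invocation. It looks roughly like this (a sketch; exact contents vary between kubeadm versions and distros):

```ini
# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# the "-" prefix means: skip the file silently if it does not exist
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
```

If this drop-in is missing or overridden, the flags in kubeadm-flags.env never reach the kubelet, which matches the symptom described above.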

kubelet -v=2 \
  --container-runtime=remote \
  --container-runtime-endpoint=/var/run/crio/crio.sock \
  --resolv-conf=/run/systemd/resolve/resolv.conf

if these are the contents of the file (minus the -v=2 part), kubeadm is doing its job.

https://kubernetes.io/docs/setup/independent/kubelet-integration/#the-kubelet-drop-in-file-for-systemd

-v=2 should be added to /etc/default/kubelet.
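Concretely, a minimal /etc/default/kubelet that only raises verbosity might look like this (a sketch; on RPM-based distros the file is /etc/sysconfig/kubelet instead):

```shell
# /etc/default/kubelet
# extra flags are appended to the kubelet command line
KUBELET_EXTRA_ARGS=-v=2
```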

util_unix.go:77] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".

that's a problem i've missed, yes. please use the unix:// socket prefix.

whole bunch of other issues, but that's a different matter.

did you resolve it? is this issue ready to be closed?

fhemberger commented 5 years ago

No, I still have to add those options manually to /lib/systemd/system/kubelet.service for them to work.

Putting them in KUBELET_KUBEADM_ARGS doesn't seem to have any effect.

neolit123 commented 5 years ago

No, I still have to add those options manually to /lib/systemd/system/kubelet.service for them to work.

the user should not touch /lib/systemd/system/kubelet.service. adding flags in /etc/default/kubelet works for me.

bart0sh commented 5 years ago

/lifecycle active

fhemberger commented 5 years ago

Okay, did a complete fresh install of Ubuntu 18.04.1, used only the following config.yaml:

apiVersion: kubeadm.k8s.io/v1beta1
kind: InitConfiguration
nodeRegistration:
  criSocket: "unix:///var/run/crio/crio.sock"
  taints:

and didn't specify networking.podSubnet (we'll see how this turns out later). I no longer get any unix:///var/run/docker.sock errors, but I'm still running into #1153.

Closing, as this is a different matter. Thanks for your help!

parmentelat commented 1 year ago

For the record, I just ran into something similar on a Fedora 37 box where Kubernetes had initially been installed quite a while ago.

I got rid of that issue by removing this file (rm /etc/systemd/system/kubelet.service.d/10-kubeadm.conf); it was apparently a leftover from the past and was superseding the one in /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf.

I was at first focusing on the plain kubelet.service file, so maybe checking for this kind of drop-in can help out others observing the same symptom.
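One way to surface such stale drop-ins (a sketch using standard systemd tooling) is:

```shell
# print the effective unit file plus every drop-in, in the
# order systemd merges them (later files win):
systemctl cat kubelet

# list overridden/extended unit files system-wide:
systemd-delta --type=extended
```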