sealerio / sealer

Build, Share and Run Both Your Kubernetes Cluster and Distributed Applications (Project under CNCF)
http://sealer.cool
Apache License 2.0

Failed to deploy a single node cluster on ali-cloud ECS, "Cannot connect to the Docker daemon" #1703

Open cFireworks opened 2 years ago

cFireworks commented 2 years ago

I ran into the same problem. On an Alibaba Cloud ECS server running Ubuntu 18.04, I want to install a single-node Kubernetes cluster on the server itself, so I have tried both the server's public IP and the local IP 127.0.0.1 as the master IP.

OS: Ubuntu 18.04
Linux kernel: 4.15.0-192-generic
sealer version: {"gitVersion":"v0.8.6","gitCommit":"884513e","buildDate":"2022-07-12 02:58:54","goVersion":"go1.16.15","compiler":"gc","platform":"linux/amd64"}

After running sealer run -m 127.0.0.1 -p xxx, the following output appears:


++ dirname ./init-registry.sh
+ cd .
+ REGISTRY_PORT=5000
+ VOLUME=/var/lib/sealer/data/my-cluster/rootfs/registry
+ REGISTRY_DOMAIN=sea.hub
+ container=sealer-registry
+++ pwd
++ dirname /var/lib/sealer/data/my-cluster/rootfs/scripts
+ rootfs=/var/lib/sealer/data/my-cluster/rootfs
+ config=/var/lib/sealer/data/my-cluster/rootfs/etc/registry_config.yml
+ htpasswd=/var/lib/sealer/data/my-cluster/rootfs/etc/registry_htpasswd
+ certs_dir=/var/lib/sealer/data/my-cluster/rootfs/certs
+ image_dir=/var/lib/sealer/data/my-cluster/rootfs/images
+ mkdir -p /var/lib/sealer/data/my-cluster/rootfs/registry
+ load_images
+ for image in "$image_dir"/*
+ '[' -f /var/lib/sealer/data/my-cluster/rootfs/images/registry.tar ']'
+ docker load -q -i /var/lib/sealer/data/my-cluster/rootfs/images/registry.tar
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
...
2022-09-13 16:04:20 [ERROR] [root.go:70] sealer-v0.8.6: failed to init master0: failed to execute command(cd /var/lib/sealer/data/my-cluster/rootfs/scripts && ./init-registry.sh 5000 /var/lib/sealer/data/my-cluster/rootfs/registry sea.hub) on host(127.0.0.1): error(Process exited with status 1)

When installing with the server's public IP as the master IP (after uninstalling the Docker I had installed myself), the following output appears:

++ dirname ./init-registry.sh
+ cd .
+ REGISTRY_PORT=5000
+ VOLUME=/var/lib/sealer/data/my-cluster/rootfs/registry
+ REGISTRY_DOMAIN=sea.hub
+ container=sealer-registry
+++ pwd
++ dirname /var/lib/sealer/data/my-cluster/rootfs/scripts
+ rootfs=/var/lib/sealer/data/my-cluster/rootfs
+ config=/var/lib/sealer/data/my-cluster/rootfs/etc/registry_config.yml
+ htpasswd=/var/lib/sealer/data/my-cluster/rootfs/etc/registry_htpasswd
+ certs_dir=/var/lib/sealer/data/my-cluster/rootfs/certs
+ image_dir=/var/lib/sealer/data/my-cluster/rootfs/images
+ mkdir -p /var/lib/sealer/data/my-cluster/rootfs/registry
+ load_images
+ for image in "$image_dir"/*
+ '[' -f /var/lib/sealer/data/my-cluster/rootfs/images/registry.tar ']'
+ docker load -q -i /var/lib/sealer/data/my-cluster/rootfs/images/registry.tar
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
2022-09-13 17:05:00 [DEBUG] [sshcmd.go:114] failed to execute command(cd /var/lib/sealer/data/my-cluster/rootfs/scripts && ./init-registry.sh 5000 /var/lib/sealer/data/my-cluster/rootfs/registry sea.hub) on host(xxx): error(failed to execute command(cd /var/lib/sealer/data/my-cluster/rootfs/scripts && ./init-registry.sh 5000 /var/lib/sealer/data/my-cluster/rootfs/registry sea.hub) on host(xxx): error(Process exited with status 1))

Checking the Docker service status shows the following:

root@iZbp14czvx1exbfxgr520cZ:/var/run/docker# systemctl status docker
● docker.service
   Loaded: masked (/dev/null; bad)
   Active: inactive (dead) since Tue 2022-09-13 15:56:09 CST; 1h 26min ago
 Main PID: 8991 (code=exited, status=0/SUCCESS)

Sep 13 15:52:40 iZbp14czvx1exbfxgr520cZ dockerd[8991]: time="2022-09-13T15:52:40.178732070+08:00" level=info msg="Attempting next endpoint for pull after error: Get https://sea.hub:5000/v2/: net/http: req
Sep 13 15:52:40 iZbp14czvx1exbfxgr520cZ dockerd[8991]: time="2022-09-13T15:52:40.178813138+08:00" level=error msg="Handler for POST /v1.40/images/create returned error: Get https://sea.hub:5000/v2/: net/h
Sep 13 15:52:55 iZbp14czvx1exbfxgr520cZ dockerd[8991]: time="2022-09-13T15:52:55.209106031+08:00" level=warning msg="Error getting v2 registry: Get https://sea.hub:5000/v2/: net/http: request canceled whi
Sep 13 15:52:55 iZbp14czvx1exbfxgr520cZ dockerd[8991]: time="2022-09-13T15:52:55.209148840+08:00" level=info msg="Attempting next endpoint for pull after error: Get https://sea.hub:5000/v2/: net/http: req
Sep 13 15:52:55 iZbp14czvx1exbfxgr520cZ dockerd[8991]: time="2022-09-13T15:52:55.209185375+08:00" level=error msg="Handler for POST /v1.40/images/create returned error: Get https://sea.hub:5000/v2/: net/h
Sep 13 15:56:09 iZbp14czvx1exbfxgr520cZ systemd[1]: Stopping Docker Application Container Engine...
Sep 13 15:56:09 iZbp14czvx1exbfxgr520cZ dockerd[8991]: time="2022-09-13T15:56:09.123579156+08:00" level=info msg="Processing signal 'terminated'"
Sep 13 15:56:09 iZbp14czvx1exbfxgr520cZ dockerd[8991]: time="2022-09-13T15:56:09.464433975+08:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.Task
Sep 13 15:56:09 iZbp14czvx1exbfxgr520cZ dockerd[8991]: time="2022-09-13T15:56:09.507618852+08:00" level=info msg="Daemon shutdown complete"
Sep 13 15:56:09 iZbp14czvx1exbfxgr520cZ systemd[1]: Stopped Docker Application Container Engine.
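
Note the Loaded: masked (/dev/null; bad) line above: a masked systemd unit refuses to start until it is unmasked. A minimal sketch for clearing that state (assuming docker.service itself is otherwise intact):

# Unmask the unit so systemd is allowed to start it again
systemctl unmask docker
systemctl daemon-reload
systemctl restart docker
# Confirm the daemon answers on /var/run/docker.sock
docker info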

Trying to restart the Docker service also produces an error similar to the one reported earlier in this thread, systemd[5169]: docker.service: Failed at step EXEC spawning /usr/sbin/iptables: No such file or directory. The full status is as follows:

Sep 13 17:23:44 iZbp14czvx1exbfxgr520cZ dockerd[4962]: time="2022-09-13T17:23:44.154681261+08:00" level=info msg="API listen on /var/run/docker.sock"
Sep 13 17:23:44 iZbp14czvx1exbfxgr520cZ systemd[5169]: docker.service: Failed to execute command: No such file or directory
Sep 13 17:23:44 iZbp14czvx1exbfxgr520cZ systemd[5169]: docker.service: Failed at step EXEC spawning /usr/sbin/iptables: No such file or directory
Sep 13 17:23:44 iZbp14czvx1exbfxgr520cZ systemd[1]: docker.service: Control process exited, code=exited status=203
Sep 13 17:23:44 iZbp14czvx1exbfxgr520cZ dockerd[4962]: time="2022-09-13T17:23:44.159999418+08:00" level=info msg="Processing signal 'terminated'"
Sep 13 17:23:44 iZbp14czvx1exbfxgr520cZ dockerd[4962]: time="2022-09-13T17:23:44.424053918+08:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.Task
Sep 13 17:23:44 iZbp14czvx1exbfxgr520cZ dockerd[4962]: time="2022-09-13T17:23:44.454106725+08:00" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd name
Sep 13 17:23:44 iZbp14czvx1exbfxgr520cZ dockerd[4962]: time="2022-09-13T17:23:44.454531797+08:00" level=info msg="Daemon shutdown complete"
Sep 13 17:23:44 iZbp14czvx1exbfxgr520cZ systemd[1]: docker.service: Failed with result 'exit-code'.
Sep 13 17:23:44 iZbp14czvx1exbfxgr520cZ systemd[1]: Failed to start Docker Application Container Engine.
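
The status=203 and "Failed at step EXEC" lines above mean systemd could not even spawn /usr/sbin/iptables; the binary is simply not at the path the unit tries to execute. A generic diagnostic sketch for locating it:

# Where does the shell actually find iptables?
command -v iptables
# Does the path the unit expects exist?
ls -l /usr/sbin/iptables /sbin/iptables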

Originally posted by @cFireworks in https://github.com/sealerio/sealer/issues/1657#issuecomment-1245073966

kakaZhou719 commented 2 years ago

@cFireworks, please run ln -sf /sbin/iptables /usr/sbin/iptables and run again; we will enhance the logic here in the future.
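
Spelled out as a sketch (assuming iptables lives at /sbin/iptables, the usual location on Ubuntu 18.04):

# Link iptables to the path docker.service tries to spawn
ln -sf /sbin/iptables /usr/sbin/iptables
# Restart Docker and confirm it comes up cleanly
systemctl restart docker
systemctl is-active docker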

cFireworks commented 2 years ago

Thank you, this works, but another problem occurred when I ran it again. It seems the CNI is not ready; what can I do to solve this problem?

2022-09-14 12:12:52 [INFO] [init.go:259] start to init master0...
W0914 12:52:53.049306    6789 strict.go:54] error unmarshaling configuration schema.GroupVersionKind{Group:"kubelet.config.k8s.io", Version:"v1beta1", Kind:"KubeletConfiguration"}: error unmarshaling JSON: while decoding JSON: json: unknown field "shutdownGracePeriod"
W0914 12:52:53.116282    6789 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.19.8
[preflight] Running pre-flight checks
    [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
    [WARNING FileExisting-ebtables]: ebtables not found in system path
    [WARNING FileExisting-socat]: socat not found in system path
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Using existing ca certificate authority
[certs] Using existing apiserver certificate and key on disk
[certs] Using existing apiserver-kubelet-client certificate and key on disk
[certs] Using existing front-proxy-ca certificate authority
[certs] Using existing front-proxy-client certificate and key on disk
[certs] Using existing etcd/ca certificate authority
[certs] Using existing etcd/server certificate and key on disk
[certs] Using existing etcd/peer certificate and key on disk
[certs] Using existing etcd/healthcheck-client certificate and key on disk
[certs] Using existing apiserver-etcd-client certificate and key on disk
[certs] Using the existing "sa" key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/kubelet.conf"
W0914 12:53:41.278905    6789 kubeconfig.go:242] a kubeconfig file "/etc/kubernetes/controller-manager.conf" exists already but has an unexpected API Server URL: expected: https://47.98.112.245:6443, got: https://apiserver.cluster.local:6443
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/controller-manager.conf"
W0914 12:53:41.339108    6789 kubeconfig.go:242] a kubeconfig file "/etc/kubernetes/scheduler.conf" exists already but has an unexpected API Server URL: expected: https://47.98.112.245:6443, got: https://apiserver.cluster.local:6443
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/scheduler.conf"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

    Unfortunately, an error has occurred:
        timed out waiting for the condition

    This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

    If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'

    Additionally, a control plane component may have crashed or exited when started by the container runtime.
    To troubleshoot, list all containers using your preferred container runtimes CLI.

    Here is one example how you may list all Kubernetes containers running in docker:
        - 'docker ps -a | grep kube | grep -v pause'
        Once you have found the failing container, you can inspect its logs with:
        - 'docker logs CONTAINERID'

error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

2022-09-14 12:12:57 [ERROR] [root.go:70] sealer-v0.8.6: failed to init master0: failed to init master0: [ssh][xx.xx.xx.xx]run command failed [kubeadm init --config=/var/lib/sealer/data/my-cluster/rootfs/etc/kubeadm.yml --upload-certs -v 0 --ignore-preflight-errors=SystemVerification]. Please clean and reinstall

The kubelet service status:

● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Wed 2022-09-14 12:53:41 CST; 8min ago
     Docs: http://kubernetes.io/docs/
  Process: 7309 ExecStartPre=/usr/bin/kubelet-pre-start.sh (code=exited, status=0/SUCCESS)
 Main PID: 7344 (kubelet)
    Tasks: 13 (limit: 2211)
   CGroup: /system.slice/kubelet.service
           └─7344 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin=cni --pod-infra-container-image=sea.hub:5000/pause:3.2

Sep 14 13:02:39 iZbp14czvx1exbfxgr520cZ kubelet[7344]: I0914 13:02:39.863882    7344 kubelet.go:449] kubelet nodes not sync
Sep 14 13:02:39 iZbp14czvx1exbfxgr520cZ kubelet[7344]: I0914 13:02:39.863905    7344 kubelet.go:449] kubelet nodes not sync
Sep 14 13:02:39 iZbp14czvx1exbfxgr520cZ kubelet[7344]: I0914 13:02:39.863910    7344 kubelet.go:449] kubelet nodes not sync
Sep 14 13:02:39 iZbp14czvx1exbfxgr520cZ kubelet[7344]: E0914 13:02:39.868585    7344 kubelet.go:2134] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Sep 14 13:02:39 iZbp14czvx1exbfxgr520cZ kubelet[7344]: I0914 13:02:39.946896    7344 kubelet.go:449] kubelet nodes not sync
Sep 14 13:02:40 iZbp14czvx1exbfxgr520cZ kubelet[7344]: I0914 13:02:40.061811    7344 kubelet.go:449] kubelet nodes not sync
Sep 14 13:02:40 iZbp14czvx1exbfxgr520cZ kubelet[7344]: I0914 13:02:40.863958    7344 kubelet.go:449] kubelet nodes not sync
Sep 14 13:02:40 iZbp14czvx1exbfxgr520cZ kubelet[7344]: I0914 13:02:40.863962    7344 kubelet.go:449] kubelet nodes not sync
Sep 14 13:02:40 iZbp14czvx1exbfxgr520cZ kubelet[7344]: I0914 13:02:40.946911    7344 kubelet.go:449] kubelet nodes not sync
Sep 14 13:02:41 iZbp14czvx1exbfxgr520cZ kubelet[7344]: I0914 13:02:41.061857    7344 kubelet.go:449] kubelet nodes not sync
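
The repeated "cni config uninitialized" message above means the kubelet cannot find any CNI network configuration, so the node never becomes Ready. Generic first checks (not sealer-specific; the paths are the kubelet defaults):

# The kubelet looks for a CNI network config here by default
ls -l /etc/cni/net.d/
# ...and for CNI plugin binaries here
ls -l /opt/cni/bin/
# If the API server is reachable, check whether the network add-on pods ever started
kubectl -n kube-system get pods -o wide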