labring / sealos

Sealos is a production-ready Kubernetes distribution. You can run any Docker image on sealos, start high-availability databases like mysql/pgsql/redis/mongo, and develop applications in any programming language.
https://cloud.sealos.io
Apache License 2.0

Installing k8s with sealos 4.3 fails on Kylin V10 SP3 #3721

Closed fsckzy closed 6 months ago

fsckzy commented 1 year ago

Sealos Version

4.3.0

How to reproduce the bug?

  1. Operating system: dist_id=Kylin-Server-V10-SP3-General-Release-2212-x86_64-2022-12-02 15:44:18
  2. Kernel: 4.19.90-52.26.v2207.ky10.x86_64

What is the expected behavior?

W0821 11:06:52.056286    3349 kubeconfig.go:249] a kubeconfig file "/etc/kubernetes/controller-manager.conf" exists already but has an unexpected API Server URL: expected: https://192.168.11.33:6443, got: https://apiserver.cluster.local:6443
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/controller-manager.conf"
W0821 11:06:52.189719    3349 kubeconfig.go:249] a kubeconfig file "/etc/kubernetes/scheduler.conf" exists already but has an unexpected API Server URL: expected: https://192.168.11.33:6443, got: https://apiserver.cluster.local:6443
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/scheduler.conf"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
    timed out waiting for the condition

This error is likely caused by:
    - The kubelet is not running
    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
    - 'systemctl status kubelet'
    - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
    - 'crictl --runtime-endpoint unix:///var/run/cri-dockerd.sock ps -a | grep kube | grep -v pause'
    Once you have found the failing container, you can inspect its logs with:
    - 'crictl --runtime-endpoint unix:///var/run/cri-dockerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
2023-08-21T11:10:52 error Applied to cluster error: failed to init init master0 failed, error: exit status 1. Please clean and reinstall
Error: failed to init init master0 failed, error: exit status 1. Please clean and reinstall

What do you see instead?

Aug 21 11:20:06 k8sm1 kubelet[3733]: E0821 11:20:06.634364    3733 kubelet.go:2424] "Error getting node" err="node \"k8sm1\" not found"
Aug 21 11:20:06 k8sm1 kubelet[3733]: I0821 11:20:06.637110    3733 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="a4da352f0c13f2e7ba0bb487f53012620bff18c20b27e889921622b667d0d2a6"
Aug 21 11:20:06 k8sm1 kubelet[3733]: I0821 11:20:06.673003    3733 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="f71ff5d2cebf47e1365182ef1a6af5a112fdf6a7c274d956a959e0d878e65635"
Aug 21 11:20:06 k8sm1 kubelet[3733]: I0821 11:20:06.703523    3733 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="b5bb640c844a54b29a971940425e73c79546946de83ac760e4ba2a99b92b8fd9"
Aug 21 11:20:06 k8sm1 kubelet[3733]: W0821 11:20:06.733304    3733 watcher.go:93] Error while processing event ("/sys/fs/cgroup/memory/libcontainer_200282_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): readdirent /sys/fs/cgroup/memory/libcontainer_200282_systemd_test_default.slice: no such file or directory
Aug 21 11:20:06 k8sm1 kubelet[3733]: W0821 11:20:06.733396    3733 watcher.go:93] Error while processing event ("/sys/fs/cgroup/devices/libcontainer_200282_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/libcontainer_200282_systemd_test_default.slice: no such file or directory
Aug 21 11:20:06 k8sm1 kubelet[3733]: W0821 11:20:06.733415    3733 watcher.go:93] Error while processing event ("/sys/fs/cgroup/pids/libcontainer_200282_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/pids/libcontainer_200282_systemd_test_default.slice: no such file or directory
Aug 21 11:20:06 k8sm1 kubelet[3733]: E0821 11:20:06.734390    3733 kubelet.go:2424] "Error getting node" err="node \"k8sm1\" not found"
Aug 21 11:20:06 k8sm1 kubelet[3733]: I0821 11:20:06.742641    3733 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="04373275fcabfcf56ae2889670c2475141e6e8038f24697cead79af81075b750"
Aug 21 11:20:06 k8sm1 kubelet[3733]: E0821 11:20:06.834470    3733 kubelet.go:2424] "Error getting node" err="node \"k8sm1\" not found"
Aug 21 11:20:06 k8sm1 kubelet[3733]: E0821 11:20:06.837772    3733 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-scheduler\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-scheduler pod=kube-scheduler-k8sm1_kube-system(c2db5cac4475519a0cf91f0a28649ffb)\"" pod="kube-system/kube-scheduler-k8sm1" podUID=c2db5cac4475519a0cf91f0a28649ffb

Operating environment

- Sealos version: 4.3
- Docker version: 
- Kubernetes version: 1.24.9
- Operating system: 
- Runtime environment:
- Cluster size:
- Additional information:

Additional information

No response

bxy4543 commented 1 year ago

It looks like your containerd may not have started; check whether containerd is reporting any errors.

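As a concrete way to act on that suggestion, the checks below are a minimal sketch (assuming containerd was installed by sealos, runs as a systemd unit, and exposes its CRI socket at the default /run/containerd/containerd.sock):

    # Is the containerd service running?
    systemctl status containerd

    # Look for startup errors in the most recent containerd logs
    journalctl -u containerd --no-pager -n 100

    # List all containers, including exited ones, through the containerd CRI socket
    crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a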

fengxsong commented 11 months ago

  1. save the config below to cgroupfs.yaml:
apiVersion: apps.sealos.io/v1beta1
kind: Config
metadata:
  name: containerd-config
spec:
  path: etc/config.toml.tmpl
  match: registry.rootcloud.com/cloudimages/kubernetes:v1.23.10
  strategy: override
  data: |
    version = 2
    root = "{{ .criData }}"
    state = "/run/containerd"
    oom_score = 0

    [grpc]
      address = "/run/containerd/containerd.sock"
      uid = 0
      gid = 0
      max_recv_message_size = 16777216
      max_send_message_size = 16777216

    [debug]
      address = "/run/containerd/containerd-debug.sock"
      uid = 0
      gid = 0
      level = "warn"

    [timeouts]
      "io.containerd.timeout.shim.cleanup" = "5s"
      "io.containerd.timeout.shim.load" = "5s"
      "io.containerd.timeout.shim.shutdown" = "3s"
      "io.containerd.timeout.task.state" = "2s"

    [plugins]
      [plugins."io.containerd.grpc.v1.cri"]
        sandbox_image = "{{ .registryDomain }}:{{ .registryPort }}/{{ .sandboxImage }}"
        max_container_log_line_size = -1
        max_concurrent_downloads = 20
        disable_apparmor = {{ .disableApparmor }}
        [plugins."io.containerd.grpc.v1.cri".containerd]
          snapshotter = "overlayfs"
          default_runtime_name = "runc"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
            [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
              runtime_type = "io.containerd.runc.v2"
              runtime_engine = ""
              runtime_root = ""
              [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
                SystemdCgroup = false
        [plugins."io.containerd.grpc.v1.cri".registry]
          config_path = "/etc/containerd/certs.d"
          [plugins."io.containerd.grpc.v1.cri".registry.configs]
              [plugins."io.containerd.grpc.v1.cri".registry.configs."{{ .registryDomain }}:{{ .registryPort }}".auth]
                username = "{{ .registryUsername }}"
                password = "{{ .registryPassword }}"
  2. run sealos with the --config-file option, e.g. sealos run --config-file=cgroupfs.yaml ... (a quick verification sketch follows below)
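The setting most relevant to this symptom appears to be SystemdCgroup = false, which keeps containerd's runc runtime on the cgroupfs cgroup driver; note also that the match: field presumably needs to point at the kubernetes image actually being run (registry.rootcloud.com/cloudimages/kubernetes:v1.23.10 is specific to that example). The kubelet's cgroup driver has to agree with containerd's, so after applying the config it may be worth checking both sides. A minimal sketch, assuming a kubeadm-style layout where the kubelet config is written to /var/lib/kubelet/config.yaml as shown in the log above:

    # Which cgroup driver did kubeadm configure the kubelet with?
    grep -i cgroupDriver /var/lib/kubelet/config.yaml

    # Which cgroup driver is containerd's runc runtime set to use?
    containerd config dump | grep -i SystemdCgroup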
stale[bot] commented 8 months ago

This issue has been automatically closed because we haven't heard back for more than 60 days; please reopen this issue if necessary.

SupRenekton commented 3 months ago

I ran into the same problem: I am also using a Kylin SP3 machine to build a k8s cluster, the static pods will not start, and the kubelet reports that the master01 node cannot be found. Has this been resolved? If you have a solution, please let me know. Thanks.

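For the "node not found" errors described above, the kubelet typically logs this while the kube-apiserver static pod is not up yet, so inspecting the apiserver container is a reasonable next step. A minimal sketch, assuming a containerd runtime at the default socket (adjust the endpoint if a different runtime is in use):

    # Find the kube-apiserver container, including crashed or exited instances
    crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube-apiserver

    # Inspect its logs for the failure reason (replace CONTAINERID with the ID from the previous command)
    crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs CONTAINERID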