k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

Air-Gap installing k3s on arm64 failed with error: Failed to retrieve node info: nodes "k3s01" is forbidden: User "system:kube-proxy" cannot get resource "nodes" in API group "" at the cluster scope #2553

Closed · wangxi83 closed this issue 3 years ago

wangxi83 commented 3 years ago

I am installing k3s on:

a CentOS 7 host with an arm64 CPU.

I have disabled the firewall and SELinux;

I have modified /etc/hosts;

I have installed docker-ce;

I have executed:

  1. Downloaded https://get.k3s.io and saved it as install.sh.
  2. Placed k3s-airgap-images-arm64.tar in the expected location, "/var/lib/rancher/k3s/agent/images/".
  3. Ran `docker load < /var/lib/rancher/k3s/agent/images/k3s-airgap-images-arm64.tar`.
  4. Ran `chmod +x k3s-arm64` and moved the binary to "/usr/local/bin/k3s".
  5. Ran `INSTALL_K3S_SKIP_DOWNLOAD=true INSTALL_K3S_EXEC='server --docker' K3S_DATASTORE_ENDPOINT='mysql://xxxx' ./install.sh`.
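Concretely, the steps above amount to roughly this sequence (a sketch, assuming the install script was fetched with curl on a connected machine beforehand; the mysql://xxxx endpoint is the placeholder from my setup):

```shell
# 1. Save the install script (done on a machine with internet access).
curl -sfL https://get.k3s.io -o install.sh

# 2. Stage the air-gap image bundle where the k3s agent looks for it.
sudo mkdir -p /var/lib/rancher/k3s/agent/images/
sudo cp k3s-airgap-images-arm64.tar /var/lib/rancher/k3s/agent/images/

# 3. Pre-load the images into Docker, since the server runs with --docker.
sudo docker load < /var/lib/rancher/k3s/agent/images/k3s-airgap-images-arm64.tar

# 4. Install the k3s binary itself.
chmod +x k3s-arm64
sudo mv k3s-arm64 /usr/local/bin/k3s

# 5. Run the installer without downloading anything.
INSTALL_K3S_SKIP_DOWNLOAD=true \
INSTALL_K3S_EXEC='server --docker' \
K3S_DATASTORE_ENDPOINT='mysql://xxxx' \
sh install.sh
```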

During installation, I see these errors:

  1. Failed to retrieve node info: nodes "k3s01" is forbidden: User "system:kube-proxy" cannot get resource "nodes" in API group "" at the cluster scope
  2. failed to get node "k3s01" when trying to set owner ref to the node lease: nodes "k3s01" not found
  3. Failed to retrieve checkpoint for "kubelet_internal_checkpoint": checkpoint is not found
  4. eviction manager: failed to get summary stats: failed to get node info: node "k3s01" not found
  5. Failed to retrieve checkpoint for "kubelet_internal_checkpoint": checkpoint is not found

What did I miss? Need help!
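For the first error, one way to confirm whether it is really an RBAC problem (a hypothetical diagnostic using admin credentials via impersonation, not something I ran on this box):

```shell
# Ask the API server whether the kube-proxy user may get nodes cluster-wide.
# Uses the admin kubeconfig at /etc/rancher/k3s/k3s.yaml; prints "yes" or "no".
k3s kubectl auth can-i get nodes --as=system:kube-proxy
```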

Nov 19 18:34:00 k3s01 systemd[1]: k3s.service holdoff time over, scheduling restart.
Nov 19 18:34:00 k3s01 systemd[1]: Stopped Lightweight Kubernetes.
Nov 19 18:34:00 k3s01 systemd[1]: Starting Lightweight Kubernetes...
Nov 19 18:34:01 k3s01 k3s[14869]: time="2020-11-19T18:34:01.096605488+08:00" level=info msg="Starting k3s v1.19.3+k3s3 (0e4fbfef)"
Nov 19 18:34:01 k3s01 k3s[14869]: time="2020-11-19T18:34:01.097308366+08:00" level=info msg="Cluster bootstrap already complete"
Nov 19 18:34:01 k3s01 k3s[14869]: time="2020-11-19T18:34:01.133248889+08:00" level=info msg="Configuring mysql database connection pooling: maxIdleConns=2, maxOpenConns=0, connMaxLifetime=0s"
Nov 19 18:34:01 k3s01 k3s[14869]: time="2020-11-19T18:34:01.133354960+08:00" level=info msg="Configuring database table schema and indexes, this may take a moment..."
Nov 19 18:34:01 k3s01 k3s[14869]: time="2020-11-19T18:34:01.144147831+08:00" level=info msg="Database tables and indexes are up to date"
Nov 19 18:34:01 k3s01 k3s[14869]: time="2020-11-19T18:34:01.151407982+08:00" level=info msg="Kine listening on unix://kine.sock"
Nov 19 18:34:01 k3s01 k3s[14869]: time="2020-11-19T18:34:01.151899308+08:00" level=info msg="Running kube-apiserver --advertise-port=6443 --allow-privileged=true --anonymous-auth=false --api-audiences=unknown --authorization-mode=Node,RBAC -
Nov 19 18:34:01 k3s01 k3s[14869]: I1119 18:34:01.154596   14869 server.go:652] external host was not specified, using 172.16.149.122
Nov 19 18:34:01 k3s01 k3s[14869]: I1119 18:34:01.155140   14869 server.go:177] Version: v1.19.3+k3s3
Nov 19 18:34:01 k3s01 k3s[14869]: I1119 18:34:01.169204   14869 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByConditi
Nov 19 18:34:01 k3s01 k3s[14869]: I1119 18:34:01.169262   14869 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,RuntimeClass,Ce
Nov 19 18:34:01 k3s01 k3s[14869]: I1119 18:34:01.173928   14869 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByConditi
Nov 19 18:34:01 k3s01 k3s[14869]: I1119 18:34:01.174028   14869 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,RuntimeClass,Ce
Nov 19 18:34:01 k3s01 k3s[14869]: I1119 18:34:01.205120   14869 master.go:271] Using reconciler: lease
Nov 19 18:34:01 k3s01 k3s[14869]: W1119 18:34:01.743402   14869 genericapiserver.go:412] Skipping API batch/v2alpha1 because it has no resources.
Nov 19 18:34:01 k3s01 k3s[14869]: W1119 18:34:01.759970   14869 genericapiserver.go:412] Skipping API discovery.k8s.io/v1alpha1 because it has no resources.
Nov 19 18:34:01 k3s01 k3s[14869]: W1119 18:34:01.779239   14869 genericapiserver.go:412] Skipping API node.k8s.io/v1alpha1 because it has no resources.
Nov 19 18:34:01 k3s01 k3s[14869]: W1119 18:34:01.828924   14869 genericapiserver.go:412] Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources.
Nov 19 18:34:01 k3s01 k3s[14869]: W1119 18:34:01.847996   14869 genericapiserver.go:412] Skipping API scheduling.k8s.io/v1alpha1 because it has no resources.
Nov 19 18:34:01 k3s01 k3s[14869]: W1119 18:34:01.905172   14869 genericapiserver.go:412] Skipping API storage.k8s.io/v1alpha1 because it has no resources.
Nov 19 18:34:01 k3s01 k3s[14869]: W1119 18:34:01.958535   14869 genericapiserver.go:412] Skipping API apps/v1beta2 because it has no resources.
Nov 19 18:34:01 k3s01 k3s[14869]: W1119 18:34:01.958619   14869 genericapiserver.go:412] Skipping API apps/v1beta1 because it has no resources.
Nov 19 18:34:01 k3s01 k3s[14869]: I1119 18:34:01.986478   14869 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByConditi
Nov 19 18:34:01 k3s01 k3s[14869]: I1119 18:34:01.986547   14869 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,RuntimeClass,Ce
Nov 19 18:34:02 k3s01 k3s[14869]: time="2020-11-19T18:34:02.002845121+08:00" level=info msg="Running kube-scheduler --address=127.0.0.1 --bind-address=127.0.0.1 --kubeconfig=/var/lib/rancher/k3s/server/cred/scheduler.kubeconfig --port=10251 
Nov 19 18:34:02 k3s01 k3s[14869]: time="2020-11-19T18:34:02.002963332+08:00" level=info msg="Waiting for API server to become available"
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.003268   14869 registry.go:173] Registering SelectorSpread plugin
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.003295   14869 registry.go:173] Registering SelectorSpread plugin
Nov 19 18:34:02 k3s01 k3s[14869]: time="2020-11-19T18:34:02.003710030+08:00" level=info msg="Running kube-controller-manager --address=127.0.0.1 --allocate-node-cidrs=true --bind-address=127.0.0.1 --cluster-cidr=10.42.0.0/16 --cluster-signin
Nov 19 18:34:02 k3s01 k3s[14869]: time="2020-11-19T18:34:02.006031857+08:00" level=info msg="Node token is available at /var/lib/rancher/k3s/server/token"
Nov 19 18:34:02 k3s01 k3s[14869]: time="2020-11-19T18:34:02.006094037+08:00" level=info msg="To join node to cluster: k3s agent -s https://172.16.149.122:6443 -t ${NODE_TOKEN}"
Nov 19 18:34:02 k3s01 k3s[14869]: time="2020-11-19T18:34:02.007740786+08:00" level=info msg="Wrote kubeconfig /etc/rancher/k3s/k3s.yaml"
Nov 19 18:34:02 k3s01 k3s[14869]: time="2020-11-19T18:34:02.007794166+08:00" level=info msg="Run: k3s kubectl"
Nov 19 18:34:02 k3s01 k3s[14869]: time="2020-11-19T18:34:02.007924218+08:00" level=info msg="Module overlay was already loaded"
Nov 19 18:34:02 k3s01 k3s[14869]: time="2020-11-19T18:34:02.007945768+08:00" level=info msg="Module nf_conntrack was already loaded"
Nov 19 18:34:02 k3s01 k3s[14869]: time="2020-11-19T18:34:02.007963588+08:00" level=info msg="Module br_netfilter was already loaded"
Nov 19 18:34:02 k3s01 k3s[14869]: time="2020-11-19T18:34:02.007980778+08:00" level=info msg="Module iptable_nat was already loaded"
Nov 19 18:34:02 k3s01 k3s[14869]: time="2020-11-19T18:34:02.056338400+08:00" level=info msg="Cluster-Http-Server 2020/11/19 18:34:02 http: TLS handshake error from 127.0.0.1:47884: remote error: tls: bad certificate"
Nov 19 18:34:02 k3s01 k3s[14869]: time="2020-11-19T18:34:02.065091168+08:00" level=info msg="Cluster-Http-Server 2020/11/19 18:34:02 http: TLS handshake error from 127.0.0.1:47890: remote error: tls: bad certificate"
Nov 19 18:34:02 k3s01 k3s[14869]: time="2020-11-19T18:34:02.084905350+08:00" level=info msg="certificate CN=k3s01 signed by CN=k3s-server-ca@1605782026: notBefore=2020-11-19 10:33:46 +0000 UTC notAfter=2021-11-19 10:34:02 +0000 UTC"
Nov 19 18:34:02 k3s01 k3s[14869]: time="2020-11-19T18:34:02.090143569+08:00" level=info msg="certificate CN=system:node:k3s01,O=system:nodes signed by CN=k3s-client-ca@1605782026: notBefore=2020-11-19 10:33:46 +0000 UTC notAfter=2021-11-19 1
Nov 19 18:34:02 k3s01 k3s[14869]: time="2020-11-19T18:34:02.107841437+08:00" level=info msg="Connecting to proxy" url="wss://127.0.0.1:6443/v1-k3s/connect"
Nov 19 18:34:02 k3s01 k3s[14869]: time="2020-11-19T18:34:02.114336430+08:00" level=info msg="Handling backend connection request [k3s01]"
Nov 19 18:34:02 k3s01 k3s[14869]: time="2020-11-19T18:34:02.117146991+08:00" level=info msg="Running kubelet --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=cgroupfs -
Nov 19 18:34:02 k3s01 k3s[14869]: time="2020-11-19T18:34:02.118948042+08:00" level=info msg="Running kube-proxy --cluster-cidr=10.42.0.0/16 --healthz-bind-address=127.0.0.1 --hostname-override=k3s01 --kubeconfig=/var/lib/rancher/k3s/agent/ku
Nov 19 18:34:02 k3s01 k3s[14869]: Flag --cloud-provider has been deprecated, will be removed in 1.23, in favor of removing cloud provider code from Kubelet.
Nov 19 18:34:02 k3s01 k3s[14869]: W1119 18:34:02.126748   14869 server.go:226] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.131082   14869 server.go:407] Version: v1.19.3+k3s3
Nov 19 18:34:02 k3s01 k3s[14869]: time="2020-11-19T18:34:02.136490228+08:00" level=info msg="Waiting for node k3s01: nodes \"k3s01\" not found"
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.140678   14869 dynamic_cafile_content.go:167] Starting client-ca-bundle::/var/lib/rancher/k3s/agent/client-ca.crt
Nov 19 18:34:02 k3s01 k3s[14869]: E1119 18:34:02.143877   14869 node.go:125] Failed to retrieve node info: nodes "k3s01" is forbidden: User "system:kube-proxy" cannot get resource "nodes" in API group "" at the cluster scope
Nov 19 18:34:02 k3s01 k3s[14869]: W1119 18:34:02.246425   14869 nvidia.go:61] NVIDIA GPU metrics will not be available: no NVIDIA devices found
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.285604   14869 server.go:640] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.286213   14869 container_manager_linux.go:289] container manager verified user specified cgroup-root exists: []
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.286235   14869 container_manager_linux.go:294] Creating Container Manager object based on Node Config: {RuntimeCgroupsName:/systemd/system.slice SystemCgroupsName: KubeletCgroupsName:/systemd/
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.286372   14869 topology_manager.go:126] [topologymanager] Creating topology manager with none policy
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.286383   14869 container_manager_linux.go:324] [topologymanager] Initializing Topology Manager with none policy
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.286389   14869 container_manager_linux.go:329] Creating device plugin manager: true
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.286544   14869 client.go:77] Connecting to docker on unix:///var/run/docker.sock
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.286559   14869 client.go:94] Start docker client with request timeout=2m0s
Nov 19 18:34:02 k3s01 k3s[14869]: W1119 18:34:02.300789   14869 docker_service.go:565] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "hairpin-veth"
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.300832   14869 docker_service.go:241] Hairpin mode set to "hairpin-veth"
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.320964   14869 docker_service.go:256] Docker cri networking managed by cni
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.337222   14869 docker_service.go:261] Docker Info: &{ID:FJKE:WKEE:Y3CJ:NYY4:2HNW:XP4C:WKJJ:IEBG:RFZK:DMPZ:6KUS:LSPM Containers:1 ContainersRunning:1 ContainersPaused:0 ContainersStopped:0 Imag
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.337360   14869 docker_service.go:274] Setting cgroupDriver to cgroupfs
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.355586   14869 kubelet.go:261] Adding pod path: /var/lib/rancher/k3s/agent/pod-manifests
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.355655   14869 kubelet.go:273] Watching apiserver
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.374568   14869 kuberuntime_manager.go:214] Container runtime docker initialized, version: 19.03.13, apiVersion: 1.40.0
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.375006   14869 server.go:1148] Started kubelet
Nov 19 18:34:02 k3s01 k3s[14869]: E1119 18:34:02.375165   14869 kubelet.go:1218] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data in memory cache
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.375864   14869 fs_resource_analyzer.go:64] Starting FS ResourceAnalyzer
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.375900   14869 server.go:152] Starting to listen on 0.0.0.0:10250
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.377260   14869 server.go:424] Adding debug handlers to kubelet server.
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.378911   14869 volume_manager.go:265] Starting Kubelet Volume Manager
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.380434   14869 desired_state_of_world_populator.go:139] Desired state populator starts to run
Nov 19 18:34:02 k3s01 k3s[14869]: E1119 18:34:02.396666   14869 controller.go:228] failed to get node "k3s01" when trying to set owner ref to the node lease: nodes "k3s01" not found
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.407459   14869 status_manager.go:158] Starting to sync pod status with apiserver
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.407537   14869 kubelet.go:1741] Starting kubelet main sync loop.
Nov 19 18:34:02 k3s01 k3s[14869]: E1119 18:34:02.407610   14869 kubelet.go:1765] skipping pod synchronization - [container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]
Nov 19 18:34:02 k3s01 k3s[14869]: E1119 18:34:02.479567   14869 kubelet.go:2183] node "k3s01" not found
Nov 19 18:34:02 k3s01 k3s[14869]: E1119 18:34:02.507742   14869 kubelet.go:1765] skipping pod synchronization - container runtime status check may not have completed yet
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.527222   14869 kubelet_node_status.go:70] Attempting to register node k3s01
Nov 19 18:34:02 k3s01 k3s[14869]: E1119 18:34:02.580229   14869 kubelet.go:2183] node "k3s01" not found
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.593737   14869 cpu_manager.go:184] [cpumanager] starting with none policy
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.593774   14869 cpu_manager.go:185] [cpumanager] reconciling every 10s
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.593808   14869 state_mem.go:36] [cpumanager] initializing new in-memory state store
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.594085   14869 state_mem.go:88] [cpumanager] updated default cpuset: ""
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.594100   14869 state_mem.go:96] [cpumanager] updated cpuset assignments: "map[]"
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.594126   14869 policy_none.go:43] [cpumanager] none policy: Start
Nov 19 18:34:02 k3s01 k3s[14869]: W1119 18:34:02.595611   14869 manager.go:596] Failed to retrieve checkpoint for "kubelet_internal_checkpoint": checkpoint is not found
Nov 19 18:34:02 k3s01 k3s[14869]: I1119 18:34:02.596017   14869 plugin_manager.go:114] Starting Kubelet Plugin Manager
Nov 19 18:34:02 k3s01 k3s[14869]: E1119 18:34:02.596072   14869 eviction_manager.go:260] eviction manager: failed to get summary stats: failed to get node info: node "k3s01" not found
Nov 19 18:34:02 k3s01 k3s[14869]: W1119 18:34:02.601243   14869 reflector.go:424] storage/cacher.go:/csidrivers: watch of *storage.CSIDriver ended with: very short watch: storage/cacher.go:/csidrivers: Unexpected watch close - watch lasted l
Nov 19 18:34:02 k3s01 k3s[14869]: W1119 18:34:02.601525   14869 reflector.go:424] storage/cacher.go:/validatingwebhookconfigurations: watch of *admissionregistration.ValidatingWebhookConfiguration ended with: very short watch: storage/cacher
Nov 19 18:34:02 k3s01 k3s[14869]: W1119 18:34:02.601930   14869 reflector.go:424] storage/cacher.go:/apiregistration.k8s.io/apiservices: watch of *apiregistration.APIService ended with: very short watch: storage/cacher.go:/apiregistration.k8
Nov 19 18:34:02 k3s01 k3s[14869]: W1119 18:34:02.602182   14869 reflector.go:424] storage/cacher.go:/mutatingwebhookconfigurations: watch of *admissionregistration.MutatingWebhookConfiguration ended with: very short watch: storage/cacher.go:
Nov 19 18:34:02 k3s01 k3s[14869]: time="2020-11-19T18:34:02.602419468+08:00" level=info msg="Proxy done" err="context canceled" url="wss://127.0.0.1:6443/v1-k3s/connect"
Nov 19 18:34:02 k3s01 k3s[14869]: W1119 18:34:02.602702   14869 reflector.go:424] storage/cacher.go:/validatingwebhookconfigurations: watch of *admissionregistration.ValidatingWebhookConfiguration ended with: very short watch: storage/cacher
Nov 19 18:34:02 k3s01 k3s[14869]: W1119 18:34:02.603153   14869 reflector.go:424] storage/cacher.go:/mutatingwebhookconfigurations: watch of *admissionregistration.MutatingWebhookConfiguration ended with: very short watch: storage/cacher.go:
Nov 19 18:34:02 k3s01 k3s[14869]: W1119 18:34:02.603572   14869 reflector.go:424] storage/cacher.go:/apiregistration.k8s.io/apiservices: watch of *apiregistration.APIService ended with: very short watch: storage/cacher.go:/apiregistration.k8
Nov 19 18:34:02 k3s01 k3s[14869]: W1119 18:34:02.603628   14869 reflector.go:424] storage/cacher.go:/events: watch of *core.Event ended with: very short watch: storage/cacher.go:/events: Unexpected watch close - watch lasted less than a seco
Nov 19 18:34:02 k3s01 k3s[14869]: time="2020-11-19T18:34:02.602751712+08:00" level=fatal msg="server stopped: http: Server closed"
Nov 19 18:34:02 k3s01 systemd[1]: k3s.service: main process exited, code=exited, status=1/FAILURE
Nov 19 18:34:02 k3s01 systemd[1]: Failed to start Lightweight Kubernetes.
Nov 19 18:34:02 k3s01 systemd[1]: Unit k3s.service entered failed state.
Nov 19 18:34:02 k3s01 systemd[1]: k3s.service failed.
wangxi83 commented 3 years ago

I hit the same error; it was mitigated as described in #2548.