smart-edge-open / converged-edge-experience-kits

Source code for experience kits with Ansible-based deployment.
Apache License 2.0

Edge-node deployment: failure while running "join the cluster" in OpenNESS 20.06.01 #64

Closed Jaladi-Devika closed 3 years ago

Jaladi-Devika commented 4 years ago

Hi,

While deploying the edge node (sh deploy_ne.sh nodes), the deployment fails at the step below. Can you please help me resolve this?

TASK [kubernetes/worker : join the cluster] **
task path: /home/sysadmin/Devika/openness/openness_20_06_01/openness-experience-kits/roles/kubernetes/worker/tasks/main.yml:39
fatal: [node01]: FAILED! => {
    "changed": true,
    "cmd": [
        "kubeadm", "join", "192.168.10.91:6443",
        "--token", "hbkw5a.3sh15d4v11u93n2x",
        "--discovery-token-ca-cert-hash", "sha256:b722978135b66f3e0bd92f1a17044116c030f5ecf5a40fdd5c1143fcb8b35abe",
        "--v=2"
    ],
    "delta": "0:00:00.446995",
    "end": "2020-09-17 18:15:54.716672",
    "rc": 1,
    "start": "2020-09-17 18:15:54.269677"
}

STDOUT:

[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'

STDERR:

W0917 18:15:54.338674 27359 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
I0917 18:15:54.338790 27359 join.go:371] [preflight] found NodeName empty; using OS hostname as NodeName
I0917 18:15:54.338827 27359 initconfiguration.go:103] detected and using CRI socket: /var/run/dockershim.sock
I0917 18:15:54.338885 27359 preflight.go:90] [preflight] Running general checks
I0917 18:15:54.339121 27359 checks.go:249] validating the existence and emptiness of directory /etc/kubernetes/manifests
I0917 18:15:54.339133 27359 checks.go:286] validating the existence of file /etc/kubernetes/kubelet.conf
I0917 18:15:54.339142 27359 checks.go:286] validating the existence of file /etc/kubernetes/bootstrap-kubelet.conf
I0917 18:15:54.339151 27359 checks.go:102] validating the container runtime
I0917 18:15:54.421970 27359 checks.go:128] validating if the service is enabled and active
I0917 18:15:54.535699 27359 checks.go:335] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I0917 18:15:54.535740 27359 checks.go:335] validating the contents of file /proc/sys/net/ipv4/ip_forward
I0917 18:15:54.535761 27359 checks.go:649] validating whether swap is enabled or not
I0917 18:15:54.535783 27359 checks.go:376] validating the presence of executable conntrack
I0917 18:15:54.536018 27359 checks.go:376] validating the presence of executable ip
I0917 18:15:54.536137 27359 checks.go:376] validating the presence of executable iptables
I0917 18:15:54.536151 27359 checks.go:376] validating the presence of executable mount
I0917 18:15:54.536246 27359 checks.go:376] validating the presence of executable nsenter
I0917 18:15:54.536264 27359 checks.go:376] validating the presence of executable ebtables
I0917 18:15:54.536274 27359 checks.go:376] validating the presence of executable ethtool
I0917 18:15:54.536284 27359 checks.go:376] validating the presence of executable socat
I0917 18:15:54.536295 27359 checks.go:376] validating the presence of executable tc
I0917 18:15:54.536304 27359 checks.go:376] validating the presence of executable touch
I0917 18:15:54.536322 27359 checks.go:520] running all checks
I0917 18:15:54.626142 27359 checks.go:406] checking whether the given node name is reachable using net.LookupHost
I0917 18:15:54.626288 27359 checks.go:618] validating kubelet version
I0917 18:15:54.671267 27359 checks.go:128] validating if the service is enabled and active
I0917 18:15:54.678175 27359 checks.go:201] validating availability of port 10250
I0917 18:15:54.679493 27359 checks.go:286] validating the existence of file /etc/kubernetes/pki/ca.crt
I0917 18:15:54.679505 27359 checks.go:432] validating if the connectivity type is via proxy or direct
I0917 18:15:54.679535 27359 join.go:441] [preflight] Discovering cluster-info
I0917 18:15:54.679562 27359 token.go:78] [discovery] Created cluster-info discovery client, requesting info from "192.168.10.91:6443"
I0917 18:15:54.687385 27359 token.go:116] [discovery] Requesting info from "192.168.10.91:6443" again to validate TLS against the pinned public key
I0917 18:15:54.693495 27359 token.go:133] [discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.10.91:6443"
I0917 18:15:54.693508 27359 discovery.go:51] [discovery] Using provided TLSBootstrapToken as authentication credentials for the join process
I0917 18:15:54.693514 27359 join.go:455] [preflight] Fetching init configuration
I0917 18:15:54.693518 27359 join.go:493] [preflight] Retrieving KubeConfig objects
I0917 18:15:54.711364 27359 preflight.go:101] [preflight] Running configuration dependant checks
I0917 18:15:54.711380 27359 controlplaneprepare.go:211] [download-certs] Skipping certs download
I0917 18:15:54.711389 27359 kubelet.go:111] [kubelet-start] writing bootstrap kubelet config file at /etc/kubernetes/bootstrap-kubelet.conf
I0917 18:15:54.712842 27359 kubelet.go:119] [kubelet-start] writing CA certificate at /etc/kubernetes/pki/ca.crt
I0917 18:15:54.713948 27359 kubelet.go:145] [kubelet-start] Checking for an existing Node in the cluster with name "node01" and status "Ready"
error execution phase kubelet-start: cannot get Node "node01": nodes "node01" is forbidden: User "system:bootstrap:hbkw5a" cannot get resource "nodes" in API group "" at the cluster scope
To see the stack trace of this error execute with --v=5 or higher

MSG:

non-zero return code

PLAY RECAP ***
node01 : ok=173  changed=42  unreachable=0  failed=1  skipped=72  rescued=0  ignored=5

[root@controller openness-experience-kits]#

Thanks, Devika
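[Editor's note] The failing command passes a --discovery-token-ca-cert-hash; for reference, that value is the SHA-256 of the cluster CA's DER-encoded public key and can be recomputed with openssl. On a real node the certificate is /etc/kubernetes/pki/ca.crt; the sketch below generates a throwaway self-signed certificate instead so it can be run anywhere (the cert name and paths are illustrative, not from this deployment):

```shell
# Generate a throwaway CA cert as a stand-in for /etc/kubernetes/pki/ca.crt.
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo-ca.key \
  -out /tmp/demo-ca.crt -days 1 -subj "/CN=kubernetes-demo-ca" 2>/dev/null

# Hash the DER-encoded public key, the form kubeadm compares against.
hash=$(openssl x509 -pubkey -in /tmp/demo-ca.crt -noout \
  | openssl pkey -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 \
  | awk '{print $NF}')
echo "sha256:${hash}"
```

If the hash printed on the controller does not match the one Ansible passes to kubeadm join, the kit is joining against a stale CA.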

Jaladi-Devika commented 4 years ago

Can anyone please help me on this?

Thanks & Regards, Devika

tomaszwesolowski commented 4 years ago

Hi @Jaladi-Devika Our team is looking at this error, we'll get back to you soon

tomaszwesolowski commented 4 years ago

Hi @Jaladi-Devika, were the machines you used for deployment clean, fresh installs of CentOS 7? We noticed that this error can occur if a Kubernetes cluster was already deployed on them.

Jaladi-Devika commented 4 years ago

Hi @tomaszwesolowski ,

The worker node is clean and fresh; we had not run deploy_ne.sh nodes before this attempt after installing CentOS 7.6. I also tried running the clean_ne.sh script multiple times. Can you please suggest how to clear this error?
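[Editor's note] When clean_ne.sh does not fully undo a partial join, the kubeadm-level state on the worker can be wiped manually. A minimal cleanup sketch, assuming a default kubeadm layout (destructive on a real node; guarded so it is a no-op on machines without kubeadm):

```shell
# Run on the worker node only.
if command -v kubeadm >/dev/null 2>&1; then
  # Wipe kubelet/PKI state left by a prior "kubeadm join".
  kubeadm reset -f || true
  # Remove the exact files the join preflight checks validate.
  rm -f /etc/kubernetes/kubelet.conf \
        /etc/kubernetes/bootstrap-kubelet.conf \
        /etc/kubernetes/pki/ca.crt
  cleaned=yes
else
  echo "kubeadm not found; nothing to clean on this machine"
  cleaned=no
fi
```

After cleanup, rerun deploy_ne.sh nodes from the controller as before.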

Thanks & Regards,

tomaszwesolowski commented 4 years ago

Are you able to connect from one machine to another (node to controller)?

Jaladi-Devika commented 4 years ago

@tomaszwesolowski ,

Yes, I am able to SSH both from the node to the controller and from the controller to the node.
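[Editor's note] SSH working does not by itself prove that the API server port is reachable, since firewall rules are applied per port. A quick probe of the controller address seen in the logs (hypothetical session; run from the worker node):

```shell
apiserver="192.168.10.91:6443"
if command -v curl >/dev/null 2>&1; then
  # -k: the API server cert is not trusted by the local CA store.
  curl -ks --connect-timeout 3 "https://${apiserver}/healthz" \
    || status=unreachable
else
  status=no-curl
fi
echo "probe result: ${status:-ok}"
```

A healthy control plane answers the /healthz probe with "ok"; "unreachable" points at a firewall or routing problem rather than RBAC.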

Jaladi-Devika commented 4 years ago

@tomaszwesolowski ,

Is there any update on this issue? It is blocking our activity.

Thanks & Regards, Devika

kamilpoleszczuk commented 4 years ago

Hi @Jaladi-Devika,

Jaladi-Devika commented 4 years ago

Hi @kamilpoleszczuk ,

I have deployed the cluster from OpenNESS 20.06.01 and did not install Kubernetes manually. I will try deploying the nodes with --v=5.

Jaladi-Devika commented 4 years ago

Hi @kamilpoleszczuk ,

I tried deploying the node with --v=5, but it still failed.

TASK [kubernetes/worker : join the cluster] **
task path: /home/sysadmin/Devika/openness/openness_20_06_01/openness-experience-kits/roles/kubernetes/worker/tasks/main.yml:39
fatal: [node01]: FAILED! => {
    "changed": true,
    "cmd": [
        "kubeadm", "join", "192.168.10.91:6443",
        "--token", "w617sc.43ay3zl2m4nninao",
        "--discovery-token-ca-cert-hash", "sha256:b722978135b66f3e0bd92f1a17044116c030f5ecf5a40fdd5c1143fcb8b35abe",
        "--v=2"
    ],
    "delta": "0:00:00.464564",
    "end": "2020-09-29 18:22:17.655099",
    "rc": 1,
    "start": "2020-09-29 18:22:17.190535"
}

STDOUT:

[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'

STDERR:

W0929 18:22:17.221318 14827 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
I0929 18:22:17.221375 14827 join.go:371] [preflight] found NodeName empty; using OS hostname as NodeName
I0929 18:22:17.221493 14827 initconfiguration.go:103] detected and using CRI socket: /var/run/dockershim.sock
I0929 18:22:17.221548 14827 preflight.go:90] [preflight] Running general checks
I0929 18:22:17.221595 14827 checks.go:249] validating the existence and emptiness of directory /etc/kubernetes/manifests
I0929 18:22:17.221606 14827 checks.go:286] validating the existence of file /etc/kubernetes/kubelet.conf
I0929 18:22:17.221611 14827 checks.go:286] validating the existence of file /etc/kubernetes/bootstrap-kubelet.conf
I0929 18:22:17.221618 14827 checks.go:102] validating the container runtime
I0929 18:22:17.307511 14827 checks.go:128] validating if the service is enabled and active
I0929 18:22:17.393573 14827 checks.go:335] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I0929 18:22:17.393644 14827 checks.go:335] validating the contents of file /proc/sys/net/ipv4/ip_forward
I0929 18:22:17.393666 14827 checks.go:649] validating whether swap is enabled or not
I0929 18:22:17.393691 14827 checks.go:376] validating the presence of executable conntrack
I0929 18:22:17.393964 14827 checks.go:376] validating the presence of executable ip
I0929 18:22:17.394115 14827 checks.go:376] validating the presence of executable iptables
I0929 18:22:17.394306 14827 checks.go:376] validating the presence of executable mount
I0929 18:22:17.394409 14827 checks.go:376] validating the presence of executable nsenter
I0929 18:22:17.394424 14827 checks.go:376] validating the presence of executable ebtables
I0929 18:22:17.394434 14827 checks.go:376] validating the presence of executable ethtool
I0929 18:22:17.394444 14827 checks.go:376] validating the presence of executable socat
I0929 18:22:17.394456 14827 checks.go:376] validating the presence of executable tc
I0929 18:22:17.394465 14827 checks.go:376] validating the presence of executable touch
I0929 18:22:17.394483 14827 checks.go:520] running all checks
I0929 18:22:17.493330 14827 checks.go:406] checking whether the given node name is reachable using net.LookupHost
I0929 18:22:17.493469 14827 checks.go:618] validating kubelet version
I0929 18:22:17.565367 14827 checks.go:128] validating if the service is enabled and active
I0929 18:22:17.575277 14827 checks.go:201] validating availability of port 10250
I0929 18:22:17.575922 14827 checks.go:286] validating the existence of file /etc/kubernetes/pki/ca.crt
I0929 18:22:17.575934 14827 checks.go:432] validating if the connectivity type is via proxy or direct
I0929 18:22:17.575966 14827 join.go:441] [preflight] Discovering cluster-info
I0929 18:22:17.575990 14827 token.go:78] [discovery] Created cluster-info discovery client, requesting info from "192.168.10.91:6443"
I0929 18:22:17.601475 14827 token.go:116] [discovery] Requesting info from "192.168.10.91:6443" again to validate TLS against the pinned public key
I0929 18:22:17.636075 14827 token.go:133] [discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.10.91:6443"
I0929 18:22:17.636098 14827 discovery.go:51] [discovery] Using provided TLSBootstrapToken as authentication credentials for the join process
I0929 18:22:17.636106 14827 join.go:455] [preflight] Fetching init configuration
I0929 18:22:17.636110 14827 join.go:493] [preflight] Retrieving KubeConfig objects
I0929 18:22:17.646727 14827 preflight.go:101] [preflight] Running configuration dependant checks
I0929 18:22:17.646749 14827 controlplaneprepare.go:211] [download-certs] Skipping certs download
I0929 18:22:17.646764 14827 kubelet.go:111] [kubelet-start] writing bootstrap kubelet config file at /etc/kubernetes/bootstrap-kubelet.conf
I0929 18:22:17.649858 14827 kubelet.go:119] [kubelet-start] writing CA certificate at /etc/kubernetes/pki/ca.crt
I0929 18:22:17.652126 14827 kubelet.go:145] [kubelet-start] Checking for an existing Node in the cluster with name "node01" and status "Ready"
error execution phase kubelet-start: cannot get Node "node01": nodes "node01" is forbidden: User "system:bootstrap:w617sc" cannot get resource "nodes" in API group "" at the cluster scope
To see the stack trace of this error execute with --v=5 or higher

MSG:

non-zero return code

PLAY RECAP ***
node01 : ok=173  changed=42  unreachable=0  failed=1  skipped=72  rescued=0  ignored=4

[root@controller openness-experience-kits]#

kamilpoleszczuk commented 4 years ago

Hi @Jaladi-Devika, from the logs attached: "kubeadm", "join", "192.168.10.91:6443", "--token", "w617sc.43ay3zl2m4nninao", "--discovery-token-ca-cert-hash", "sha256:b722978135b66f3e0bd92f1a17044116c030f5ecf5a40fdd5c1143fcb8b35abe", "--v=2". The command still contains --v=2, so --v=5 was not applied.
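[Editor's note] Since the playbook hard-codes --v=2, the join can be re-run by hand on the worker with full verbosity. A sketch using the token and hash from the failing task (these specific values will have expired by the time anyone copies this; mint a fresh token on the controller with kubeadm token create first):

```shell
APISERVER="192.168.10.91:6443"
TOKEN="w617sc.43ay3zl2m4nninao"                # expired example token
CA_HASH="sha256:b722978135b66f3e0bd92f1a17044116c030f5ecf5a40fdd5c1143fcb8b35abe"

# Guarded so the snippet is inert on a machine without kubeadm.
if command -v kubeadm >/dev/null 2>&1; then
  kubeadm join "$APISERVER" --token "$TOKEN" \
    --discovery-token-ca-cert-hash "$CA_HASH" --v=5 || true
else
  echo "kubeadm not found; run this on the worker node"
fi
```

At --v=5 kubeadm prints the stack trace the error message refers to, which pinpoints the exact API call that is being denied.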

Could you please provide output from kubectl describe node of your controller node?

Jaladi-Devika commented 4 years ago

hi @kamilpoleszczuk ,

Here is the output of "kubectl describe node controller":

[root@controller openness-experience-kits]# kubectl describe node controller
Name:               controller
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kube-ovn/role=master
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=controller
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/master=
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    ovn.kubernetes.io/cidr: 100.64.0.0/16
                    ovn.kubernetes.io/gateway: 100.64.0.1
                    ovn.kubernetes.io/ip_address: 100.64.0.2
                    ovn.kubernetes.io/logical_switch: join
                    ovn.kubernetes.io/mac_address: a6:b9:40:40:00:03
                    ovn.kubernetes.io/port_name: node-controller
CreationTimestamp:  Tue, 08 Sep 2020 17:03:00 +0530
Taints:             node-role.kubernetes.io/master:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:   controller
  AcquireTime:
  RenewTime:        Tue, 29 Sep 2020 19:59:24 +0530
Conditions:
  Type                Status  LastHeartbeatTime                LastTransitionTime               Reason                      Message
  NetworkUnavailable  False   Thu, 17 Sep 2020 13:28:50 +0530  Thu, 17 Sep 2020 13:28:50 +0530  FlannelIsUp                 Flannel is running on this node
  MemoryPressure      False   Tue, 29 Sep 2020 19:56:16 +0530  Tue, 08 Sep 2020 17:02:55 +0530  KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure        False   Tue, 29 Sep 2020 19:56:16 +0530  Tue, 08 Sep 2020 17:02:55 +0530  KubeletHasNoDiskPressure    kubelet has no disk pressure
  PIDPressure         False   Tue, 29 Sep 2020 19:56:16 +0530  Tue, 08 Sep 2020 17:02:55 +0530  KubeletHasSufficientPID     kubelet has sufficient PID available
  Ready               True    Tue, 29 Sep 2020 19:56:16 +0530  Thu, 17 Sep 2020 13:25:12 +0530  KubeletReady                kubelet is posting ready status
Addresses:
  InternalIP:  192.168.10.91
  Hostname:    controller
Capacity:
  cpu:                4
  ephemeral-storage:  256710176Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      2Gi
  memory:             16138772Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  236584097810
  hugepages-1Gi:      0
  hugepages-2Mi:      2Gi
  memory:             13939220Ki
  pods:               110
System Info:
  Machine ID:                 e7e0d38b4e154d76aef4cebf17de6dc4
  System UUID:                326605B6-8F27-11E4-A70B-CC493E523800
  Boot ID:                    fb9cce7d-911d-49ab-9889-8830632ac0e7
  Kernel Version:             3.10.0-957.el7.x86_64
  OS Image:                   CentOS Linux 7 (Core)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.9
  Kubelet Version:            v1.18.4
  Kube-Proxy Version:         v1.18.4
PodCIDR:                      10.244.0.0/24
PodCIDRs:                     10.244.0.0/24
Non-terminated Pods:          (20 in total)
  Namespace    Name                                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  kube-system  coredns-6955765f44-b9zfl                                    100m (2%)     0 (0%)      70Mi (0%)        170Mi (1%)     21d
  kube-system  coredns-6955765f44-n9nrv                                    100m (2%)     0 (0%)      70Mi (0%)        170Mi (1%)     21d
  kube-system  etcd-controller                                             0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
  kube-system  kube-apiserver-controller                                   250m (6%)     0 (0%)      0 (0%)           0 (0%)         12d
  kube-system  kube-controller-manager-controller                          200m (5%)     0 (0%)      0 (0%)           0 (0%)         12d
  kube-system  kube-flannel-ds-amd64-psstr                                 100m (2%)     100m (2%)   50Mi (0%)        50Mi (0%)      12d
  kube-system  kube-multus-ds-amd64-h4t4b                                  100m (2%)     100m (2%)   50Mi (0%)        50Mi (0%)      12d
  kube-system  kube-ovn-cni-tgxtm                                          0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
  kube-system  kube-ovn-controller-64b46fd7d4-dj2ld                        0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
  kube-system  kube-proxy-qvtdx                                            0 (0%)        0 (0%)      0 (0%)           0 (0%)         21d
  kube-system  kube-scheduler-controller                                   100m (2%)     0 (0%)      0 (0%)           0 (0%)         12d
  kube-system  ovn-central-7dd57dff9-tdkdm                                 500m (12%)    0 (0%)      300Mi (2%)       0 (0%)         12d
  kube-system  ovs-ovn-2v4wd                                               200m (5%)     1 (25%)     1Gi (7%)         1Gi (7%)       12d
  openness     docker-registry-deployment-5776d7c7c5-ttblr                 0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
  openness     nfd-release-node-feature-discovery-master-58d4b46578-9q94p  0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
  openness     syslog-master-254z8                                         100m (2%)     500m (12%)  128Mi (0%)       128Mi (0%)     12d
  telemetry    custom-metrics-apiserver-fb988c8c7-tr4zm                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
  telemetry    grafana-6b4f99684-b9bss                                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
  telemetry    prometheus-server-6f6f89c9ff-zmh7k                          0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
  telemetry    telemetry-aware-scheduling-75596fd6b4-tfg58                 0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  cpu                1750m (43%)   1700m (42%)
  memory             1692Mi (12%)  1592Mi (11%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      1Gi (50%)     1Gi (50%)
Events:
[root@controller openness-experience-kits]#

Jaladi-Devika commented 3 years ago

Hi @kamilpoleszczuk, any update on this? Any clue what the issue is?

Thanks & Regards, Devika

kamilpoleszczuk commented 3 years ago

Hi @Jaladi-Devika, since error execution phase kubelet-start: cannot get Node "node01": nodes "node01" is forbidden: User "system:bootstrap:w617sc" cannot get resource "nodes" in API group "" at the cluster scope (with the suggestion to execute with --v=5 or higher for a stack trace) is a Kubernetes error, --v=5 could give some useful information. Is there any specific reason why the controller was set up separately and then ./deploy_ne.sh nodes was executed?
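[Editor's note] The "forbidden" message means the bootstrap-token user lacks RBAC permission to get Node objects during join; in kubeadm clusters of this era that permission is expected to come from a kubeadm-managed ClusterRoleBinding for bootstrap tokens (this diagnosis is an assumption, not confirmed in the thread). A diagnostic sketch for the controller, guarded so it degrades gracefully where kubectl is unavailable:

```shell
# Run on the controller node.
if command -v kubectl >/dev/null 2>&1; then
  # Bindings that grant permissions to the system:bootstrappers group.
  kubectl get clusterrolebindings -o wide | grep -i bootstrap || true
  # Current bootstrap tokens and their expiry (stale tokens also fail joins).
  kubeadm token list || true
  rbac_checked=yes
else
  echo "kubectl not found; run this on the controller node"
  rbac_checked=no
fi
```

If no binding covering system:bootstrappers shows up, recreating the join credentials with kubeadm token create --print-join-command on the controller is a reasonable next step, since it re-emits a token tied to the expected group.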

Jaladi-Devika commented 3 years ago

HI @kamilpoleszczuk ,

No specific reason; we have always deployed this way: first the controller, then the edge node.

jakubrym commented 3 years ago

Hi @Jaladi-Devika, does the problem still appear? If not, do you agree to close this ticket and reopen it if need be?

jakubrym commented 3 years ago

@Jaladi-Devika Closing the ticket. Please let me know if the problem appears again.