oracle / terraform-kubernetes-installer

Terraform Installer for Kubernetes on Oracle Cloud Infrastructure

Kubernetes nodes (master and worker) NotReady #204

Closed charlesx closed 6 years ago

charlesx commented 6 years ago

Terraform Version

# Run this command to get the terraform version:
$ terraform -v
Terraform v0.11.7
+ provider.null v1.0.0
+ provider.oci (unversioned)
+ provider.random v1.2.0
+ provider.template v1.0.0
+ provider.tls v1.1.0

OCI Provider Version

# Execute the plugin directly to get the version:
$ \/terraform-provider-oci
ll ~/.terraform.d/plugins/terraform-provider-oci_v2.1.8
-rwxr-xr-x@ 1 xaviercharles staff 30013580 May 11 01:44 /Users/xaviercharles/.terraform.d/plugins/terraform-provider-oci_v2.1.8

Terraform Installer for Kubernetes Version

# The version/tag/release or commit hash (of this project) the issue occurred on

Input Variables

# Values of non-sensitive input variables

# OCI authentication
tenancy_ocid = "ocid1.tenancy.oc1..aaaaaaaagwnz325at55tjay4s3g2pxzv6bfmfnct4onhg42eb256k4riauxq"
compartment_ocid = "ocid1.tenancy.oc1..aaaaaaaagwnz325at55tjay4s3g2pxzv6bfmfnct4onhg42eb256k4riauxq"
fingerprint = "1a:80:e8:45:06:1c:5f:df:b5:82:c5:ab:b5:c0:65:65"
private_key_path = "/Users/xaviercharles/.ssh/oci_api_key.pem"
user_ocid = "ocid1.user.oc1..aaaaaaaacl4wtfyzelx27kfxu4heh6sawyejoj2zzob5j67yvl2nw5n3s6ya"
region = "us-ashburn-1"

# CCM user
cloud_controller_user_ocid = "ocid1.user.oc1..aaaaaaaacl4wtfyzelx27kfxu4heh6sawyejoj2zzob5j67yvl2nw5n3s6ya"
cloud_controller_user_fingerprint = "1a:80:e8:45:06:1c:5f:df:b5:82:c5:ab:b5:c0:65:65"
cloud_controller_user_private_key_path = "/Users/xaviercharles/.ssh/oci_api_key.pem"

#etcdShape = "VM.Standard1.2"
#k8sMasterShape = "VM.Standard1.8"
#k8sWorkerShape = "VM.Standard1.8"
#etcdShape = "VM.Standard1.1"
k8sMasterShape = "VM.Standard1.1"
k8sWorkerShape = "VM.Standard1.1"

#etcdAd1Count = "1"
#etcdAd2Count = "1"
#etcdAd3Count = "1"
k8sMasterAd1Count = "1"
#k8sMasterAd2Count = "1"
#k8sMasterAd3Count = "1"
k8sWorkerAd1Count = "1"
#k8sWorkerAd2Count = "2"
#k8sWorkerAd3Count = "2"

#etcdLBShape = "100Mbps"
k8sMasterLBShape = "100Mbps"

#etcd_ssh_ingress = "10.0.0.0/16"
etcd_ssh_ingress = "0.0.0.0/0"
#etcd_cluster_ingress = "10.0.0.0/16"
master_ssh_ingress = "0.0.0.0/0"
worker_ssh_ingress = "0.0.0.0/0"
master_https_ingress = "0.0.0.0/0"
worker_nodeport_ingress = "0.0.0.0/0"
#worker_nodeport_ingress = "10.0.0.0/16"

control_plane_subnet_access = "public"
k8s_master_lb_access = "public"
#natInstanceShape = "VM.Standard1.2"
#nat_instance_ad1_enabled = "true"
#nat_instance_ad2_enabled = "false"
#nat_instance_ad3_enabled = "true"
#nat_ssh_ingress = "0.0.0.0/0"
public_subnet_http_ingress = "0.0.0.0/0"
public_subnet_https_ingress = "0.0.0.0/0"

#worker_iscsi_volume_create is a bool not a string
#worker_iscsi_volume_create = true
#worker_iscsi_volume_size = 100

#worker_iscsi_volume_create is a bool not a string
#worker_iscsi_volume_create = true
#worker_iscsi_volume_size = 50
#etcd_iscsi_volume_create = true
#etcd_iscsi_volume_size = 50

Description of issue:

I set the values in terraform.tfvars, then ran:

terraform init
terraform plan
terraform apply

terraform apply ran fine (I can see the resources created in OCI), but when I check the cluster with the provided script, scripts/cluster-check.sh, I get some failures. When I then ran some checks by hand, I got this:

kubectl get nodes
NAME                                                  STATUS     ROLES    AGE   VERSION
k8s-master-ad1-0.k8smasterad1.k8sbmcs.oraclevcn.com   NotReady   master   3m    v1.9.6
k8s-worker-ad1-0.k8sworkerad1.k8sbmcs.oraclevcn.com   NotReady   node     1m    v1.9.6

dhcp-10-175-3-240:terraform-kubernetes-installer xaviercharles$ kubectl describe node k8s-master-ad1-0.k8smasterad1.k8sbmcs.oraclevcn.com
Name:               k8s-master-ad1-0.k8smasterad1.k8sbmcs.oraclevcn.com
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/hostname=k8s-master-ad1-0.k8smasterad1.k8sbmcs.oraclevcn.com
                    node-role.kubernetes.io/master=
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp:  Tue, 05 Jun 2018 16:34:13 +0200
Taints:             node-role.kubernetes.io/master:NoSchedule
                    node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
Unschedulable:      false
Conditions:
  Type  Status  LastHeartbeatTime  LastTransitionTime  Reason  Message
  OutOfDisk  False  Tue, 05 Jun 2018 16:41:54 +0200  Tue, 05 Jun 2018 16:34:13 +0200  KubeletHasSufficientDisk  kubelet has sufficient disk space available
  MemoryPressure  False  Tue, 05 Jun 2018 16:41:54 +0200  Tue, 05 Jun 2018 16:34:13 +0200  KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure  False  Tue, 05 Jun 2018 16:41:54 +0200  Tue, 05 Jun 2018 16:34:13 +0200  KubeletHasNoDiskPressure  kubelet has no disk pressure
  Ready  False  Tue, 05 Jun 2018 16:41:54 +0200  Tue, 05 Jun 2018 16:34:13 +0200  KubeletNotReady  runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses:
Capacity:
  alpha.kubernetes.io/nvidia-gpu:  0
  cpu:                             2
  memory:                          6875520Ki
  pods:                            110
Allocatable:
  alpha.kubernetes.io/nvidia-gpu:  0
  cpu:                             2
  memory:                          6773120Ki
  pods:                            110
System Info:
  Machine ID:                 a05e936d35d64e2d8532ea2d61c665e3
  System UUID:                A05E936D-35D6-4E2D-8532-EA2D61C665E3
  Boot ID:                    13935a4f-87f5-45ce-a312-cb50641113c8
  Kernel Version:             4.1.12-124.14.1.el7uek.x86_64
  OS Image:                   Oracle Linux Server 7.5
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://17.6.2
  Kubelet Version:            v1.9.6
  Kube-Proxy Version:         v1.9.6
PodCIDR:      10.99.0.0/24
ExternalID:   k8s-master-ad1-0.k8smasterad1.k8sbmcs.oraclevcn.com
ProviderID:   ocid1.instance.oc1.iad.abuwcljt2py3s7rxyocu56k5227p26ome6e2kyeo7656fdg2dorwwnyv6w3q
Non-terminated Pods:  (5 in total)
  Namespace  Name  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  kube-system  kube-apiserver-k8s-master-ad1-0.k8smasterad1.k8sbmcs.oraclevcn.com  0 (0%)  0 (0%)  0 (0%)  0 (0%)
  kube-system  kube-controller-manager-k8s-master-ad1-0.k8smasterad1.k8sbmcs.oraclevcn.com  0 (0%)  0 (0%)  0 (0%)  0 (0%)
  kube-system  kube-proxy-k8s-master-ad1-0.k8smasterad1.k8sbmcs.oraclevcn.com  0 (0%)  0 (0%)  0 (0%)  0 (0%)
  kube-system  kube-scheduler-k8s-master-ad1-0.k8smasterad1.k8sbmcs.oraclevcn.com  0 (0%)  0 (0%)  0 (0%)  0 (0%)
  kube-system  oci-cloud-controller-manager-7kwlv  0 (0%)  0 (0%)  0 (0%)  0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  0 (0%)  0 (0%)  0 (0%)  0 (0%)
Events:
  Type  Reason  Age  From  Message
  Normal  Starting  8m  kubelet, k8s-master-ad1-0.k8smasterad1.k8sbmcs.oraclevcn.com  Starting kubelet.
  Normal  NodeAllocatableEnforced  8m  kubelet, k8s-master-ad1-0.k8smasterad1.k8sbmcs.oraclevcn.com  Updated Node Allocatable limit across pods
  Normal  NodeHasSufficientDisk  8m (x8 over 8m)  kubelet, k8s-master-ad1-0.k8smasterad1.k8sbmcs.oraclevcn.com  Node k8s-master-ad1-0.k8smasterad1.k8sbmcs.oraclevcn.com status is now: NodeHasSufficientDisk
  Normal  NodeHasSufficientMemory  8m (x8 over 8m)  kubelet, k8s-master-ad1-0.k8smasterad1.k8sbmcs.oraclevcn.com  Node k8s-master-ad1-0.k8smasterad1.k8sbmcs.oraclevcn.com status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure  8m (x7 over 8m)  kubelet, k8s-master-ad1-0.k8smasterad1.k8sbmcs.oraclevcn.com  Node k8s-master-ad1-0.k8smasterad1.k8sbmcs.oraclevcn.com status is now: NodeHasNoDiskPressure
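The line that matters in the output above is the message on the Ready condition ("cni config uninitialized"). A minimal sketch for pulling out just that message, assuming kubectl's JSONPath output format; the helper name node_ready_message is hypothetical and not part of this project:

```shell
# Hypothetical helper: print only the message of a node's Ready condition,
# instead of reading through the full `kubectl describe node` output.
node_ready_message() {
  kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'
}

# Example call (node name taken from the output above):
# node_ready_message k8s-master-ad1-0.k8smasterad1.k8sbmcs.oraclevcn.com
```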

Thanks for your help

owainlewis commented 6 years ago

Hi @charlesx,

It can take quite a long time for the cluster to become healthy after Terraform has completed: typically about 10-15 minutes before the cluster comes online. Could this be the case?
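One way to tell a slow start from a stuck cluster is to poll the node status for that window rather than checking once. The following is only a sketch: the helper names wait_until and all_nodes_ready are hypothetical, and the 900-second timeout simply mirrors the 15-minute upper bound mentioned above.

```shell
# Hypothetical helper: retry a command until it succeeds or the timeout expires.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command...>
wait_until() {
  timeout_s=$1
  interval_s=$2
  shift 2
  elapsed=0
  while ! "$@"; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
}

# Hypothetical check: succeeds only when at least one node is listed and the
# STATUS column of every node is exactly "Ready" (so NotReady, or
# Ready,SchedulingDisabled, counts as not ready).
all_nodes_ready() {
  out=$(kubectl get nodes --no-headers 2>/dev/null) || return 1
  [ -n "$out" ] || return 1
  ! printf '%s\n' "$out" | awk '$2 != "Ready" { bad = 1 } END { exit !bad }'
}

# Example call: poll every 15 seconds for up to 15 minutes.
# wait_until 900 15 all_nodes_ready || kubectl get nodes
```

If the nodes are still NotReady after the timeout, waiting longer is unlikely to help and the node conditions are worth inspecting.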

jlamillan commented 6 years ago

I am pretty sure this was due to outdated YAML manifests, e.g. https://github.com/oracle/oci-cloud-controller-manager/releases/download/0.4.0/oci-cloud-controller-manager.yaml, referencing the retired wcir.io.

This issue was fixed.

Please reopen if this issue persists.
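To check whether a cluster is still affected, one option is to list the container images used in kube-system and look for the retired host. This is a sketch only: the helper name pods_using_registry is hypothetical, and it assumes that wcir.io shows up in the image reference of the affected pods.

```shell
# Hypothetical helper: print "<pod name><TAB><images>" for every kube-system
# pod whose image list mentions the given registry host.
# Returns non-zero when no pod matches.
pods_using_registry() {
  registry=$1
  kubectl get pods -n kube-system \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .spec.containers[*]}{.image}{" "}{end}{"\n"}{end}' \
    | grep -F "$registry"
}

# Example call:
# pods_using_registry wcir.io && echo "manifests still reference wcir.io"
```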