sassoftware / viya4-iac-k8s

This project contains Terraform scripts to provision cloud infrastructure resources when using vSphere, and Ansible playbooks to apply the elements of a Kubernetes cluster that are required to deploy SAS Viya platform product offerings.
Apache License 2.0

feat: (IAC-1334) "kube-system/kube-vip-cloud-provider" pod is not running, which prevents external IP address allocation. #104

Closed: raphaelpoumarede closed this issue 5 months ago

raphaelpoumarede commented 5 months ago

Hello. When I deploy in bare-metal mode, the playbook execution is successful, but the kube-system/kube-vip-cloud-provider pod is not running (it errors out and then crash-loops). I have the same issue with IaC 3.5.0 / Kubernetes 1.27.6 (kube-vip 0.5.5) and IaC 3.7.0 / Kubernetes 1.27.9 (kube-vip 0.5.5 and 0.5.7 tested).

panic: version string "" doesn't match expected regular expression: "^v(\d+\.\d+\.\d+)"

goroutine 1 [running]:
k8s.io/component-base/metrics.parseVersion({{0x0, 0x0}, {0x0, 0x0}, {0x1f44b17, 0x0}, {0x1c97daf, 0xb}, {0x0, 0x0}, ...})
    /go/pkg/mod/k8s.io/component-base@v0.25.4/metrics/version_parser.go:47 +0x274
k8s.io/component-base/metrics.newKubeRegistry({{0x0, 0x0}, {0x0, 0x0}, {0x1f44b17, 0x0}, {0x1c97daf, 0xb}, {0x0, 0x0}, ...})
    /go/pkg/mod/k8s.io/component-base@v0.25.4/metrics/registry.go:320 +0x119
k8s.io/component-base/metrics.NewKubeRegistry()
    /go/pkg/mod/k8s.io/component-base@v0.25.4/metrics/registry.go:335 +0x78
k8s.io/component-base/metrics/legacyregistry.init()
    /go/pkg/mod/k8s.io/component-base@v0.25.4/metrics/legacyregistry/registry.go:29 +0x1d
Stream closed EOF for kube-system/kube-vip-cloud-provider-578d9b7bf7-z6t4f (kube-vip-cloud-provider)
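
For reference, here is a minimal ad-hoc Ansible sketch to pull those logs yourself (it assumes kubectl is already configured on the host running the tasks and that the Deployment is named kube-vip-cloud-provider; it is not part of the project playbooks):

# Ad-hoc diagnostic tasks (assumption: kubectl is configured on this host).
# They grab the last lines of the cloud-provider logs, which show the panic above.
- name: Collect kube-vip-cloud-provider logs
  ansible.builtin.command: >-
    kubectl -n kube-system logs deployment/kube-vip-cloud-provider --tail=50
  # add --previous to the command if the current container has already restarted
  register: kvcp_logs
  changed_when: false

- name: Show the collected logs
  ansible.builtin.debug:
    var: kvcp_logs.stdout_lines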

My ingress service's external IP allocation then stays <pending>, but I suppose that is a consequence of the kube-vip cloud provider issue. Any help would be much appreciated. Thanks!
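
For context, kube-vip-cloud-provider hands out those external IPs from a ConfigMap in kube-system, so a crashed cloud-provider pod leaves every LoadBalancer service <pending>. Below is a minimal sketch of the ConfigMap that the kubernetes_loadbalancer_addresses entry in the vars file further down is expected to translate into (the ConfigMap name and key follow the kube-vip cloud-provider docs linked in that file; treat the exact values as illustrative):

# Sketch of the address-pool ConfigMap read by kube-vip-cloud-provider
# (name/namespace per the kube-vip docs; range taken from the ansible-vars.yaml below).
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubevip
  namespace: kube-system
data:
  range-global: 10.96.18.2-10.96.18.4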

PS: See my ansible-vars.yaml file below:

# Ansible items
ansible_user     : "cloud-user"
#ansible_password : "lnxsas"

# VM items
vm_os   : "ubuntu" # Choices : [ubuntu|rhel] - Ubuntu 20.04 LTS / RHEL ???
vm_arch : "amd64"  # Choices : [amd64] - 64-bit OS / ???

# System items
enable_cgroup_v2    : true     # TODO - If needed hookup or remove flag
system_ssh_keys_dir : "~/.ssh" # Directory holding public keys to be used on each system

# Generic items
prefix : "GEL-k8s"
deployment_type: "bare_metal" # Values are: [bare_metal|vsphere]

# Kubernetes - Common
#
# TODO: kubernetes_upgrade_allowed needs to be implemented to either
#       add or remove locks on the kubeadm, kubelet, kubectl packages
#
kubernetes_cluster_name    : "{{ prefix }}-oss" # NOTE: only change the prefix value above
#kubernetes_version         : "1.23.8" 
#kubernetes_version         : "1.24.10"
#kubernetes_version          : "1.25.8"
#kubernetes_version          : "1.26.6" https://kubernetes.io/releases/
kubernetes_version          : "1.27.6"

kubernetes_upgrade_allowed : true
kubernetes_arch            : "{{ vm_arch }}"
kubernetes_cni             : "calico"        # Choices : [calico]
kubernetes_cni_version     : "3.24.4"
kubernetes_cri             : "containerd"    # Choices : [containerd|docker|cri-o] NOTE: cri-o is not currently functional
kubernetes_service_subnet  : "10.42.0.0/16" # default values 
kubernetes_pod_subnet      : "10.43.0.0/16" # default values

# Kubernetes - VIP : https://kube-vip.io
# 
# Useful links:
#
#   VIP IP : https://kube-vip.chipzoller.dev/docs/installation/static/
#   VIP Cloud Provider IP Range : https://kube-vip.chipzoller.dev/docs/usage/cloud-provider/#the-kube-vip-cloud-provider-configmap
#
kubernetes_loadbalancer             : "kube_vip"
kubernetes_vip_version              : "0.5.5"
# We need a static VIP (on eth0): find/reserve an unused VIP IP on the network and register it in DNS.
# Mandatory even for a single control plane node.
kubernetes_vip_interface            : "eth0"
kubernetes_vip_ip                   : "10.96.18.1" # for RACE EXNET pick a value in the "10.96.18.0+" unused range 
kubernetes_vip_fqdn                 : "osk-api-stud0.gelenable.sas.com" # DNS alias associated to the K8s CP VIP (names)
kubernetes_loadbalancer_addresses :
  - "range-global: 10.96.18.2-10.96.18.4" # IP range  for services type that require the LB IP access, range-<namespace>

# Kubernetes - Control Plane
control_plane_ssh_key_name : "cp_ssh"

# Labels/Taints: we associate labels and taints with the K8s nodes
# Note: the "hostname" command is used behind the scenes, so the names do not necessarily correspond to those used in the inventory

## Labels
node_labels:
  sasnode02:
    - kubernetes.azure.com/mode=system
  sasnode03:
    - kubernetes.azure.com/mode=system
  sasnode04:
    - kubernetes.azure.com/mode=system
  sasnode05:
    - workload.sas.com/class=cas
  sasnode06:
    - workload.sas.com/class=stateful
  sasnode07:
    - workload.sas.com/class=stateless
  sasnode08:
    - launcher.sas.com/prepullImage=sas-programming-environment
    - workload.sas.com/class=compute

## Taints
node_taints:
  sasnode05:
    - workload.sas.com/class=cas:NoSchedule

# Jump Server
jump_ip : rext03-0200.race.sas.com

# NFS Server
nfs_ip  : rext03-0175.race.sas.com

jarpat commented 5 months ago

A follow-up on this issue for other users.

@raphaelpoumarede reported this issue in the kube-vip-cloud-provider GitHub project (https://github.com/kube-vip//issues/95), and it turned out that there was a problem with v0.0.9 of the kube-vip-cloud-provider binary. The kube-vip-cloud-provider team has since reverted that change: https://github.com/kube-vip/kube-vip-cloud-provider/commit/3b3a4a4afbd92b18bfd5d468fa300d44ec01b450

On our end the Kubernetes install should work again. We are going to update the task that applies the kube-vip-cloud-controller.yaml manifest so that it is no longer sourced directly from the main branch of the kube-vip-cloud-provider repo, but from a specific tag instead. An internal Jira ticket has been created to track this work.
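
A rough sketch of what that pinned-manifest task could look like is below; the tag value (v0.0.8), the manifest path, and the KUBECONFIG location are illustrative assumptions, not the final implementation:

# Hypothetical task: apply the cloud-provider manifest from a pinned release tag
# instead of the main branch. Tag, path, and kubeconfig location are assumptions.
- name: Deploy kube-vip cloud provider from a pinned release tag
  vars:
    kube_vip_cloud_provider_tag: "v0.0.8"  # assumed known-good release
  ansible.builtin.command: >-
    kubectl apply -f
    https://raw.githubusercontent.com/kube-vip/kube-vip-cloud-provider/{{ kube_vip_cloud_provider_tag }}/manifest/kube-vip-cloud-controller.yaml
  environment:
    KUBECONFIG: /etc/kubernetes/admin.conf
  changed_when: true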