rancher / rke

Rancher Kubernetes Engine (RKE), an extremely simple, lightning fast Kubernetes distribution that runs entirely within containers.
Apache License 2.0

incorrect version of rke-tools for etcd-rolling-snapshots #3029

Closed samene closed 2 years ago

samene commented 2 years ago

RKE version: 1.3.0

Docker version: (docker version,docker info preferred): 1.20

Operating system and kernel: (cat /etc/os-release, uname -r preferred): CentOS 7

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO) Bare metal

cluster.yml file: We are using the RKE Terraform provider. This is the plan:

  # module.rancher.module.rke.rke_cluster.rancher_cluster will be created
  + resource "rke_cluster" "rancher_cluster" {
      + addon_job_timeout         = 300
      + api_server_url            = (known after apply)
      + ca_crt                    = (sensitive value)
      + certificates              = (sensitive value)
      + client_cert               = (sensitive value)
      + client_key                = (sensitive value)
      + cluster_cidr              = (known after apply)
      + cluster_dns_server        = (known after apply)
      + cluster_domain            = (known after apply)
      + cluster_name              = "mycluster"
      + cluster_yaml              = (sensitive value)
      + control_plane_hosts       = (known after apply)
      + custom_certs              = false
      + dind                      = false
      + disable_port_check        = false
      + etcd_hosts                = (known after apply)
      + id                        = (known after apply)
      + inactive_hosts            = (known after apply)
      + internal_kube_config_yaml = (sensitive value)
      + kube_admin_user           = (known after apply)
      + kube_config_yaml          = (sensitive value)
      + kubernetes_version        = "v1.20.14-rancher2-1"
      + rke_cluster_yaml          = (sensitive value)
      + rke_state                 = (sensitive value)
      + running_system_images     = (known after apply)
      + ssh_agent_auth            = (known after apply)
      + update_only               = false
      + worker_hosts              = (known after apply)

      + nodes {
          + address          = "10.94.1.112"
          + internal_address = "172.16.11.67"
          + port             = "22"
          + role             = [
              + "etcd",
              + "controlplane",
              + "worker",
            ]
          + ssh_agent_auth   = (known after apply)
          + ssh_key          = (sensitive value)
          + user             = (sensitive value)
        }

      + timeouts {
          + create = "60m"
          + delete = "60m"
          + update = "60m"
        }
    }

Steps to Reproduce:

Note the version - kubernetes_version = "v1.20.14-rancher2-1" in the plan.

We are using the latest data.json from https://releases.rancher.com/kontainer-driver-metadata/dev-v2.6/data.json

According to this file, the system images for v1.20.14-rancher2-1 are:

  "v1.20.14-rancher2-1": {
   "etcd": "rancher/mirrored-coreos-etcd:v3.4.15-rancher1",
   "alpine": "rancher/rke-tools:v0.1.78",
   "nginxProxy": "rancher/rke-tools:v0.1.78",
   "certDownloader": "rancher/rke-tools:v0.1.78",
   "kubernetesServicesSidecar": "rancher/rke-tools:v0.1.78",
   "kubedns": "rancher/mirrored-k8s-dns-kube-dns:1.15.10",
   "dnsmasq": "rancher/mirrored-k8s-dns-dnsmasq-nanny:1.15.10",
   "kubednsSidecar": "rancher/mirrored-k8s-dns-sidecar:1.15.10",
   "kubednsAutoscaler": "rancher/mirrored-cluster-proportional-autoscaler:1.8.1",
   "coredns": "rancher/mirrored-coredns-coredns:1.8.0",
   "corednsAutoscaler": "rancher/mirrored-cluster-proportional-autoscaler:1.8.1",
   "nodelocal": "rancher/mirrored-k8s-dns-node-cache:1.15.13",
   "kubernetes": "rancher/hyperkube:v1.20.14-rancher2",
   "flannel": "rancher/mirrored-coreos-flannel:v0.15.1",
   "flannelCni": "rancher/flannel-cni:v0.3.0-rancher6",
   "calicoNode": "rancher/mirrored-calico-node:v3.17.2",
   "calicoCni": "rancher/mirrored-calico-cni:v3.17.2",
   "calicoControllers": "rancher/mirrored-calico-kube-controllers:v3.17.2",
   "calicoCtl": "rancher/mirrored-calico-ctl:v3.17.2",
   "calicoFlexVol": "rancher/mirrored-calico-pod2daemon-flexvol:v3.17.2",
   "canalNode": "rancher/mirrored-calico-node:v3.17.2",
   "canalCni": "rancher/mirrored-calico-cni:v3.17.2",
   "canalControllers": "rancher/mirrored-calico-kube-controllers:v3.17.2",
   "canalFlannel": "rancher/mirrored-coreos-flannel:v0.15.1",
   "canalFlexVol": "rancher/mirrored-calico-pod2daemon-flexvol:v3.17.2",
   "weaveNode": "weaveworks/weave-kube:2.8.1",
   "weaveCni": "weaveworks/weave-npc:2.8.1",
   "podInfraContainer": "rancher/mirrored-pause:3.2",
   "ingress": "rancher/nginx-ingress-controller:nginx-0.49.3-rancher1",
   "ingressBackend": "rancher/mirrored-nginx-ingress-controller-defaultbackend:1.5-rancher1",
   "ingressWebhook": "rancher/mirrored-ingress-nginx-kube-webhook-certgen:v1.1.1",
   "metricsServer": "rancher/mirrored-metrics-server:v0.5.0",
   "windowsPodInfraContainer": "rancher/kubelet-pause:v0.1.6",
   "aciCniDeployContainer": "noiro/cnideploy:5.1.1.0.1ae238a",
   "aciHostContainer": "noiro/aci-containers-host:5.1.1.0.1ae238a",
   "aciOpflexContainer": "noiro/opflex:5.1.1.0.1ae238a",
   "aciMcastContainer": "noiro/opflex:5.1.1.0.1ae238a",
   "aciOvsContainer": "noiro/openvswitch:5.1.1.0.1ae238a",
   "aciControllerContainer": "noiro/aci-containers-controller:5.1.1.0.1ae238a",
   "aciGbpServerContainer": "noiro/gbp-server:5.1.1.0.1ae238a",
   "aciOpflexServerContainer": "noiro/opflex-server:5.1.1.0.1ae238a"
  },

However, the installer also requires rke-tools:v0.1.87, which is not in this mapping. This causes our airgapped installation to fail.

Extra info: This image is used for the etcd-rolling-snapshots container:

docker ps -a | grep rke-tools

a8c307f24cbe   10.90.84.114:5000/rancher/rke-tools:v0.1.78                                  "/bin/bash"              2 hours ago   Created                            service-sidekick
962dcb864e27   10.90.84.114:5000/rancher/rke-tools:v0.1.87                                  "/docker-entrypoint.…"   2 hours ago   Up 2 hours                         etcd-rolling-snapshots
e6a192c5d4b3   10.90.84.114:5000/rancher/rke-tools:v0.1.78                                  "/docker-entrypoint.…"   2 hours ago   Exited (0) 2 hours ago             cluster-state-deployer

samene commented 2 years ago

This is intentional behaviour, as per this commit:

https://github.com/rancher/rke/commit/217e1b41b8fe0ffc362f75ab7bfd71e644d101f3

rke-tools depends on the RKE version, not on the Kubernetes version. The maintainers therefore explicitly added code to "replace" the rke-tools image version from the Kubernetes version section (which is v0.1.78 for v1.20.14-rancher2-1) with the rke-tools version of the "default" Kubernetes version that this version of RKE supports. In our data.json (which we download in every build), this default version is now v1.24.4-rancher1-1, so its rke-tools version, v0.1.87, is fetched instead.

 "RKEDefaultK8sVersions": {
  "0.3": "v1.16.3-rancher1-1",
  "default": "v1.24.4-rancher1-1"
 }
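
For anyone building an airgapped image list, the behaviour described above can be sketched in Python. This is only an illustration, not RKE's actual code: the function name `images_to_mirror` is hypothetical, the top-level key `K8sVersionRKESystemImages` is an assumption about the layout of data.json (only `RKEDefaultK8sVersions` appears in this thread), and the sample data is trimmed to the image tags quoted above.

```python
def images_to_mirror(data, k8s_version, images_key="K8sVersionRKESystemImages"):
    """Return the images to mirror for k8s_version, including the
    rke-tools image(s) of the default Kubernetes version.

    images_key is an assumed name for the per-version image map in data.json.
    """
    system_images = data[images_key]
    default_version = data["RKEDefaultK8sVersions"]["default"]

    # Everything listed for the requested Kubernetes version.
    wanted = set(system_images[k8s_version].values())

    # Plus any rke-tools image listed for the default version, since the
    # snapshot container was observed running that tag instead.
    wanted.update(
        image
        for image in system_images[default_version].values()
        if image.startswith("rancher/rke-tools:")
    )
    return sorted(wanted)


# Trimmed sample using only values quoted in this issue.
sample = {
    "K8sVersionRKESystemImages": {
        "v1.20.14-rancher2-1": {
            "etcd": "rancher/mirrored-coreos-etcd:v3.4.15-rancher1",
            "alpine": "rancher/rke-tools:v0.1.78",
        },
        "v1.24.4-rancher1-1": {
            "alpine": "rancher/rke-tools:v0.1.87",
        },
    },
    "RKEDefaultK8sVersions": {"default": "v1.24.4-rancher1-1"},
}

if __name__ == "__main__":
    for image in images_to_mirror(sample, "v1.20.14-rancher2-1"):
        print(image)
```

With the sample above, the result contains both rke-tools:v0.1.78 and rke-tools:v0.1.87, which matches the two tags seen in the `docker ps` output. Pinning a copy of data.json per build, rather than downloading the latest one each time, would keep the default version (and so the extra rke-tools tag) from changing unexpectedly.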