rancher / terraform-provider-rke

Terraform provider plugin for deploying Kubernetes clusters with RKE (Rancher Kubernetes Engine)
Mozilla Public License 2.0

Calico kube controller image is still being pulled from internet when specified in system_images from different registry #292

Open vojtechmares opened 3 years ago

vojtechmares commented 3 years ago

Hello,

I am running an RKE cluster in an air-gapped environment.

Images are provided from internal registry (GitLab).

Every image works just fine except calico_controllers, which RKE tries to pull from the public internet instead of using the image provided in system_images.

My cluster definition:

resource "rke_cluster" "dev" {
  cluster_name   = "dev"
  ssh_key_path = "/root/.ssh/id_rsa"

  nodes {
    address           = "__REDACTED__"
    hostname_override = "master-1"
    user              = "root"
    role              = ["controlplane", "etcd"]
  }
  nodes {
    address           = "__REDACTED__"
    hostname_override = "master-2"
    user              = "root"
    role              = ["controlplane", "etcd"]
  }
  nodes {
    address           = "__REDACTED__"
    hostname_override = "master-3"
    user              = "root"
    role              = ["controlplane", "etcd"]
  }
  nodes {
    address           = "__REDACTED__"
    hostname_override = "worker-1"
    user              = "root"
    role              = ["worker"]
  }
  nodes {
    address           = "__REDACTED__"
    hostname_override = "worker-2"
    user              = "root"
    role              = ["worker"]
  }
  nodes {
    address           = "__REDACTED__"
    hostname_override = "worker-3"
    user              = "root"
    role              = ["worker"]
  }

  ingress {
    provider = "none"
  }

  network {
    plugin = "canal"
  }

  services {
    kube_api {
      audit_log {
        enabled = true

        configuration {
          path = "/var/log/kube-audit/audit-log.json"
        }
      }
    }
  }

  system_images {
    etcd                        = "__PRIVATE_REGISTRY__/rancher/coreos-etcd:v3.4.14-rancher1"
    alpine                      = "__PRIVATE_REGISTRY__/rancher/rke-tools:v0.1.72"
    nginx_proxy                 = "__PRIVATE_REGISTRY__/rancher/rke-tools:v0.1.72"
    cert_downloader             = "__PRIVATE_REGISTRY__/rancher/rke-tools:v0.1.72"
    kubernetes_services_sidecar = "__PRIVATE_REGISTRY__/rancher/rke-tools:v0.1.72"
    kube_dns                    = "__PRIVATE_REGISTRY__/rancher/k8s-dns-kube-dns:1.15.10"
    dnsmasq                     = "__PRIVATE_REGISTRY__/rancher/k8s-dns-dnsmasq-nanny:1.15.10"
    kube_dns_sidecar            = "__PRIVATE_REGISTRY__/rancher/k8s-dns-sidecar:1.15.10"
    kube_dns_autoscaler         = "__PRIVATE_REGISTRY__/rancher/cluster-proportional-autoscaler:1.8.1"
    coredns                     = "__PRIVATE_REGISTRY__/rancher/coredns-coredns:1.8.0"
    coredns_autoscaler          = "__PRIVATE_REGISTRY__/rancher/cluster-proportional-autoscaler:1.8.1"
    nodelocal                   = "__PRIVATE_REGISTRY__/rancher/k8s-dns-node-cache:1.15.13"
    kubernetes                  = "__PRIVATE_REGISTRY__/rancher/hyperkube:v1.20.4-rancher1"
    flannel                     = "__PRIVATE_REGISTRY__/rancher/coreos-flannel:v0.13.0-rancher1"
    flannel_cni                 = "__PRIVATE_REGISTRY__/rancher/flannel-cni:v0.3.0-rancher6"
    calico_node                 = "__PRIVATE_REGISTRY__/rancher/calico-node:v3.17.2"
    calico_cni                  = "__PRIVATE_REGISTRY__/rancher/calico-cni:v3.17.2"
    calico_controllers          = "__PRIVATE_REGISTRY__/rancher/calico-kube-controllers:v3.17.2"
    calico_ctl                  = "__PRIVATE_REGISTRY__/rancher/calico-ctl:v3.17.2"
    calico_flex_vol             = "__PRIVATE_REGISTRY__/rancher/calico-pod2daemon-flexvol:v3.17.2"
    canal_node                  = "__PRIVATE_REGISTRY__/rancher/calico-node:v3.17.2"
    canal_cni                   = "__PRIVATE_REGISTRY__/rancher/calico-cni:v3.17.2"
    canal_flannel               = "__PRIVATE_REGISTRY__/rancher/coreos-flannel:v0.13.0-rancher1"
    canal_flex_vol              = "__PRIVATE_REGISTRY__/rancher/calico-pod2daemon-flexvol:v3.17.2"
    weave_node                  = "__PRIVATE_REGISTRY__/weaveworks/weave-kube:2.8.1"
    weave_cni                   = "__PRIVATE_REGISTRY__/weaveworks/weave-npc:2.8.1"
    pod_infra_container         = "__PRIVATE_REGISTRY__/rancher/pause:3.2"
    ingress                     = "__PRIVATE_REGISTRY__/rancher/nginx-ingress-controller:nginx-0.43.0-rancher1"
    ingress_backend             = "__PRIVATE_REGISTRY__/rancher/nginx-ingress-controller-defaultbackend:1.5-rancher1"
    metrics_server              = "__PRIVATE_REGISTRY__/rancher/metrics-server:v0.4.1"
    aci_cni_deploy_container    = "__PRIVATE_REGISTRY__/noiro/cnideploy:5.1.1.0.1ae238a"
    aci_host_container          = "__PRIVATE_REGISTRY__/noiro/aci-containers-host:5.1.1.0.1ae238a"
    aci_opflex_container        = "__PRIVATE_REGISTRY__/noiro/opflex:5.1.1.0.1ae238a"
    aci_mcast_container         = "__PRIVATE_REGISTRY__/noiro/opflex:5.1.1.0.1ae238a"
    aci_controller_container    = "__PRIVATE_REGISTRY__/aci-containers-controller:5.1.1.0.1ae238a"
  }

}

kubectl

$ kubectl get po -n kube-system
NAME                                       READY   STATUS             RESTARTS   AGE
calico-kube-controllers-6c8ddcb6cd-nk6wf   0/1     ImagePullBackOff   0          76m
canal-8ht5t                                2/2     Running            0          76m
canal-cf24b                                2/2     Running            0          76m
canal-qhzrg                                2/2     Running            0          76m
canal-v22dk                                2/2     Running            0          76m
canal-vx8sj                                2/2     Running            0          76m
canal-xg648                                2/2     Running            0          76m
coredns-5ffc49c57b-5knv5                   1/1     Running            0          76m
coredns-5ffc49c57b-kvcll                   1/1     Running            0          76m
coredns-autoscaler-6c84d979b7-s5hpl        1/1     Running            0          76m
metrics-server-55b956b5cb-dhn2s            1/1     Running            0          76m
rke-coredns-addon-deploy-job-pzscl         0/1     Completed          0          76m
rke-metrics-addon-deploy-job-4f84q         0/1     Completed          0          76m
rke-network-plugin-deploy-job-cv7kz        0/1     Completed          0          76m
$ kubectl describe po -n kube-system calico-kube-controllers-6c8ddcb6cd-nk6wf
Name:                 calico-kube-controllers-6c8ddcb6cd-nk6wf
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 worker-3/172.25.192.77
Start Time:           Thu, 08 Apr 2021 12:15:48 +0200
Labels:               k8s-app=calico-kube-controllers
                      pod-template-hash=6c8ddcb6cd
Annotations:          cni.projectcalico.org/podIP: 10.42.3.2/32
                      cni.projectcalico.org/podIPs: 10.42.3.2/32
Status:               Pending
IP:                   10.42.3.2
IPs:
  IP:           10.42.3.2
Controlled By:  ReplicaSet/calico-kube-controllers-6c8ddcb6cd
Containers:
  calico-kube-controllers:
    Container ID:
    Image:          rancher/calico-kube-controllers:v3.17.2
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Readiness:      exec [/usr/bin/check-status -r] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ENABLED_CONTROLLERS:  node
      DATASTORE_TYPE:       kubernetes
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from calico-kube-controllers-token-jlqnr (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  calico-kube-controllers-token-jlqnr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  calico-kube-controllers-token-jlqnr
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     :NoScheduleop=Exists
                 :NoExecuteop=Exists
                 CriticalAddonsOnly op=Exists
Events:
  Type     Reason   Age                  From     Message
  ----     ------   ----                 ----     -------
  Warning  Failed   16m (x261 over 76m)  kubelet  Error: ImagePullBackOff
  Normal   BackOff  92s (x325 over 76m)  kubelet  Back-off pulling image "rancher/calico-kube-controllers:v3.17.2"

I don't see a typo in the image name or in the calico_controllers key.

The image is present in the Docker registry.

So why is RKE still pulling the image from the public internet?

vojtechmares commented 3 years ago

Current workaround via a Makefile target:

patch-calico:
    kubectl set -n kube-system image deployment/calico-kube-controllers calico-kube-controllers=__PRIVATE_REGISTRY__/rancher/calico-kube-controllers:v3.17.2
rawmind0 commented 3 years ago

Hello @vojtechmares, what tf provider version are you using? Have you tried defining a default private_registry on your rke cluster?
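
A sketch of what that suggestion might look like in HCL, assuming the provider's private_registries block (which mirrors the private_registries setting in RKE's cluster.yml); the registry URL is the same placeholder as above and the credentials are hypothetical:

```hcl
resource "rke_cluster" "dev" {
  # ... nodes, network, services, system_images as in the original config ...

  # Assumed block name and arguments; check the provider docs for your version.
  private_registries {
    url        = "__PRIVATE_REGISTRY__"
    user       = "registry-user"     # hypothetical credential
    password   = "registry-password" # hypothetical credential
    is_default = true                # treat this registry as the default for all system images
  }
}
```

With is_default = true, RKE should prefix unqualified image names (such as rancher/calico-kube-controllers) with the private registry, which may cover images that system_images alone does not.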