poseidon / typhoon

Minimal and free Kubernetes distribution with Terraform
https://typhoon.psdn.io/
MIT License

cni config: No networks found #39

Closed: jordy25519 closed this 6 years ago

jordy25519 commented 6 years ago

Bug

Environment

Bare-metal, Container Linux 1465.6.0 (stable); Typhoon module pinned to ref 1bc25c1 (see cluster.tf below)

Problem

Temporary Kubernetes control plane API fails to start

Kubelet log entries show issues relating to a missing CNI config:

cni.go:189] Unable to update cni config: No networks found in /etc/kubernetes/cni/net.d
kubelet.go:2136] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
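
For anyone debugging this, a quick way to confirm what the kubelet is complaining about is to inspect the CNI directories on an affected node. A minimal sketch; the hostname is taken from the example tfvars below, and Container Linux's default SSH user is core:

# SSH to a controller node (hostname from the example config)
ssh core@m0.example.com

# The kubelet expects a CNI config here; if the directory is empty,
# the network plugin has not written its config yet
ls -la /etc/kubernetes/cni/net.d

# CNI plugin binaries land here once the network plugin's installer runs
ls -la /opt/cni/bin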

Desired Behavior

The bootkube API server starts and the cluster is provisioned

Steps to Reproduce

cluster.tf

module "bare-metal-mercury" {
  source = "git::https://github.com/poseidon/typhoon//bare-metal/container-linux/kubernetes?ref=1bc25c103654a497bcc0c2486104426f09ea2456"

  # install
  matchbox_http_endpoint  = "${var.matchbox_http_endpoint}"
  container_linux_channel = "${var.container_linux_channel}"
  container_linux_version = "${var.container_linux_version}"
  ssh_authorized_key      = "${var.ssh_authorized_key}"

  # cluster
  cluster_name    = "${var.cluster_name}"
  k8s_domain_name = "${var.k8s_domain_name}"

  # machines
  controller_names   = "${var.controller_names}"
  controller_macs    = "${var.controller_macs}"
  controller_domains = "${var.controller_domains}"
  worker_names       = "${var.worker_names}"
  worker_macs        = "${var.worker_macs}"
  worker_domains     = "${var.worker_domains}"

  # bootkube assets
  asset_dir = "${var.asset_dir}"

  # Optional
  networking                    = "${var.networking}"
  cached_install                = "${var.cached_install}"
  install_disk                  = "${var.install_disk}"
  container_linux_oem           = "${var.container_linux_oem}"
  pod_cidr                      = "${var.pod_cidr}"
  service_cidr                  = "${var.service_cidr}"
}

terraform.tfvars

matchbox_http_endpoint = "http://matchbox.example.com:8080"
matchbox_rpc_endpoint = "matchbox.example.com:8081"
ssh_authorized_key = "ssh-rsa ..."

cluster_name = "example"
container_linux_version = "1465.6.0"
container_linux_channel = "stable"

# Machines
controller_names = ["m0", "m1"]
controller_domains = ["m0.example.com", "m1.example.com"]
controller_macs = ["MAC1", "MAC2"]
worker_names = ["n0", "n1"]
worker_domains = ["n0.example.com", "n1.example.com"]
worker_macs = ["MAC1", "MAC2"]

# Bootkube
k8s_domain_name = "m0.example.com"
asset_dir = "assets_dir"

# Optional (defaults)
cached_install = "true"
install_disk = "/dev/sda"
#container_linux_oem = ""
networking = "calico"
pod_cidr = "10.2.0.0/16"
service_cidr = "10.3.0.0/16"

Run

terraform plan
terraform apply
jordy25519 commented 6 years ago

Resolved this by restarting the kubelet service multiple times. Seems like there are timing issues during the deployment process.
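
For reference, a sketch of that workaround on an affected node (assuming systemd manages the kubelet, as it does on Container Linux):

# Restart the kubelet on the affected node, then watch its logs to see
# whether it picks up the CNI config
sudo systemctl restart kubelet
journalctl -u kubelet -f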

dghubble commented 6 years ago

To provide a bit of background detail / explanation, it's normal for the kubelet to log about the lack of CNI config or CNI plugins during initial bootstrapping. The flannel or calico DaemonSet (depending on your choice) has a sidecar container called install-cni that is responsible for adding the correct version of the CNI plugins on each node (at /opt/cni/bin, which is mounted into the kubelet) and the CNI config in /etc/kubernetes/cni/net.d.
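
If you want to watch this happen, something like the following should show the DaemonSet pods coming up and install-cni doing its work (the pod name and kubeconfig path are illustrative; bootkube assets normally include auth/kubeconfig under the chosen asset_dir):

# Watch the calico (or flannel) DaemonSet pods come up
kubectl --kubeconfig=assets_dir/auth/kubeconfig get pods -n kube-system -o wide

# Inspect the install-cni container on one of the pods (name will vary)
kubectl --kubeconfig=assets_dir/auth/kubeconfig -n kube-system logs calico-node-xxxxx -c install-cni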

It's actually handy that these are decoupled, because it means you can kubectl apply to update your flannel or calico to some new version and get the official upstream CNI config and plugins they intend you to use.
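
For example, upgrading the network plugin in place is just re-applying the manifest (the file name here is illustrative, e.g. a newer upstream calico manifest):

# Re-apply an updated manifest; the install-cni container then refreshes
# /opt/cni/bin and /etc/kubernetes/cni/net.d on each node
kubectl apply -f calico.yaml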