jetstack / cni-migration

A CLI to migrate the CNI on a Kubernetes cluster from Canal (Calico + Flannel) to Cilium, live with no downtime.
Apache License 2.0

cni-migration is a CLI tool for migrating a Kubernetes cluster's CNI solution from Flannel (Canal) to Cilium. The tool works by running both CNIs at the same time using multus-cni. All pods are updated to attach a network interface from both CNIs, and then each node is migrated to run only Cilium. This ensures that all pods can communicate with both networks at all times during the migration.

How

The following steps are taken to migrate the CNI. During and after each step, inter-pod communication is regularly tested using knet-stress, which sends an HTTP request to all other knet-stress instances on all nodes. This proves bi-directional network connectivity across the cluster.

  1. This step installs both CNIs on all nodes and labels the nodes accordingly.
  2. This step ensures that all workloads on the cluster are running with network interfaces from both CNIs. The "sbr" chaining CNI plugin is used so that the default route inside each pod goes through Cilium; the Pod IP, however, remains in the Flannel range.
  3. This step reverses the priority order of the CNIs, so that the Cilium address becomes the primary Pod IP, with an extra Flannel network interface attached.
  4. This step is iterative: the same operation is performed on each node in turn until all nodes have been migrated.
  5. After all nodes have been migrated, the old resources are cleaned up.

The cluster should now be fully migrated from Canal to Cilium CNI.

Requirements

The following requirements apply in order to run the migration.

Firewall

Images

Configuration

The cni-migration tool takes an input configuration file (default --config config.yaml) that holds the options for the migration.

labels

This holds the label keys, and the single shared value, used to signal each step of the migration:

  canal-cilium: node-role.kubernetes.io/canal-cilium
  cni-priority-canal: node-role.kubernetes.io/priority-canal
  cni-priority-cilium: node-role.kubernetes.io/priority-cilium
  rolled: node-role.kubernetes.io/rolled
  cilium: node-role.kubernetes.io/cilium
  migrated: node-role.kubernetes.io/migrated
  value: "true" # used as the value to each label key
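
As an illustration, a node carrying one of these labels might look like the fragment below. This is a sketch only: the node name is hypothetical, and which keys are set at which step is decided by the tool and is not shown here.

```yaml
# Hypothetical Node object after the tool has applied one step label.
apiVersion: v1
kind: Node
metadata:
  name: worker-1 # hypothetical node name
  labels:
    # key taken from labels.canal-cilium, value taken from labels.value
    node-role.kubernetes.io/canal-cilium: "true"
```

Because the value is shared, a label selector such as node-role.kubernetes.io/canal-cilium=true matches every node that carries that label.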

paths

The file paths for each manifest bundle:

  cilium: ./resources/cilium.yaml
  multus: ./resources/multus.yaml
  knet-stress: ./resources/knet-stress.yaml

preflightResources

List of resources that must exist before beginning the migration, grouped by kind and then by namespace:

  daemonsets:
    knet-stress:
    - knet-stress
    - knet-stress-2
  deployments:
  statefulsets:

watchedResources

List of resources that must be ready each time they are checked during the migration, before it continues:

  daemonsets:
    kube-system:
    - canal
    - cilium
    - cilium-migrated
    - kube-multus-canal
    - kube-multus-cilium
    - kube-controller-manager
    - kube-scheduler
    knet-stress:
    - knet-stress
    - knet-stress-2
  deployments:
  statefulsets:
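
The deployments and statefulsets keys are left empty in the list above. Assuming they take the same namespace-to-names mapping as daemonsets, entries could be added as sketched below; the namespace and workload names are hypothetical.

```yaml
deployments:
  my-namespace:   # hypothetical namespace
  - my-app        # hypothetical Deployment name
statefulsets:
  my-namespace:
  - my-database   # hypothetical StatefulSet name
```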

cleanUpResources

List of resources which will be removed after completing the migration successfully:

  daemonsets:
    kube-system:
    - canal
    - cilium
    - kube-multus-canal
    - kube-multus-cilium
    knet-stress:
    - knet-stress
    - knet-stress-2
  deployments:
  statefulsets:
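
Put together, a configuration file might be laid out as in the sketch below. It assumes that each section name above is a top-level key in the file, which the text implies but does not state. Only the first three sections are written out; watchedResources and cleanUpResources would follow the same shape as preflightResources.

```yaml
# config.yaml (sketch; the top-level layout is an assumption)
labels:
  canal-cilium: node-role.kubernetes.io/canal-cilium
  cni-priority-canal: node-role.kubernetes.io/priority-canal
  cni-priority-cilium: node-role.kubernetes.io/priority-cilium
  rolled: node-role.kubernetes.io/rolled
  cilium: node-role.kubernetes.io/cilium
  migrated: node-role.kubernetes.io/migrated
  value: "true" # used as the value for each label key

paths:
  cilium: ./resources/cilium.yaml
  multus: ./resources/multus.yaml
  knet-stress: ./resources/knet-stress.yaml

preflightResources:
  daemonsets:
    knet-stress: # namespace
    - knet-stress
    - knet-stress-2
  deployments:
  statefulsets:
```

A key written without a value, such as deployments: here, parses as null in YAML, so an empty key presumably means that no resources of that kind are listed.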