squat / kilo

Kilo is a multi-cloud network overlay built on WireGuard and designed for Kubernetes (k8s + wg = kg)
https://kilo.squat.ai
Apache License 2.0

Cilium compatibility #81

Open · anjmao opened this issue 3 years ago

anjmao commented 3 years ago

Hi, I just tried Kilo with k3s and flannel. It looks great. At CAST AI we are developing a managed multi-cloud k8s solution. Currently we use the cloud providers' VPN gateways and Cilium with VXLAN encapsulation, but I'm looking at adopting WireGuard as an alternative to the VPN gateways, since they have multiple problems. For example, VPN creation for azure-gcp or azure-aws takes 30 minutes, which is absolutely ridiculous.

I would like to contribute by adding Cilium compatibility if that makes sense for this project, let me know.

squat commented 3 years ago

Hi @anjmao, support for Cilium would be amazing. I've had this on my own to-do list for a long time but have other, higher-priority work in the project, e.g. documentation, dual stack, etc.

The basic goal would be to implement https://github.com/squat/kilo/blob/master/pkg/encapsulation/flannel.go but for Cilium. I understand that Cilium controls the IPs that are allowed to access the container networking interfaces pretty tightly, so some additional work may be needed in the Init method, such as registering routes/addresses in Cilium's etcd or CRDs. I'm not very familiar with Cilium myself, so I would need to do more research before being able to offer more specific advice. Do you have a rough idea of the complexity of interfacing with Cilium?
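
As a rough sketch of the shape such an implementation would take: like flannel.go, the Cilium case mainly needs to locate the CNI's host device during Init (anjmao's update further down identifies the device as cilium_host). This is a minimal, standalone sketch using only Go's standard library; the actual Encapsulator wiring in pkg/encapsulation, and any registration with Cilium's etcd/CRDs, are omitted and would still need research:

package main

import (
	"fmt"
	"log"
	"net"
)

// ciliumDeviceName is the host interface that Cilium creates on each node.
const ciliumDeviceName = "cilium_host"

// ciliumIndex does what an encapsulator's Init step would do first:
// resolve the CNI's host device to an interface index for later use.
func ciliumIndex() (int, error) {
	iface, err := net.InterfaceByName(ciliumDeviceName)
	if err != nil {
		return 0, fmt.Errorf("failed to find device %q: %w", ciliumDeviceName, err)
	}
	return iface.Index, nil
}

func main() {
	index, err := ciliumIndex()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s has interface index %d\n", ciliumDeviceName, index)
}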

Let me know if you'd like to chat on the K8s slack sometime.

anjmao commented 3 years ago

Cool. My initial idea is to follow the flannel implementation, as you mentioned. I'll spend more time on this during the week and write an update here with my findings. Thanks.

anjmao commented 3 years ago

Some updates for the basic case.

  1. I created a Cilium encapsulation; the implementation was identical to flannel's except that I searched for the Cilium device name, cilium_host.
  2. Bootstrapped a cluster using kubeadm with the master on GCP and a worker on Azure.
  3. Deployed Cilium with Kubernetes host-scope IPAM (https://docs.cilium.io/en/v1.8/concepts/networking/ipam/kubernetes/); a small verification sketch follows below.
  4. Deployed a custom Kilo build.

Kilo successfully created the WireGuard network. The Cilium connectivity test, which runs a bunch of deployments and checks connectivity, passed. Pod-to-pod ping across gcp-azure worked.
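
For reference, a quick way to sanity-check the host-scope IPAM setup from step 3 is to list each node's podCIDR, since in that mode Cilium allocates pod IPs from node.spec.podCIDR. A minimal client-go sketch, assuming the admin kubeconfig path used later in this comment:

package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Path where the admin kubeconfig was manually placed (see below).
	config, err := clientcmd.BuildConfigFromFlags("", "/etc/kubernetes/kubeconfig")
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}
	nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	// With host-scope IPAM, each node's pod subnet is its spec.podCIDR,
	// which is exactly what Kilo routes across the WireGuard mesh.
	for _, node := range nodes.Items {
		fmt.Printf("%s -> %s\n", node.Name, node.Spec.PodCIDR)
	}
}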

Things I need to finish:

  1. More complex topologies. The full-mesh topology works, but with logical groups only gateway nodes are able to communicate when using IP-in-IP tunnelling on Azure, as already stated in some other issues.
  2. As I run the cluster without kube-proxy, it doesn't contain the kubeconfig file that Kilo mounts; for now I manually placed the admin kubeconfig at /etc/kubernetes/kubeconfig on both nodes.
  3. I would also like to have more than one gateway in each location, but this could be done as a separate task.

sergeimonakhov commented 3 years ago

@SerialVelocity Hey! I have a separate control plane and a couple of geographically scattered nodes connected to it.

Cilium was installed using Helm:

helm upgrade cilium cilium/cilium \
   --version 1.9.4 \
   --install \
   --namespace kube-system \
   --set kubeProxyReplacement=strict \
   --set k8sServiceHost=MY_PUBLIC_CONTROL_PLANE_IP \
   --set k8sServicePort=MY_PUBLIC_CONTROL_PLANE_PORT \
   --set hubble.listenAddress=":4244" \
   --set hubble.relay.enabled=true \
   --set hubble.ui.enabled=true \
   --set nodeinit.restartPods=true

Kilo config:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: kilo
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kilo
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - patch
  - watch
- apiGroups:
  - kilo.squat.ai
  resources:
  - peers
  verbs:
  - list
  - update
  - watch
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - create
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kilo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kilo
subjects:
  - kind: ServiceAccount
    name: kilo
    namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kilo
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kilo
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kilo
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kilo
    spec:
      serviceAccountName: kilo
      hostNetwork: true
      containers:
      - name: kilo
        image: squat/kilo
        args:
        - --hostname=$(NODE_NAME)
        # Cilium owns pod networking, so Kilo should not manage the CNI config.
        - --cni=false
        # Encapsulate only traffic that crosses subnet boundaries within a location.
        - --encapsulate=crosssubnet
        - --clean-up-interface=true
        # There is no Cilium-specific mode yet, so flannel compatibility is used.
        - --compatibility=flannel
        - --local=false
        # CIDR from which Kilo assigns IPs to the WireGuard interfaces.
        - --subnet=172.31.254.0/24
        resources:
          requests:
            memory: "64Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "1000m"
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        securityContext:
          privileged: true
        volumeMounts:
        - name: kilo-dir
          mountPath: /var/lib/kilo
        - name: lib-modules
          mountPath: /lib/modules
          readOnly: true
        - name: xtables-lock
          mountPath: /run/xtables.lock
          readOnly: false
      tolerations:
      - effect: NoSchedule
        operator: Exists
      - effect: NoExecute
        operator: Exists
      volumes:
      - name: kilo-dir
        hostPath:
          path: /var/lib/kilo
      - name: lib-modules
        hostPath:
          path: /lib/modules
      - name: xtables-lock
        hostPath:
          path: /run/xtables.lock
          type: FileOrCreate

iperf3 result:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-241.91 sec  1.12 GBytes  39.6 Mbits/sec             sender
[  5]   0.00-241.00 sec  1.12 GBytes  39.8 Mbits/sec                  receiver

jawabuu commented 3 years ago

> I created a Cilium encapsulation; the implementation was identical to flannel's except that I searched for the Cilium device name, cilium_host.

@anjmao Could you share your code?

@D1abloRUS Is your config functional with Cilium even though you used --compatibility=flannel?

squat commented 3 years ago

Hi @anjmao, do you have any update here? I know lots of people are interested in Cilium compatibility, and I would be happy to look at any WIP work and collaborate to get it merged :)

RouxAntoine commented 2 years ago

If someone is available and wants to review this pull request, https://github.com/squat/kilo/pull/312, I tried to implement what @anjmao seemed to have in mind. Thanks.