openyurtio / raven

provide layer 3 and layer 7 network connectivity among pods in different physical regions
Apache License 2.0

Testing this using kind and physical node #132

Open siredmar opened 1 year ago

siredmar commented 1 year ago

Hi!

I'm quite new to raven. I have a fairly complex task for our local dev environment (not a production use case at all). This is the scenario:

I'm running a local Kubernetes cluster with kind, which runs each Kubernetes node as a Docker container.

$ docker ps
CONTAINER ID   IMAGE                                         COMMAND                  CREATED          STATUS          PORTS                                                                                      NAMES
809584a90ff7   ghcr.io/edgefarm/edgefarm/kind-node:v1.22.7   "/usr/local/bin/entr…"   19 minutes ago   Up 19 minutes                                                                                              edgefarm-worker2
78f270d61ef3   ghcr.io/edgefarm/edgefarm/kind-node:v1.22.7   "/usr/local/bin/entr…"   19 minutes ago   Up 19 minutes   0.0.0.0:6443->6443/tcp, 127.0.0.1:41069->6443/tcp                                          edgefarm-control-plane
20fa09d51172   ghcr.io/edgefarm/edgefarm/kind-node:v1.22.7   "/usr/local/bin/entr…"   19 minutes ago   Up 19 minutes   0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp, 0.0.0.0:4222->4222/tcp, 0.0.0.0:7422->7422/tcp   edgefarm-worker
6ae9485afc49   ghcr.io/edgefarm/edgefarm/kind-node:v1.22.7   "/usr/local/bin/entr…"   19 minutes ago   Up 19 minutes                                                                                              edgefarm-worker3

All of these run on my local machine (192.168.1.46). Using some kind trickery (I installed flannel instead of kind's built-in CNI), I joined a Raspberry Pi node called eagle (192.168.1.100) to this local cluster. This works great: workloads are deployed on this node. However, accessing services that run on other nodes doesn't work.

$ k get nodes -o wide      
NAME                     STATUS   ROLES                  AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
eagle                    Ready    <none>                 15m   v1.22.7   192.168.1.100   <none>        Ubuntu 22.04.1 LTS   5.15.0-1034-raspi   docker://20.10.18
edgefarm-control-plane   Ready    control-plane,master   22m   v1.22.7   172.18.0.4      <none>        Ubuntu 21.10         5.15.0-78-generic   containerd://1.5.10
edgefarm-worker          Ready    <none>                 21m   v1.22.7   172.18.0.2      <none>        Ubuntu 21.10         5.15.0-78-generic   containerd://1.5.10
edgefarm-worker2         Ready    <none>                 21m   v1.22.7   172.18.0.5      <none>        Ubuntu 21.10         5.15.0-78-generic   containerd://1.5.10
edgefarm-worker3         Ready    <none>                 21m   v1.22.7   172.18.0.3      <none>        Ubuntu 21.10         5.15.0-78-generic   containerd://1.5.10

What I'm trying to achieve is a transparent cluster and pod network that includes the eagle node. For example, I have a deployment that runs pods on both eagle and edgefarm-worker, and each pod should be able to ping the other pods' IP addresses.

I've tried several raven configurations. My latest one looks like this:

I have two gateways defined: gw-cloud and gw-eagle

$ k get gw                                             
NAME       ACTIVEENDPOINT
gw-cloud   edgefarm-worker
gw-eagle   eagle

I'm installing raven with the Helm chart. This is my values.yaml:

nodeSelector: null
tolerations:
  - operator: Exists
  - effect: NoSchedule
    key: edgefarm.io
    operator: Exists
vpn:
  psk: 98b2d59f5b201d9649736da96b44b37df91eb0c7bbe645ed474f48627bec16647e55dfa0cc803b356a8e3a4857d6f59108299e271694e2c8ca8fd38f1f9ebdd5
  metricBindAddr: ":8081"
  driver: libreswan

I got raven deployed on every node:

$ k get pods -n kube-system -o wide | grep raven
raven-agent-ds-4vhfl                             1/1     Running   0          6s      192.168.1.100   eagle                    <none>           <none>
raven-agent-ds-88sx8                             1/1     Running   0          6s      172.18.0.4      edgefarm-control-plane   <none>           <none>
raven-agent-ds-9hz7q                             1/1     Running   0          6s      172.18.0.3      edgefarm-worker3         <none>           <none>
raven-agent-ds-nx2t2                             1/1     Running   0          6s      172.18.0.5      edgefarm-worker2         <none>           <none>
raven-agent-ds-t4ddq                             1/1     Running   0          6s      172.18.0.2      edgefarm-worker          <none>           <none>

These are the raven agent logs on eagle:

$ k logs raven-agent-ds-4vhfl
+ cat /proc/sys/net/ipv4/conf/all/send_redirects
+ '[' 0 '=' 0 ]
+ exec agent '--node-name=eagle' '--vpn-driver=libreswan' '--forward-node-ip=false' '--metric-bind-addr=:8081' '--feature-gates='
W0807 07:06:03.220486       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0807 07:06:03.356401       1 start.go:70] route driver vxlan initialized
I0807 07:06:03.357402       1 libreswan.go:293] starting pluto
Initializing NSS database

I0807 07:06:04.360613       1 libreswan.go:315] start pluto successfully
I0807 07:06:04.361189       1 start.go:79] VPN driver libreswan initialized
I0807 07:06:04.361692       1 engine_controller.go:103] engine controller successfully start
I0807 07:06:04.466414       1 engine_controller.go:180] "applying network" localEndpoint="192.168.1.100" remoteEndpoint=map[gw-cloud:172.18.0.2]
I0807 07:06:04.466539       1 libreswan.go:102] no desired connections, cleaning vpn connections
I0807 07:06:04.467553       1 vxlan.go:80] only gateway node exist in current gateway, cleaning up route setting
I0807 07:06:04.601354       1 engine_controller.go:176] network not changed, skip to process

And the raven agent logs on edgefarm-worker:

$ k logs raven-agent-ds-t4ddq                    
+ cat /proc/sys/net/ipv4/conf/all/send_redirects
+ '[' 0 '=' 0 ]
+ exec agent '--node-name=edgefarm-worker' '--vpn-driver=libreswan' '--forward-node-ip=false' '--metric-bind-addr=:8081' '--feature-gates='
W0807 07:06:01.232444       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0807 07:06:01.301817       1 start.go:70] route driver vxlan initialized
I0807 07:06:01.302028       1 libreswan.go:293] starting pluto
Initializing NSS database

I0807 07:06:02.302344       1 libreswan.go:315] start pluto successfully
I0807 07:06:02.302364       1 start.go:79] VPN driver libreswan initialized
I0807 07:06:02.302437       1 engine_controller.go:103] engine controller successfully start
I0807 07:06:02.404301       1 engine_controller.go:180] "applying network" localEndpoint="172.18.0.2" remoteEndpoint=map[gw-eagle:192.168.1.100]
I0807 07:06:02.404358       1 libreswan.go:102] no desired connections, cleaning vpn connections
I0807 07:06:02.404680       1 vxlan.go:80] only gateway node exist in current gateway, cleaning up route setting
I0807 07:06:02.545809       1 engine_controller.go:176] network not changed, skip to process

I don't have much experience with raven, but to me the logs don't say much other than that there are no VPN connections.
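To make the symptom in these logs concrete, here is a small Python sketch (not part of raven) that extracts localEndpoint and remoteEndpoint from an "applying network" line and reports which of the two networks in this setup each address belongs to. The network labels and the prefixes 192.168.1.0/24 (home LAN) and 172.18.0.0/24 (kind's Docker network, as stated later in this thread) are assumptions taken from the addresses quoted here, and the helper names parse_endpoints and network_of are hypothetical.

```python
import ipaddress
import re

# Assumed networks, taken from the addresses quoted in this issue.
NETWORKS = {
    "home LAN": ipaddress.ip_network("192.168.1.0/24"),
    "kind Docker bridge (laptop only)": ipaddress.ip_network("172.18.0.0/24"),
}

# Matches e.g.: localEndpoint="192.168.1.100" remoteEndpoint=map[gw-cloud:172.18.0.2]
LINE_RE = re.compile(r'localEndpoint="([^"]+)" remoteEndpoint=map\[([^\]]*)\]')


def parse_endpoints(line):
    """Return (local_ip, {gateway_name: ip}) from an 'applying network' log line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not an 'applying network' line")
    local = m.group(1)
    remotes = {}
    # Go prints maps as map[key:value key:value], space separated.
    for entry in m.group(2).split():
        name, _, ip = entry.partition(":")
        remotes[name] = ip
    return local, remotes


def network_of(ip):
    """Label of the assumed network containing ip, or 'unknown'."""
    addr = ipaddress.ip_address(ip)
    for label, net in NETWORKS.items():
        if addr in net:
            return label
    return "unknown"


if __name__ == "__main__":
    log = ('I0807 07:06:04.466414 1 engine_controller.go:180] "applying network" '
           'localEndpoint="192.168.1.100" remoteEndpoint=map[gw-cloud:172.18.0.2]')
    local, remotes = parse_endpoints(log)
    print(f"local {local}: {network_of(local)}")
    for gw, ip in remotes.items():
        print(f"{gw} {ip}: {network_of(ip)}")
```

Run against the eagle log line, it prints `local 192.168.1.100: home LAN` and `gw-cloud 172.18.0.2: kind Docker bridge (laptop only)`, i.e. eagle is being pointed at a gateway address that, under these assumptions, only exists on the laptop's Docker network.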

Can anyone assist me setting up raven in this local setup?

rambohe-ch commented 1 year ago

@njucjc PTAL

njucjc commented 1 year ago

@siredmar What network plugin is your kind cluster using? It seems yurt-manager can't get the PodCIDR from your node spec.

siredmar commented 1 year ago

@njucjc My kind cluster uses flannel as its CNI. My strong guess is that the raven-agent on eagle (192.168.1.100) cannot reach the raven-agents on the kind nodes, which live in a private Docker network on my laptop (172.18.0.0/24). So there is no way to establish a direct connection. I've managed to make this work using Tailscale and NetBird. These are VPN solutions that use a connection coordinator, which tries to establish a connection using various techniques such as hole punching. Once the VPN connection was established on every node with Tailscale or NetBird, I had to reconfigure flannel to use the newly created VPN interface.

So I guess what I would need is some sort of connection coordinator to make this kind of connection work. Does raven support something like that?

BSWANG commented 1 year ago

@siredmar Can the kind nodes behind the Docker bridge reach eagle (192.168.1.100) via NAT? Currently, raven needs at least one-directional access between edges. If both edges are under NAT, they need NAT traversal (NAT-T) capability; this feature will be introduced by https://github.com/openyurtio/openyurt/pull/1639/files
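The reachability rule in this comment can be written down as a tiny decision function. This is a hypothetical Python model of the rule as stated, not raven code: the function name can_connect and its parameters are illustrative, and whether the kind nodes actually reach eagle through Docker's outbound NAT is exactly the open question, so the first scenario below is an assumption.

```python
def can_connect(a_reaches_b: bool, b_reaches_a: bool, nat_traversal: bool = False) -> bool:
    """Hypothetical model of the rule stated above, not raven code.

    a_reaches_b / b_reaches_a: whether each gateway endpoint can open a
    connection to the other's endpoint address.
    nat_traversal: whether NAT-T is available (per the comment, it is only
    being introduced by the linked PR).
    """
    # One-directional reachability is enough: the side that can reach the
    # other would have to be the one that initiates the tunnel.
    if a_reaches_b or b_reaches_a:
        return True
    # Neither side can reach the other (e.g. both edges under NAT):
    # only NAT traversal could help.
    return nat_traversal


if __name__ == "__main__":
    # Assumed answer "yes": kind nodes reach eagle via Docker's outbound NAT,
    # while eagle cannot reach 172.18.0.0/24.
    print("kind -> eagle via NAT:", can_connect(a_reaches_b=True, b_reaches_a=False))
    # If the answer is "no", neither direction works without NAT-T.
    print("no reachability:", can_connect(a_reaches_b=False, b_reaches_a=False))
```

Modeling it this way makes the asymmetry explicit: eagle being unable to reach the kind nodes does not rule out a tunnel on its own, as long as the kind side can reach eagle.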