networkop / meshnet-cni

a (K8s) CNI plugin to create arbitrary virtual network topologies
BSD 3-Clause "New" or "Revised" License

Ability to choose a specific NIC for VXLAN bind #71

Closed sar772004 closed 1 year ago

sar772004 commented 1 year ago

Problem: there is no way to define a specific port on the compute node to carry the VXLAN traffic.

Issue: I am trying to get the VXLAN binds an MTU of at least 9232.

Currently the MTU seems to be picked from the underlying interface, which is tied to the primary address as defined here: https://github.com/networkop/meshnet-cni/blob/master/manifests/base/daemonset.yaml#L41

We want to use a specific NIC for carrying the VXLAN binds, and I am not finding a way to make the endpoint IP be picked up from a specific interface on the compute nodes.

Any idea how I can point the VXLAN binds at a specific underlay interface? Once I have the chosen interface (which supports a larger MTU), I could set a large MTU on it and proceed.

This would also let us steer all meshnet traffic onto a specific interface rather than onto the k8s control/mgmt port.

networkop commented 1 year ago

I see what you mean. This is currently not implemented, although it doesn't seem like it would be too difficult a change. e.g. HOST_INTF variable that would have some tie-breaking logic with HOST_IP.

sar772004 commented 1 year ago

> I see what you mean. This is currently not implemented, although it doesn't seem like it would be too difficult a change. e.g. HOST_INTF variable that would have some tie-breaking logic with HOST_IP.

Thanks, that will be a really useful change.

sar772004 commented 1 year ago

Can you please check this?

In my network, breth2 (the 5.2.x.x network, the node-IP subnet) is the k8s control plane and breth0 is the default gateway. I was debugging a case where one of the vxlan interfaces in a pod had MTU 1450 (it is random on each bringup).

I have the meshnetd logs from the node, showing both the failing eth3 case and the working eth1 case in terms of MTU assignment.

It could be the failure to find the eth3 link in the failed case, but I am not sure why that happens.

Failed eth3: MTU is set to 1450 instead of the underlay MTU

```
time="2023-03-30T02:10:02Z" level=warning msg="[transport] transport: http2Server.HandleStreams failed to read frame: read tcp 5.2.1.1:51111->5.2.1.3:47074: read: connection reset by peer" system=system
time="2023-03-30T02:10:02Z" level=info msg="[transport] transport: loopyWriter.run returning. connection error: desc = \"transport is closing\"" system=system
time="2023-03-30T02:10:04Z" level=info msg="Created koko Veth struct {NsName:/proc/248505/ns/net LinkName:eth3 IPAddr:[] MirrorEgress: MirrorIngress:}" daemon=meshnetd overlay=vxLAN
time="2023-03-30T02:10:04Z" level=info msg="Created koko vxlan struct {ParentIF:breth0 ID:5039 IPAddr:5.2.1.3 MTU:0 UDPPort:0}" daemon=meshnetd overlay=vxLAN
time="2023-03-30T02:10:04Z" level=warning msg="failed to get link: eth3" daemon=meshnetd overlay=vxLAN
time="2023-03-30T02:10:04Z" level=info msg="Retrieved eth3 link from /proc/248505/ns/net Netns: " daemon=meshnetd overlay=vxLAN
time="2023-03-30T02:10:04Z" level=info msg="Is link a VXLAN?: false" daemon=meshnetd overlay=vxLAN
time="2023-03-30T02:10:04Z" level=info msg="Link we've found isn't a vxlan or doesn't exist" daemon=meshnetd overlay=vxLAN
time="2023-03-30T02:10:04Z" level=info msg="Creating a VXLAN link: {breth0 5039 5.2.1.3 0 0}; inside the pod: {/proc/248505/ns/net eth3 [] }" daemon=meshnetd overlay=vxLAN
time="2023-03-30T02:10:04Z" level=info msg="koko: create vxlan link koko2596996162 under breth0"
time="2023-03-30T02:10:04Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=Update grpc.service=meshnet.v1beta1.Remote grpc.start_time="2023-03-30T02:10:04Z" grpc.time_ms=25.858 peer.address="5.2.1.3:47090" span.kind=server system=grpc
time="2023-03-30T02:10:04Z" level=warning msg="[transport] transport: http2Server.HandleStreams failed to read frame: read tcp 5.2.1.1:51111->5.2.1.3:47090: read: connection reset by peer" system=system
time="2023-03-30T02:10:04Z" level=info msg="[transport] transport: loopyWriter.run returning. connection error: desc = \"transport is closing\"" system=system
```

Passing eth1: picks up 8950 as underlay breth2 has MTU 9000

```
time="2023-03-30T02:10:41Z" level=info msg="Created koko Veth struct {NsName:/proc/248505/ns/net LinkName:eth1 IPAddr:[] MirrorEgress: MirrorIngress:}" daemon=meshnetd overlay=vxLAN
time="2023-03-30T02:10:41Z" level=info msg="Created koko vxlan struct {ParentIF:breth0 ID:5037 IPAddr:5.2.1.2 MTU:0 UDPPort:0}" daemon=meshnetd overlay=vxLAN
time="2023-03-30T02:10:41Z" level=info msg="Retrieved eth1 link from /proc/248505/ns/net Netns: &{LinkAttrs:{Index:44 MTU:8950 TxQLen:1000 Name:eth1 HardwareAddr:ae:9f:af:b0:ea:28 Flags:up|broadcast|multicast RawFlags:69699 ParentIndex:44 MasterIndex:0 Namespace: Alias: Statistics:0xc000334240 Promisc:0 Xdp:0xc0005a88b8 EncapType:ether Protinfo: OperState:unknown NetNsID:0 NumTxQueues:1 NumRxQueues:1 GSOMaxSize:65536 GSOMaxSegs:65535 Vfs:[] Group:0 Slave:} VxlanId:5037 VtepDevIndex:8 SrcAddr: Group:5.2.1.2 TTL:0 TOS:0 Learning:true Proxy:false RSC:false L2miss:true L3miss:true UDPCSum:true UDP6ZeroCSumTx:false UDP6ZeroCSumRx:false NoAge:false GBP:false FlowBased:false Age:300 Limit:0 Port:4789 PortLow:0 PortHigh:0}" daemon=meshnetd overlay=vxLAN
time="2023-03-30T02:10:41Z" level=info msg="Is link &{LinkAttrs:{Index:44 MTU:8950 TxQLen:1000 Name:eth1 HardwareAddr:ae:9f:af:b0:ea:28 Flags:up|broadcast|multicast RawFlags:69699 ParentIndex:44 MasterIndex:0 Namespace: Alias: Statistics:0xc000334240 Promisc:0 Xdp:0xc0005a88b8 EncapType:ether Protinfo: OperState:unknown NetNsID:0 NumTxQueues:1 NumRxQueues:1 GSOMaxSize:65536 GSOMaxSegs:65535 Vfs:[] Group:0 Slave:} VxlanId:5037 VtepDevIndex:8 SrcAddr: Group:5.2.1.2 TTL:0 TOS:0 Learning:true Proxy:false RSC:false L2miss:true L3miss:true UDPCSum:true UDP6ZeroCSumTx:false UDP6ZeroCSumRx:false NoAge:false GBP:false FlowBased:false Age:300 Limit:0 Port:4789 PortLow:0 PortHigh:0} a VXLAN?: true" daemon=meshnetd overlay=vxLAN
time="2023-03-30T02:10:41Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=Update grpc.service=meshnet.v1beta1.Remote grpc.start_time="2023-03-30T02:10:41Z" grpc.time_ms=0.523 peer.address="5.2.1.2:39396" span.kind=server system=grpc
```

meshnet-cni logs from /var/log/meshnet-cni.log (the peer pod was not alive at veth create time; possibly the cause of the problem?)

```
time="2023-03-29T19:09:58-07:00" level=info msg="Creating Veth struct with NetNS:/proc/248505/ns/net and intfName: eth3, IP:"
time="2023-03-29T19:09:58-07:00" level=info msg="Pod test is retrieving peer pod dut-e information from meshnet daemon"
time="2023-03-29T19:09:58-07:00" level=info msg="Is peer pod dut-e alive?: false"
time="2023-03-29T19:09:58-07:00" level=info msg="Peer pod dut-e isn't alive yet, continuing"
```

kingshukdev commented 1 year ago

@sar772004 - trying to understand first what you tried:

In the daemonset yaml, in place of valueFrom, did you give the "value" of HOST_IP directly?

Note: unless your two pods are hosted on two different K8s nodes, there will be no VXLAN between them. If K8s deploys the pods on the same node, they are connected by a vEth pair instead. The interface MTUs differ between these two cases, which makes the MTU unpredictable, since it depends on how the pods are deployed.

If your core requirement is:

  • a predictable, configurable MTU
  • an MTU > 1500

I am hoping to have some cycles in the second half of April to explore these two.

sar772004 commented 1 year ago

> @sar772004 - trying to understand first what you tried:
>
> ```yaml
> - name: HOST_IP
>   valueFrom:
>     fieldRef:
>       fieldPath: status.hostIP
> ```
>
> In the daemonset yaml, in place of valueFrom, did you give the "value" of HOST_IP directly?
>
> Note: unless your two pods are hosted on two different K8s nodes, there will be no VXLAN between them. If K8s deploys the pods on the same node, they are connected by a vEth pair instead. The interface MTUs differ between these two cases, which makes the MTU unpredictable, since it depends on how the pods are deployed.
>
> If your core requirement is:
>
> • a predictable, configurable MTU
> • an MTU > 1500
>
> I am hoping to have some cycles in the second half of April to explore these two.

My setup has 4 nodes connected via breth2 (5.2.1.0/24 subnet).

I had tried hardcoding the HOST_IP in the yaml, but that leads to this error when the pod is scheduled on another node:

  Warning  FailedCreatePodSandBox  29s (x11 over 8m21s)  kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "f1cf1f4f41b83c7e84d690c33b78348b813943a62eca3de48180586def9a90fd" network for pod "dut-a": networkPlugin cni failed to set up pod "dut-a_default" network: no iface found for address 5.2.1.1

You are right, those are my core requirements. But it's not just MTU: selecting the interface gives more control over where the data-path traffic goes.

To make status.hostIP in daemonset.yaml fetch the breth2 IP, I switched my k8s init to use the 5.2 network, like this: "kubeadm init --apiserver-advertise-address ${breth2IP} .. rest of args"

```
(HYP-1-HOST):~ > kubectl get nodes -o wide
NAME            STATUS   ROLES           AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                           KERNEL-VERSION                 CONTAINER-RUNTIME
hyp-1.xyx.com   Ready    control-plane   21m   v1.25.0   5.2.1.1                     Rocky Linux 8.7 (Green Obsidian)   4.18.0-425.13.1.el8_7.x86_64   docker://20.10.22
hyp-2.xyz.com   Ready                    21m   v1.25.0   5.2.1.2                     Rocky Linux 8.7 (Green Obsidian)   4.18.0-425.13.1.el8_7.x86_64   docker://20.10.22
hyp-3.xyz.com   Ready                    20m   v1.25.0   5.2.1.3                     Rocky Linux 8.7 (Green Obsidian)   4.18.0-425.13.1.el8_7.x86_64   docker://20.10.22
hyp-4.xyz.com   Ready                    20m   v1.25.0   5.2.1.4                     Rocky Linux 8.7 (Green Obsidian)   4.18.0-425.13.1.el8_7.x86_64   docker://20.10.22
```

sar772004 commented 1 year ago

I'm a novice with Go, so excuse any mistakes, but based on @networkop's comment, we could add a HOST_INTF env var to make this more deterministic. I have some changes, and tested this locally by just building the meshnet docker image and changing the reference in base/daemonset.yaml. If this is acceptable, I can create a pull request.

diff --git a/daemon/meshnet/handler.go b/daemon/meshnet/handler.go
index ccefd57..4bf31ad 100644
--- a/daemon/meshnet/handler.go
+++ b/daemon/meshnet/handler.go
@@ -64,6 +64,7 @@ func (m *Meshnet) Get(ctx context.Context, pod *mpb.PodQuery) (*mpb.Pod, error)
        srcIP, _, _ := unstructured.NestedString(result.Object, "status", "src_ip")
        netNs, _, _ := unstructured.NestedString(result.Object, "status", "net_ns")
        nodeIP := os.Getenv("HOST_IP")
+       nodeIntf := os.Getenv("HOST_INTF")

        return &mpb.Pod{
                Name:   pod.Name,
@@ -72,6 +73,7 @@ func (m *Meshnet) Get(ctx context.Context, pod *mpb.PodQuery) (*mpb.Pod, error)
                KubeNs: pod.KubeNs,
                Links:  links,
                NodeIp: nodeIP,
+               NodeIntf: nodeIntf,
        }, nil
 }

diff --git a/daemon/proto/meshnet/v1beta1/meshnet.pb.go b/daemon/proto/meshnet/v1beta1/meshnet.pb.go
index 49dd15d..5db0039 100644
--- a/daemon/proto/meshnet/v1beta1/meshnet.pb.go
+++ b/daemon/proto/meshnet/v1beta1/meshnet.pb.go
@@ -31,6 +31,7 @@ type Pod struct {
        KubeNs string  `protobuf:"bytes,4,opt,name=kube_ns,json=kubeNs,proto3" json:"kube_ns,omitempty"`
        Links  []*Link `protobuf:"bytes,5,rep,name=links,proto3" json:"links,omitempty"`
        NodeIp string  `protobuf:"bytes,6,opt,name=node_ip,json=nodeIp,proto3" json:"node_ip,omitempty"`
+       NodeIntf string  `protobuf:"bytes,7,opt,name=node_intf,json=nodeIntf,proto3" json:"node_intf,omitempty"`
 }

 func (x *Pod) Reset() {
diff --git a/daemon/proto/meshnet/v1beta1/meshnet.proto b/daemon/proto/meshnet/v1beta1/meshnet.proto
index dbe46ab..a24a1da 100644
--- a/daemon/proto/meshnet/v1beta1/meshnet.proto
+++ b/daemon/proto/meshnet/v1beta1/meshnet.proto
@@ -12,6 +12,7 @@ message Pod {
   string kube_ns = 4;
   repeated Link links = 5;
   string node_ip = 6;
+  string node_intf = 7;
 }

 message Link {
@@ -136,4 +137,4 @@ service Remote {
 service WireProtocol {
   rpc SendToOnce (Packet) returns (BoolResponse);
   rpc SendToStream (stream Packet) returns (BoolResponse);
-}
\ No newline at end of file
+}
diff --git a/plugin/meshnet.go b/plugin/meshnet.go
index 4eb518a..a48959d 100644
--- a/plugin/meshnet.go
+++ b/plugin/meshnet.go
@@ -84,13 +84,13 @@ func loadConf(bytes []byte) (*netConf, *current.Result, error) {
 }

 // getVxlanSource uses netlink to get the iface reliably given an IP address.
-func getVxlanSource(nodeIP string) (string, string, error) {
-       if nodeIP == "" {
-               return "", "", fmt.Errorf("meshnetd provided no HOST_IP address: %s", nodeIP)
+func getVxlanSource(nodeIP string, nodeIntf string) (string, string, error) {
+       if nodeIntf == "" && nodeIP == "" {
+               return "", "", fmt.Errorf("meshnetd provided no HOST_IP address: %s or HOST_INTF: %s", nodeIP, nodeIntf)
        }
        nIP := net.ParseIP(nodeIP)
-       if nIP == nil {
-               return "", "", fmt.Errorf("parsing failed for meshnetd provided no HOST_IP address: %s", nodeIP)
+       if nIP == nil && nodeIntf == "" {
+               return "", "", fmt.Errorf("parsing failed for meshnetd provided no HOST_IP address: %s and node HOST_INTF: %s", nodeIP, nodeIntf)
        }
        ifaces, _ := net.Interfaces()
        for _, i := range ifaces {
@@ -103,7 +103,12 @@ func getVxlanSource(nodeIP string) (string, string, error) {
                        case *net.IPAddr:
                                ip = v.IP
                        }
-                       if nIP.Equal(ip) {
+                       if nodeIntf != "" {
+                               if i.Name == nodeIntf {
+                                       return ip.String(), nodeIntf, nil
+                               }
+                       }
+                       if nIP != nil && nIP.Equal(ip) {
                                log.Infof("Found iface %s for address %s", i.Name, nodeIP)
                                return nodeIP, i.Name, nil
                        }
@@ -182,7 +187,7 @@ func cmdAdd(args *skel.CmdArgs) error {
        }

        // Finding the source IP and interface for VXLAN VTEP
-       srcIP, srcIntf, err := getVxlanSource(localPod.NodeIp)
+       srcIP, srcIntf, err := getVxlanSource(localPod.NodeIp, localPod.NodeIntf)
        if err != nil {
                return err
        }
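If the diff above were merged, the daemonset would also need the new variable next to HOST_IP. A sketch of what that could look like (the HOST_INTF key is the proposed name from this thread, and breth2 is just this setup's example value):

```yaml
env:
  - name: HOST_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  - name: HOST_INTF   # proposed; selects the NIC carrying the VXLAN binds
    value: "breth2"
```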