Hi @krishnajs , we're not planning to add SFC features to the Calico VPP dataplane, however we are planning to integrate it with NSM so you can run a single instance of VPP on your nodes, and use the NSM control plane for the pod SFC configuration. Would that work for your use-case?
Thanks for the feedback. This will definitely give us a path for our use cases. One thing that we need to think about is what would happen if the cluster already has another service mesh on the node, like Istio.
@krishnajs One of the things in the back of my head on the NSM side has consistently been making sure it's possible to do what @AloysAugustin is proposing: sharing a single VPP instance between NSM and Calico-VPP.
As to 'what would happen if the cluster already has another service mesh on the node, like Istio' ... Network Service Mesh is complementary to, not competitive with, L7 service meshes like Istio. So you should be fine there :)
Thanks @edwarnicke and @AloysAugustin for your perspective. Is there an estimated time frame for starting this work? That would help us plan whether we need to contribute.
@krishnajs Your interest and willingness to help out is appreciated! The good news is: NSM and Calico-VPP are pretty orthogonal to each other, so barring unforeseen difficulties, it shouldn't be too hard to get them to share a single VPP instance.
The reason for this is actually pretty instructive. The NSM Forwarder is basically plumbing a set of 'vWires'. When a workload requests to be connected to a Network Service, the Forwarder basically needs to:
The net result is there is very little surface area for conflict with what Calico-VPP is doing as a CNI.
I mentioned 'mechanisms' earlier. These are the 'local mechanisms' (kernel interface, memif, vfio) or 'remote mechanisms' (vxlan, wireguard, etc.) for the interface.
In the case of VXLAN, there is some possibility of collision around VNI selection. That is likely quite resolvable, however, as Calico has a single VNI configured for it, and NSM can, via slight modifications, be made to avoid it.
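To make that concrete, here is a toy sketch (purely illustrative, not NSM or Calico code) of the kind of slight modification meant here: a VNI allocator that simply skips whatever VNI the host CNI has reserved. The value 4096 is Calico's default VXLAN VNI, which is an assumption worth double-checking on your cluster:

```go
// Toy illustration only, not NSM or Calico code: allocate VXLAN VNIs while
// skipping any VNI reserved by the host CNI (e.g. Calico's single VXLAN VNI).
package main

import "fmt"

// nextVNI returns the first VNI at or above start that is not reserved.
func nextVNI(start uint32, reserved map[uint32]bool) uint32 {
	vni := start
	for reserved[vni] {
		vni++
	}
	return vni
}

func main() {
	// 4096 is Calico's default VXLAN VNI; treat it as reserved.
	reserved := map[uint32]bool{4096: true}
	fmt.Println(nextVNI(4096, reserved)) // prints 4097
}
```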
@krishnajs Would you be willing to try a first simple pass at getting the cmd-forwarder-vpp running with Calico-VPP? It should be pretty simple to attempt. If you are interested, I'm happy to lay out the NSM side steps, and I suspect @AloysAugustin would be willing to lay out the Calico-VPP side steps :)
@AloysAugustin Keep me honest about the Calico-VPP parts of this :)
@krishnajs Calico-VPP runs its own VPP instance and mounts in the directory containing the VPP programming socket. The VPP programming socket is then referenced as "/var/run/vpp/vpp-api.sock".
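For illustration, a minimal sketch (not the actual cmd-forwarder-vpp code) of attaching to that already-running VPP instance over the mounted socket, assuming the go.fd.io/govpp client library:

```go
// Minimal sketch: connect to the Calico-VPP instance via its API socket,
// which Calico-VPP mounts at /var/run/vpp/vpp-api.sock, instead of starting
// a separate VPP. Assumes the go.fd.io/govpp client library.
package main

import (
	"log"

	"go.fd.io/govpp"
)

func main() {
	conn, err := govpp.Connect("/var/run/vpp/vpp-api.sock")
	if err != nil {
		log.Fatalf("failed to connect to VPP API socket: %v", err)
	}
	defer conn.Disconnect()

	log.Println("connected to the shared VPP instance")
}
```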
I made a very simple modification to the VPP Forwarder to allow it to optionally use an existing VPP instance rather than starting one of its own, by setting the env variable NSM_VPP_API_SOCKET="/var/run/vpp/vpp-api.sock".
I made a second very simple modification to the VPP Forwarder to allow a NONE option for initializing VPP, by setting the env variable NSM_VPP_INIT=NONE.
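Roughly, the two switches amount to something like the following sketch (hypothetical code, not the actual forwarder; the helper resolveVPP and the default socket path are made up for illustration):

```go
// Hypothetical sketch of the two env-variable switches described above; the
// helper resolveVPP and the default socket path are placeholders, not real
// NSM code.
package main

import (
	"fmt"
	"os"
)

// resolveVPP decides whether to attach to an external VPP (e.g. Calico-VPP's)
// and whether to skip the forwarder's own VPP initialization.
func resolveVPP() (socket string, skipInit bool) {
	socket = os.Getenv("NSM_VPP_API_SOCKET")
	if socket == "" {
		// Default behaviour: the forwarder would start and manage its own VPP.
		socket = "/var/run/vpp/forwarder-api.sock" // placeholder path
	}
	skipInit = os.Getenv("NSM_VPP_INIT") == "NONE"
	return socket, skipInit
}

func main() {
	socket, skipInit := resolveVPP()
	fmt.Printf("VPP socket: %s, skip init: %v\n", socket, skipInit)
}
```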
NSM keeps a repo of examples to try: deployment-k8s.
I've got one more thing to fix and then you should be able to:
a. Set NSM_VPP_API_SOCKET="/var/run/vpp/vpp-api.sock"
b. Set NSM_VPP_INIT="NONE"
And run NSM against the Calico-VPP VPP instance.
@AloysAugustin how could one discover the node IP being used by Calico-VPP (I need it for NSM_TUNNEL_IP)? Is it available via the downward API as status.podIP if running with hostNetwork: true?
@edwarnicke thanks a lot for this write-up. I am trying to organize our team to try this out. I will let you know how it goes.
@krishnajs My suggestion would be to:
We still need some information from @AloysAugustin on how to figure out the IP Calico-VPP is using, so we can correctly specify the NSM_TUNNEL_IP, and I still have one more small thing to fix in NSM to give us a good shot at having this simply work out of the gate :)
OK.. fixed the one last little thing :)
Now we just need @AloysAugustin to tell us what IP Calico-VPP is using so we can use that as the NSM_TUNNEL_IP :)
The most reliable way to retrieve this IP should be to get it from the Node object in k8s. It should also be available as the Pod IP in a pod running with host networking.
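For reference, a rough sketch of reading that address from the Node object with client-go (assumptions: in-cluster config, and the node name injected into the pod as NODE_NAME; none of this is Calico-VPP or NSM code):

```go
// Rough sketch: look up this node's InternalIP from the Node object.
// Assumes in-cluster credentials and a NODE_NAME env variable.
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	node, err := clientset.CoreV1().Nodes().Get(context.Background(), os.Getenv("NODE_NAME"), metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}

	// Pick the InternalIP, which is what a tunnel endpoint would want.
	for _, addr := range node.Status.Addresses {
		if addr.Type == corev1.NodeInternalIP {
			fmt.Println(addr.Address)
		}
	}
}
```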
I also checked the VPP patches that are used in the NSM VPP forwarder, it looks like all of them are already included in the VPP version we ship with the VPP dataplane 🙂 .
> I also checked the VPP patches that are used in the NSM VPP forwarder, it looks like all of them are already included in the VPP version we ship with the VPP dataplane 🙂
@AloysAugustin this is fantastic news!
> The most reliable way to retrieve this IP should be to get it from the Node object in k8s. It should also be available as the Pod IP in a pod running with host networking.
@AloysAugustin This is also good news, as by default the forwarder uses hostNetwork: true and sets NSM_TUNNEL_IP from status.podIP. This means we don't have to make any change for NSM_TUNNEL_IP :)
@krishnajs It should be true then that the instructions from the previous comment have a pretty good chance of just working :)
Thanks @edwarnicke and @AloysAugustin, we will start bringing this up on our side and report back our findings.
Hello @krishnajs. Do you have any update to report on this?
Hi everyone, just to join this conversation. I am working at the moment on resolving an issue that arises when using BGP with Calico and MetalLB to advertise routes between multiple clusters and between clouds. Namely, I use NSM to tell the software-defined controller (OpenDaylight) that some Kubernetes cluster somewhere on bare metal wants to communicate with another cluster running in an OpenStack cloud. One way to prevent Calico from interfering with MetalLB, or rather to prevent the ToR BGP gateway from kicking one of them out of route advertising, is to use VRF. I'm experimenting with VRF (virtual routing and forwarding) as well as Segment Routing (SR) to see which of those can do the job better, although I'm leaning more towards SR as it's perfect for traffic shaping and service function chaining. The SDN controller can then label the communication (like BGP-MPLS) and send it onwards. Now, the forwarding plane of NSM and the underlay network use VPP with SR-IOV. However, making this work with Calico is a bit of a headache for the moment. Looking forward to seeing how it progresses, as I see great potential in this domain for the future.
Hi @brunodzogovic, if I'm understanding you correctly, at least one of your issues is that you have two BGP daemons running on each node, the MetalLB one and the Calico one. Have you tried announcing the service addresses in BGP directly with Calico? This is described on this page in the Calico docs, and it should allow MetalLB and Calico to coexist nicely.
Is calico planning to support Service function chaining for bare-metal Kubernetes deployment?
VPP, which is also the dataplane for Network Service Mesh (NSM), supports policy-driven service function chaining.
SFC is a key technology for certain telco workloads, and there are not many CNIs in the K8s ecosystem that support SFC.