Open edwarnicke opened 3 years ago
This fix is needed for Calico/VPP to work in Kind: https://github.com/projectcalico/vpp-dataplane/pull/204
Even though that PR has not been merged, the docker images have been pushed and these steps should work: https://github.com/projectcalico/vpp-dataplane/pull/204/files#diff-9004d08acd588e7b7e93a8ff6fbe357d4eba3adc003d48ab4b7bed0186af1a11R1
Hi @edwarnicke, just a note that testing the integration in GKE will be complex at this stage, because we haven't found a way to override the default CNI in GKE, so Calico/VPP doesn't work there for now. Also, why are you making a difference in the last two steps between Calico/VPP owning or not owning the main interface? Calico/VPP has relatively strong assumptions that it owns the main interface (= the interface that has the k8s Node address). Giving Calico/VPP another interface will likely result in a non-functional cluster. As a side note, we are starting to look at giving more than one interface to VPP in a Calico/VPP deployment, but that isn't supported yet.
@AloysAugustin You are correct, I should have phrased the last one differently. I was thinking in terms of 'attaches to the interface with vfio' vs 'attaches to the interface with AF_XDP'... the idea being to attach with the highest performance option.
Ah, sounds good then :+1:
Calico uses VPP v21.xx, so we probably need to update the VPP version used in the Forwarder first and then try again: https://github.com/networkservicemesh/cmd-forwarder-vpp/issues/284
@edwarnicke We are still facing issues with the VPP/govpp versions used in Calico and in VPP Forwarder. Currently, to make it work I need to:
`github.com/edwarnicke/govpp/binapi` with `github.com/projectcalico/vpp-dataplane/vpplink/binapi/vppapi` in VPP Forwarder.
Step [2] is probably not actually needed and can be fixed by changing the govpp version used in [1]; this needs to be tested.
But it looks like if we want to support such an integration, we need to provide Calico images and k8s configuration files. Is that OK?
There are 2 issues that need to be fixed to make it work:
`memif` and `memifproxy` socket files shared with the Calico VPP pod - https://github.com/networkservicemesh/sdk-vpp/issues/357.
Currently there is another issue: Calico and NSM use different VPP versions with different additional patches, so for testing I am currently using a Calico VPP fork with the NSM VPP patches added: https://github.com/Bolodya1997/vpp-dataplane/blob/nsm-new/vpplink/binapi/vpp_clone_current.sh, and a govpp fork with the Calico VPP patches added (only to the generated part, not to the generator): https://github.com/Bolodya1997/govpp/tree/calico.
Update: the `Memif2Memif` test case is not currently working - https://github.com/networkservicemesh/sdk-vpp/issues/362.
Failed to start a k8s cluster with Calico on packet, so I created an issue for the Calico team - https://github.com/projectcalico/vpp-dataplane/issues/217.
Update: succeeded in setting up a cluster; currently working on tests.
Update: All basic scenarios except `Memif2Memif` currently work - networkservicemesh/sdk-vpp#362.
Vladimir Popov Yesterday at 5:38 PM
---
Hi, I am trying to use vpp-calico with different cloud providers: [AKS, GKE, AWS].
On project wiki I have found page only for AWS integration. Does it mean that [AKS, GKE] currently can’t
be configured to use vpp-calico?
Aloys Augustin 2 hours ago
---
Hi Vladimir, at this point only EKS is officially supported. We're working on AKS support which may come
in the near future. GKE is less likely to be supported soon because GKE doesn't allow to swap the CNI,
however there is always the option to deploy a self-managed cluster on google cloud as well.
@edwarnicke It looks like it will hardly be possible to test NSM with Calico VPP on GKE or AKS.
Used the `Using DPDK -> With available hugepages` configuration from https://docs.projectcalico.org/reference/vpp/uplink-configuration. @edwarnicke is this exactly what you mean by binding the interface with vfio?
All basic scenarios except `Memif2Memif` currently work. Additionally tested with the `Vfio2Noop` scenario to make sure that there is no problem with VFIO; it also works well.
Tested with the abstract sockets solution. All basic scenarios except `Memif2Memif` currently work.
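For readers unfamiliar with abstract sockets, here is a minimal Go sketch of the general mechanism, not the sdk-vpp memif implementation. It assumes Linux, where Go's `net` package maps a leading `@` in a Unix address to the abstract namespace. The socket name and the helper names `listenAbstract` and `roundTrip` are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// listenAbstract opens a listener on an abstract Unix socket.
// On Linux a leading '@' selects the abstract namespace: no file is
// created on disk, and the name is visible only inside the network
// namespace in which the socket was created.
func listenAbstract(name string) (net.Listener, error) {
	return net.Listen("unix", "@"+name)
}

// roundTrip starts a one-shot echo server on the abstract socket,
// sends msg to it, and returns what comes back.
func roundTrip(name, msg string) (string, error) {
	ln, err := listenAbstract(name)
	if err != nil {
		return "", err
	}
	defer ln.Close()

	go func() {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		buf := make([]byte, 64)
		n, err := conn.Read(buf)
		if err != nil {
			return
		}
		conn.Write(buf[:n]) // echo the first chunk back
	}()

	conn, err := net.Dial("unix", "@"+name)
	if err != nil {
		return "", err
	}
	defer conn.Close()

	if _, err := conn.Write([]byte(msg)); err != nil {
		return "", err
	}
	reply := make([]byte, len(msg))
	if _, err := io.ReadFull(conn, reply); err != nil {
		return "", err
	}
	return string(reply), nil
}

func main() {
	// "nsm-example-memif" is an illustrative name, not one used by NSM.
	reply, err := roundTrip("nsm-example-memif", "ping")
	fmt.Println(reply, err)
}
```

Because an abstract name is scoped to a network namespace, a peer in a different netns cannot dial it directly, which is presumably why the discussion pairs each abstract socket path with a netns file.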
@edwarnicke Do we want to have any CI running for this issue?
Yes
@edwarnicke Please, take a look at the following schemes and algorithms. Are all of them OK or do we need something to be implemented in some other way?
[Schemes not preserved. The surviving fragments refer to a netns file and an abstract socket path on each side (NSC netns file, NSE abstract socket path, NSE netns file), and to a proxy abstract socket path in the NSC netns, with all data transferred between it and the NSE abstract socket path in the NSE netns.]
This looks about right yes :)
@edwarnicke
Calico has integrated all the patches NSM needs into their VPP, but we still have different VPP versions, so `cmd-forwarder-vpp` cannot be used directly with Calico VPP.
Should we create a new `cmd-forwarder-vpp-calico` with `govpp` generated for the Calico VPP version?
Or maybe we should use the latest Calico VPP release as a base for the NSM VPP applications and just update `govpp`?
@Bolodya1997 I'll spin a new image with their patches, test it, and we can look at upgrading.
Is everything else working well? Is it just a matter of updating our image and landing some PRs from you?
I am working on the abstract sockets memif implementation; it will be clear once I have finished and tested it. Currently it is still not clear whether there is an issue with LinkUP events, because it may be caused by the old solution.
@edwarnicke
We have a problem with the Calico VPP setup on packet: the internet is not accessible from pods without `hostNetwork: true`.
It looks like I am missing something in the configuration; I filed an issue for this in the Calico repo - https://github.com/projectcalico/vpp-dataplane/issues/263.
This issue affects the DNS test, but we are planning to rework it: it would make more sense for the test to nslookup something like `kubernetes.default` instead of `google.com`, so that no internet access is needed.
All other `basic`/`feature` tests are working; currently I am working on CI.
@Mixaster995 You will probably need this: https://github.com/networkservicemesh/cmd-forwarder-vpp/pull/421
@edwarnicke
We have tested Calico Integration PR and we have the following suggestions:
This PR integrates Calico on Packet. Problems:
A cursory test found problems with the setup: Calico-VPP doesn't start. More time is needed to investigate. The problem may be related to Calico or to Kind.
Currently, we have 2 versions of the tests: the usual one and one for Calico. We need to consider using only one version.
We can try to use the External `VppAPISocket` as the default (https://github.com/networkservicemesh/cmd-forwarder-vpp/blob/main/internal/config/config.go#L49) and mount this socket from the host to a specific folder on the forwarder (`vpp-ext`, for example).
Forwarder will check for the default VPP API socket on startup.
So, if we have one - use it (Calico case), if not - create a new vpp instance (current behavior).
As I remember there are many chain elements that (explicitly or not) assume that forwarder death == vpp death. It's not right for the Calico case. We need to come up with a correct VPP cleaning when the forwarder is restarted:
There is a problem with the forwarder configuration. It is related to network namespaces: Calico-VPP doesn't have grpcfd. For example, when we connect to the Endpoint, the forwarder receives the network namespace fd using `grpcfd`. But Calico-VPP doesn't have that, so it knows nothing about the NSE's network namespace, and when we try to create a network interface we receive an error.
1. `hostPID:true` for the forwarder by default - see comment - https://github.com/networkservicemesh/sdk-vpp/issues/354#issuecomment-904665828
2. `inode` to the sidecar and create a `unix` connection between forwarder and sidecar to send `fd`
We think that 1 is the preferred solution at the moment. We can create an issue to use a different approach in future releases.
@edwarnicke
What do you think? Is https://github.com/networkservicemesh/sdk-vpp/issues/354#issuecomment-904665828 still relevant, and can we use `hostPID:true` by default?
Calico allows for a choice of dataplanes. VPP is one of them.
Normally, cmd-forwarder-vpp starts its own instance of vpp in its own Pod.
In response to a request for integration between NSM and Calico/VPP, the process for integration was described.
This issue is about actually trying (and shaking the bugs out of) such integration.
This breaks down into a number of steps: