yuraxdrumz opened this issue 2 years ago
I am trying to run an external VPP as a daemon-set on k8s on EKS and I am running into some problems.
How are you setting up the external VPP as a daemon-set on K8s? We make a few presumptions about that external VPP... among them that it has an interface that is bound to the TunnelIP (usually the Node IP).
I tried two different approaches:
Both fail with the error I sent above.
EDIT:
I didn't find any examples with external VPP, which is why I tried the approaches above.
I just tried attaching another NIC to one of my k8s nodes, created an AF_PACKET interface, set its IP to the one from the AWS console, and added that IP as the NSM_TUNNEL_IP, and I still get the same error.
Can you explain what I am missing here? Would appreciate the help.
Which external VPP version are you running?
ghcr.io/edwarnicke/govpp/vpp:v22.06-release
Sometimes I feel so stupid. Instead of providing /var/run/vpp/external/api.sock, I passed /var/run/vpp/external/cli.sock. Forwarder went up and then I tried running everything on 1 node:
Everything works. Now I will modify the forwarders to use the 2nd NIC IP, as currently I can only pass NSM_TUNNEL_IP, which in my case is different for each forwarder, and then I will try to run the NSE on that node.
Running VPP as a daemon-set seems to be working.
@edwarnicke
With the kernel2kernel example everything worked, but kernel NSC to memif NSE fails because the memif interface is not created.
This is the error I see in my NSE VPP: unknown message: memif_socket_filename_add_del_v2_34223bdf: cannot support any of the requested mechanism
I saw #519, which removed the need for the external VPP option, but I don't follow how this is supposed to work in the first place. I understand memif needs a role=server and a role=client.
Now, in my use case, my forwarder is connected to the external VPP socket, and both the forwarder and the VPP pods run on hostNetwork. My kernel NSC and VPP NSE run in separate network namespaces (without hostNetwork).
Is it possible to see an example of an external vpp yaml with a memif interface?
Thanks
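For reference, my rough understanding of what the memif server/client pair looks like at the VPP binary API level is sketched below. This is only a hedged illustration against the go.fd.io/govpp memif binapi; the socket path, IDs and role choice are assumptions for the example, not what the NSM forwarder or NSE actually do.

```go
package main

import (
	"log"

	"go.fd.io/govpp"
	"go.fd.io/govpp/binapi/memif"
)

func main() {
	// Connect to the VPP binary API socket (path is illustrative).
	conn, err := govpp.Connect("/var/run/vpp/external/api.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Disconnect()

	ch, err := conn.NewAPIChannel()
	if err != nil {
		log.Fatal(err)
	}
	defer ch.Close()

	// Register a memif socket file. Both the server and the client side must
	// reference the same socket (this is where the abstract-socket patch
	// mentioned above comes into play).
	sockReply := &memif.MemifSocketFilenameAddDelReply{}
	if err := ch.SendRequest(&memif.MemifSocketFilenameAddDel{
		IsAdd:          true,
		SocketID:       1,
		SocketFilename: "/run/vpp/memif-nsm.sock", // placeholder path
	}).ReceiveReply(sockReply); err != nil {
		log.Fatalf("adding memif socket filename failed: %v", err)
	}

	// Create the memif interface. The NSE side acts as the server (master),
	// the forwarder side as the client (slave), both on the same socket ID.
	createReply := &memif.MemifCreateReply{}
	if err := ch.SendRequest(&memif.MemifCreate{
		Role:     memif.MEMIF_ROLE_API_MASTER, // use ..._SLAVE on the client side
		ID:       0,
		SocketID: 1,
	}).ReceiveReply(createReply); err != nil {
		log.Fatalf("creating memif failed: %v", err)
	}
	log.Printf("memif created, sw_if_index=%d", createReply.SwIfIndex)
}
```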
Sometimes I feel so stupid. Instead of providing /var/run/vpp/external/api.sock, I passed /var/run/vpp/external/cli.sock. Forwarder went up and then I tried running everything on 1 node:
Not stupid. I don't believe in stupid users, I believe in doc bugs :) If you could let me know what you think might have precluded this misadventure in terms of doc improvements, I'd love that :)
ghcr.io/edwarnicke/govpp/vpp:v22.06-release
Ah... OK... that's not going to work at this moment because there is a patch missing from it for using abstract sockets for memif (if you are interested, I can explain the ins and outs of why).
Could you try using:
@edwarnicke
Thanks for the help, it worked.
I tried a couple of configurations:
- VPP daemon-set with forwarder and NSE that both use it - with the patch I see the NSE can create the memif server, but the forwarder still fails with memif_socket_filename_add_del_v2_34223bdf.
- Running the forwarder with its own VPP instance and the NSE with the external VPP, which works fine.
After seeing some issues with the cleanup of the interfaces while using the external VPP, I just added the /dev/vfio mount to the NSE and used a local instance of VPP. It is easier to manage, and the healing/cleanup process is easier as well.
At the end of the day, I was wrong in thinking one VPP is easier to manage, and I changed back to local VPP instances.
Thanks
Sometimes I feel so stupid. Instead of providing /var/run/vpp/external/api.sock, I passed /var/run/vpp/external/cli.sock. Forwarder went up and then I tried running everything on 1 node:

Not stupid. I don't believe in stupid users, I believe in doc bugs :) If you could let me know what you think might have precluded this misadventure in terms of doc improvements, I'd love that :)
I think we should verify we are connecting to the api.sock by running some initial command, like a show int, and if it fails, catch it and add a warning in addition to the error message that comes back from govpp.
An example would be:
Original message:
time="2022-08-22T14:03:15Z" level=info msg="Decoding sockclnt_create_reply failed: panic occurred during decoding message sockclnt_create_reply: runtime error: slice bounds out of range [10:0]" logger=govpp/socketclient
Add a warning below the message from the govpp wrapper:
time="2022-08-22T14:03:15Z" level=warn msg="sending command to /var/run/vpp/external/cli.sock failed, make sure you are trying to connect to the api.sock" logger=govpp/...
@edwarnicke
Thanks for the help, it worked.
I'm a little confused... it sounds above like v22.06-rc0-147-g1c5485ab8 worked, but then below you talk about it not working...
I tried a couple of configurations:
- VPP daemon-set with forwarder and NSE that both use it - with the patch I see the NSE can create the memif server, but the forwarder still fails with memif_socket_filename_add_del_v2_34223bdf.
Was this with v22.06-rc0-147-g1c5485ab8?
- Running forwarder with its own vpp instance and the nse with the external VPP, which works fine.
After seeing some issues with the cleanup of the interfaces while using the external VPP,
Could you file a bug on those issues... we should be cleaning up those interfaces on restart of a forwarder using an external VPP... I'm not entirely sure we try to do that with an NSE using an external VPP.
I just added the /dev/vfio mount to the nse and used a local instance of vpp. It is easier to manage it and the healing / cleanup process is easier as well.
Ah... so external VPP is intrinsically more complex (coordination needed)... it's super useful for cases where you want to increase performance.
I'd love to hear more about the details of what you are doing with vfio :)
I'd also be quite interested in what you are ultimately trying to achieve.
At the end of the day, I was wrong thinking 1 VPP is easier to manage, changed back to local vpp instances.
Thanks
@edwarnicke
I meant the VPP abstract socket feature works, but connecting an NSC to an NSE with both the NSE and the forwarder using the same VPP fails with the same error memif_socket_filename_add_del_v2_34223bdf. The server-side (NSE) memif is created after running the VPP daemon-set with the image you provided, but the client-side memif (forwarder) returns the error above although it uses the same VPP.
I will recreate the errors over the weekend and open up an issue with all the necessary information and logs regarding the single VPP use case.
Regarding why I am doing this:
We are trying to use NSM at my company for various use cases. One of them is the ability to allow client VPNs to connect to the internet and to other branches of their offices securely via chained security functions (NSM composition).
Before jumping to all the chaining and various other use cases, I wanted to cover internet access and have a reliable benchmark between 1 NSC and 1 NSE, whether on the same machine or on different machines, to know the potential traffic I can push before adding any other components in between.
In the process, I created an NSE which acts as the internet gateway. To gain internet access from the NSE, I set it as hostNetwork, created a veth pair, added an AF_PACKET interface on it, and configured it to go to the internet via the default gateway.
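The AF_PACKET part of that setup looks roughly like the sketch below when done through the VPP binary API (assuming go.fd.io/govpp and its generated binapi; the veth name and socket path are placeholders, and the IP/default-route configuration is omitted):

```go
package main

import (
	"log"

	"go.fd.io/govpp"
	"go.fd.io/govpp/binapi/af_packet"
	interfaces "go.fd.io/govpp/binapi/interface"
	"go.fd.io/govpp/binapi/interface_types"
)

func main() {
	conn, err := govpp.Connect("/var/run/vpp/api.sock") // placeholder path
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Disconnect()

	ch, err := conn.NewAPIChannel()
	if err != nil {
		log.Fatal(err)
	}
	defer ch.Close()

	// Attach VPP to the VPP-side end of the veth pair as a host-interface.
	createReply := &af_packet.AfPacketCreateReply{}
	if err := ch.SendRequest(&af_packet.AfPacketCreate{
		HostIfName:      "veth-vpp", // placeholder veth name
		UseRandomHwAddr: true,
	}).ReceiveReply(createReply); err != nil {
		log.Fatalf("af_packet create failed: %v", err)
	}

	// Bring the new interface up; assigning the IP and pointing the default
	// route at the host's gateway would follow (omitted here).
	flagsReply := &interfaces.SwInterfaceSetFlagsReply{}
	if err := ch.SendRequest(&interfaces.SwInterfaceSetFlags{
		SwIfIndex: createReply.SwIfIndex,
		Flags:     interface_types.IF_STATUS_API_FLAG_ADMIN_UP,
	}).ReceiveReply(flagsReply); err != nil {
		log.Fatalf("setting interface up failed: %v", err)
	}
	log.Printf("host-interface attached, sw_if_index=%d", createReply.SwIfIndex)
}
```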
These are my findings so far regarding the simple use case of 1 NSC -> 1 NSE that goes to the internet:
EKS 1.22 on AWS c5n.2xlarge nodes
I ran speedtest-cli and got 1-1.4Gbps download and 0.6-1Gbps upload. I ran speedtest-cli again, which got me 1.6-2.4Gbps download and 1.2-1.8Gbps upload.
Hope this answers your questions.
Thanks
Hello,
I am trying to run an external VPP as a daemon-set on k8s on EKS and I am running into some problems.
When using a local VPP, everything works as expected: the forwarder inits an AF_PACKET socket on a host-interface and creates all the necessary routes/interfaces.
When using an external VPP as a daemon-set, I get strange errors from the forwarder. After checking the source code, I saw there are debug flags for govpp, which I enabled, and noticed a strange out-of-bounds exception occurring.
Steps to reproduce:
ghcr.io/edwarnicke/govpp/vpp:v22.06-release
name: NSM_VPP_API_SOCKET, value: /var/run/vpp/external/cli.sock
to forwarder vpp k8s
Files Used: Archive.zip
Another question on that matter: I saw in the source code that you only use NONE or AF_PACKET in vppinit. Is it possible to create an X-node k8s cluster on EKS, with DPDK installed on each node and a 2nd NIC bound to DPDK via VFIO, so that I can run the entire control plane and data plane of NSM on the separate NICs and gain the performance of a single VPP per node plus DPDK?