External VPP Issue

yuraxdrumz commented 2 years ago

Hello,

I am trying to run an external VPP as a daemon-set on k8s on EKS and I am running into some problems.

When using local VPP, everything works as expected: the forwarder initializes an AF_PACKET socket on a host-interface and creates all the necessary routes/interfaces.

When using external VPP as a daemon-set, I get strange errors from the forwarder. After checking the source code, I saw there are debug flags for govpp. With those enabled, I noticed a strange out-of-bounds exception:

Aug 22 14:03:12.835 [DEBU] /var/run/vpp/external/cli.sock was created after 8.681051ms
time="2022-08-22T14:03:12Z" level=debug msg=SetMsgCallback logger=govpp/socketclient
time="2022-08-22T14:03:12Z" level=debug msg="Connecting to VPP.."
time="2022-08-22T14:03:12Z" level=debug msg="Connecting to: /var/run/vpp/external/cli.sock" logger=govpp/socketclient
time="2022-08-22T14:03:12Z" level=debug msg="Connected to socket (local addr: @)" logger=govpp/socketclient
time="2022-08-22T14:03:12Z" level=debug msg=" - header sent (16/16): 00 00 00 00 00 00 00 00 00 00 00 46 00 00 00 00" logger=govpp/socketclient
time="2022-08-22T14:03:12Z" level=debug msg=" - x=70 i=0 len=70 mod=0" logger=govpp/socketclient
time="2022-08-22T14:03:12Z" level=debug msg=" - data sent x=70 (70/70): 00 0F 00 00 00 7B 67 6F 76 70 70 73 6F 63 6B 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00" logger=govpp/socketclient
time="2022-08-22T14:03:12Z" level=debug msg=" -- writeMsg done" logger=govpp/socketclient
time="2022-08-22T14:03:12Z" level=debug msg="reading msg.." logger=govpp/socketclient
time="2022-08-22T14:03:12Z" level=debug msg=" - read data (29 bytes): FE 01 FF FD 18 FF FA 18 01 FF F0 FF FD 1F FF FA 1F 01 FF F0 67 6F 76 70 70 73 6F 63 6B" logger=govpp/socketclient
time="2022-08-22T14:03:12Z" level=debug msg="continue reading remaining 67108070 bytes" logger=govpp/socketclient
time="2022-08-22T14:03:13Z" level=debug msg="another data received: 172 bytes (remain: 67107898)" logger=govpp/socketclient
time="2022-08-22T14:03:15Z" level=debug msg=" -- readMsg done (buffered: 0)" logger=govpp/socketclient
time="2022-08-22T14:03:15Z" level=info msg="Decoding sockclnt_create_reply failed: panic occurred during decoding message sockclnt_create_reply: runtime error: slice bounds out of range [10:0]" logger=govpp/socketclient
time="2022-08-22T14:03:15Z" level=debug msg="Closing socket" logger=govpp/socketclient

Steps to reproduce:

- Create EKS cluster v1.22
- Create SPIRE (CRD mode) v1.3.3
- Create VPP daemon-set with hostPID and hostNetwork, image ghcr.io/edwarnicke/govpp/vpp:v22.06-release
- Create NSM control plane v1.5.0 + forwarder VPP v1.5.0
- Pass the environment variable (name: NSM_VPP_API_SOCKET, value: /var/run/vpp/external/cli.sock) to the forwarder VPP on k8s

Files Used: Archive.zip

Another question while I am at it. I saw in the source code that you only use NONE or AF_PACKET in vppinit. Is it possible to create an X-node k8s cluster on EKS, with DPDK installed on each node and a 2nd NIC bound to DPDK via VFIO, so that I can run the control plane and the data plane of NSM on separate NICs and gain the performance of a single VPP per node with DPDK?

edwarnicke commented 2 years ago

> I am trying to run an external VPP as a daemon-set on k8s on EKS and I am running into some problems.

How are you setting up the external VPP as a daemon-set on K8s? We make a few presumptions about that external VPP... among them that it has an interface that is bound to the TunnelIP (usually the Node IP).

yuraxdrumz commented 2 years ago

I tried two different approaches:

- Create AF_PACKET with a veth pair before running the forwarder
- Bind another NIC to the node and run DPDK with vfio before running the forwarder

Both fail with the error I sent above.

EDIT:

I didn't find any examples with external VPP, so that is why I tried the approaches above

yuraxdrumz commented 2 years ago

I just tried attaching another NIC to one of my k8s nodes, created an AF_PACKET interface on it, set its IP to the one shown in the AWS console and passed that IP as NSM_TUNNEL_IP. I still get the same error.

Can you explain what I am missing here? Would appreciate the help.

edwarnicke commented 2 years ago

Which external VPP version are you running?

yuraxdrumz commented 2 years ago

ghcr.io/edwarnicke/govpp/vpp:v22.06-release

yuraxdrumz commented 2 years ago

Sometimes I feel so stupid.

Instead of providing /var/run/vpp/external/api.sock I passed /var/run/vpp/external/cli.sock. Forwarder went up and then I tried running everything on 1 node:

- VPP
- Forwarder
- NSE
- NSC

Everything works. Now I will modify forwarders to use the 2nd NIC ip, as currently I can only pass NSM_TUNNEL_IP, which in my case is different for each forwarder, and then I will try to run the NSE on that node.

Running VPP as a daemon-set seems to be working.
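
The debug log in the original report is consistent with this mix-up. Below is a minimal Python sketch, assuming (as the `header sent` line in that log suggests) that the socket client frames each message with a 16-byte header whose bytes 8 to 11 hold the payload length as a big-endian integer. Adding the 29 bytes already read to the 67108070 bytes still expected gives the length the client decoded from the reply header, and its bytes contain FF FD, which is telnet's IAC DO sequence rather than a plausible length. The helper name `payload_length` is illustrative only.

```python
import struct


def payload_length(header: bytes) -> int:
    """Return the big-endian 32-bit value stored at bytes 8..11 of a 16-byte header."""
    if len(header) != 16:
        raise ValueError("expected a 16-byte header")
    return struct.unpack(">I", header[8:12])[0]


# Header the client sent, copied from the log: the length field is 0x46.
sent = bytes.fromhex("00000000 00000000 00000046 00000000")

# Length the client believed the reply had: 29 bytes read + 67108070 remaining.
implied = 29 + 67108070
implied_bytes = struct.pack(">I", implied)

print(payload_length(sent))   # 70, matching "x=70" in the log
print(implied_bytes.hex())    # 03fffd03
```

A length of roughly 64 MB for a connection reply is itself a hint that the peer is not speaking the binary API protocol.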

yuraxdrumz commented 2 years ago

@edwarnicke

With the kernel2kernel example everything worked, but with a kernel NSC to a memif NSE it fails because the memif interface is not created. This is the error I see in my NSE VPP - unknown message: memif_socket_filename_add_del_v2_34223bdf: cannot support any of the requested mechanism

I saw #519, which removed the need for the external VPP option, but I don't follow how this is supposed to work in the first place. I understand memif needs a role=server and a role=client.

Now, in my use case, my forwarder is connected to the external VPP socket, and both the forwarder and the VPP pods run on hostNetwork. My kernel NSC and VPP NSE run in separate network namespaces (without hostNetwork).

Is it possible to see an example of an external vpp yaml with a memif interface?

Thanks

edwarnicke commented 2 years ago

> Sometimes I feel so stupid.
>
> Instead of providing /var/run/vpp/external/api.sock I passed /var/run/vpp/external/cli.sock. Forwarder went up and then I tried running everything on 1 node:

Not stupid. I don't believe in stupid users, I believe in doc bugs :) If you could let me know what you think might have precluded this misadventure in terms of doc improvements, I'd love that :)

edwarnicke commented 2 years ago

> ghcr.io/edwarnicke/govpp/vpp:v22.06-release

Ah... OK... that's not going to work at this moment because there is a patch missing from it for using abstract sockets for memif (if you are interested, I can explain the ins and outs of why).

Could you try using:

v22.06-rc0-147-g1c5485ab8

yuraxdrumz commented 2 years ago

@edwarnicke

Thanks for the help, it worked.

I tried a couple of configurations:

- VPP daemon-set with a forwarder and an NSE that both use it - with the patch I see the NSE can create the memif server, but the forwarder still fails with memif_socket_filename_add_del_v2_34223bdf.
- Running the forwarder with its own VPP instance and the NSE with the external VPP, which works fine.

After seeing some issues with the cleanup of the interfaces while using the external VPP, I just added the /dev/vfio mount to the nse and used a local instance of vpp. It is easier to manage it and the healing / cleanup process is easier as well.

At the end of the day, I was wrong thinking 1 VPP is easier to manage, changed back to local vpp instances.

Thanks

yuraxdrumz commented 2 years ago

> > Sometimes I feel so stupid. Instead of providing /var/run/vpp/external/api.sock I passed /var/run/vpp/external/cli.sock. Forwarder went up and then I tried running everything on 1 node:
>
> Not stupid. I don't believe in stupid users, I believe in doc bugs :) If you could let me know what you think might have precluded this misadventure in terms of doc improvements, I'd love that :)

I think the forwarder could verify that it is actually connecting to api.sock by running some initial command, such as show int. If that fails, it could catch the failure and log a warning in addition to the error message returned by govpp.

An example would be:

Original message

time="2022-08-22T14:03:15Z" level=info msg="Decoding sockclnt_create_reply failed: panic occurred during decoding message sockclnt_create_reply: runtime error: slice bounds out of range [10:0]" logger=govpp/socketclient

Add warning below the message from the govpp wrapper

time="2022-08-22T14:03:15Z" level=warn msg="sending command to /var/run/vpp/external/cli.sock failed, make sure you are trying to connect to the api.sock" logger=govpp/...
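
A rough sketch of such a check, written in Python for brevity (this is not govpp code, which is Go). Instead of sending an extra command, it inspects the reply for telnet option negotiation, which is what the `read data` bytes in the log above look like: 0xFF is telnet's IAC byte (RFC 854), followed by command bytes such as DO (0xFD) or SB (0xFA). The function names and the warning text are hypothetical, and the check is only a heuristic, since a binary reply could contain the same byte pairs by chance.

```python
IAC = 0xFF  # telnet "Interpret As Command" (RFC 854)
NEGOTIATION = {0xFA, 0xFB, 0xFC, 0xFD, 0xFE}  # SB, WILL, WONT, DO, DONT


def looks_like_telnet(reply: bytes) -> bool:
    """True if the reply contains an IAC byte followed by a negotiation command."""
    return any(
        first == IAC and second in NEGOTIATION
        for first, second in zip(reply, reply[1:])
    )


def socket_hint(path: str, reply: bytes):
    """Return a warning string when the reply looks like telnet traffic, else None."""
    if looks_like_telnet(reply):
        return (
            f"reply from {path} looks like telnet negotiation; "
            "make sure you are connecting to api.sock, not cli.sock"
        )
    return None


# First bytes of the reply, copied from the log in the original report.
reply = bytes.fromhex("FE 01 FF FD 18 FF FA 18 01 FF F0 FF FD 1F FF FA 1F 01 FF F0")
print(socket_hint("/var/run/vpp/external/cli.sock", reply))
```

A check like this would only add a hint next to the existing decode error; it would not replace it.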

edwarnicke commented 2 years ago

> @edwarnicke
>
> Thanks for the help, it worked.

I'm a little confused... it sounds above like v22.06-rc0-147-g1c5485ab8 worked, but then below you talk about it not working...

> I tried a couple of configurations:
>
> VPP daemon set with forwarder and nse that both use it - with the patch I see the nse can create the memif server, but forwarder still fails with memif_socket_filename_add_del_v2_34223bdf.

Was this with v22.06-rc0-147-g1c5485ab8?

> Running forwarder with its own vpp instance and the nse with the external VPP, which works fine.
>
> After seeing some issues with the cleanup of the interfaces while using the external VPP,

Could you file a bug on those issues... we should be cleaning up those interfaces on restart of a forwarder using an external VPP... I'm not entirely sure we try to do that with an NSE using an external VPP.

> I just added the /dev/vfio mount to the nse and used a local instance of vpp. It is easier to manage it and the healing / cleanup process is easier as well.

Ah... so external VPP is intrinsically more complex (coordination needed)... it's super useful for cases where you want to increase performance.

I'd love to hear more about the details of what you are doing with vfio :)

I'd also be quite interested in what you are ultimately trying to achieve.

> At the end of the day, I was wrong thinking 1 VPP is easier to manage, changed back to local vpp instances.
>
> Thanks

yuraxdrumz commented 2 years ago

@edwarnicke

I meant the VPP abstract socket feature works, but connecting an NSC to an NSE with both the NSE and the forwarder using the same VPP fails with the same error memif_socket_filename_add_del_v2_34223bdf. The server side (nse) memif is created after running the VPP daemon-set with the image you provided, but the client side memif (forwarder) returns the error above although it uses the same VPP.

I will recreate the errors over the weekend and open up an issue with all the necessary information and logs regarding the single VPP use case.

As for why I am doing this:

We are trying to use NSM at my company for various use cases. One of them is allowing client VPNs to connect to the internet and to other branches of their offices securely by chaining security functions (NSM composition).

Before jumping into chaining and the other use cases, I wanted to cover internet access and get a reliable benchmark between 1 NSC and 1 NSE, whether on the same machine or on different machines, so that I know how much traffic I can push before adding any other components in between.

In the process, I created an NSE, which acts as the internet gateway. To gain internet access from the NSE, I set it as hostNetwork, created a VETH pair and added an AF_PACKET interface and configured it to go to the internet via the default gateway.

These are my findings so far regarding simple use case of 1 NSC -> 1 NSE that goes to internet:

EKS 1.22 on AWS c5n.2xlarge nodes

With all the default NSM configurations (1 core for the NSE, 1 core for each forwarder and 1 core for the NSC):
- Performance was in the region of XXX kbps.
- I noticed the ping between 2 memifs took 80ms, which is why I suspected the CPU. So I raised the NSE to 3 cores and configured its local VPP with 1 main core and 2 workers, raised the forwarders to 3 cores with the same VPP config, and raised the NSC to 3 cores.
- After that, I ran speedtest-cli and got 1-1.4Gbps download and 0.6-1Gbps upload.

I then decided to add a 2nd NIC with an elastic IP to each machine, installed DPDK, bound the NIC with the vfio-pci driver and mounted /dev/vfio to the NSE.
- Afterwards, I ran speedtest-cli again, which got me 1.6-2.4Gbps download and 1.2-1.8Gbps upload.

Next, I thought that if I could run all data plane components on VPP + DPDK, I would get maximum performance, assuming I have enough cores. The issue with running separate VPPs is that as soon as one VPP + DPDK instance is bound to a NIC, no other instance can bind to that NIC and reuse it. Attaching a NIC per forwarder, plus one for every NSE I will need, seems too expensive, not only cost-wise but also CPU-wise: due to the poll mode driver, I would lose 1 core per rx queue on each VPP + DPDK instance.

So I tried to run a single VPP as a daemon-set with 1 DPDK vfio-bound NIC per node, and to run the forwarder and all other VPP-related workloads on it. I ran into the abstract socket issue, which I didn't even recognize until you helped me with it. I also saw that a lot of interfaces are created and never cleaned up, so I discarded the idea and stuck with only mounting /dev/vfio to the NSE that acts as the internet gateway.
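
The CPU side of that trade-off can be put into numbers. This is a small Python sketch that follows the rule of thumb stated above (one polling core per rx queue on each VPP + DPDK instance). The instance and queue counts are made-up inputs for illustration, not measurements from this setup.

```python
def polling_cores(instances: int, rx_queues_per_instance: int) -> int:
    """Cores kept busy polling, at one core per rx queue per VPP + DPDK instance."""
    return instances * rx_queues_per_instance


# Hypothetical node: one forwarder plus three NSEs, each with its own NIC and one rx queue.
separate = polling_cores(instances=4, rx_queues_per_instance=1)

# Same node with a single shared VPP owning one NIC with one rx queue.
shared = polling_cores(instances=1, rx_queues_per_instance=1)

print(separate, shared)
```

Under these assumed numbers the shared VPP frees three cores per node, which is the saving that has to be weighed against the coordination and cleanup problems described above.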

Hope this answers your questions.

Thanks

networkservicemesh / deployments-k8s

External VPP Issue #7110