networkservicemesh / deployments-k8s

Apache License 2.0

questions: dpdk plugin / VCL / service graph topology #9777

Closed barby1138 closed 12 months ago

barby1138 commented 1 year ago

Hi

I have several questions:

  1. Why is dpdk_plugin in the VPP fwder disabled by default? It's easy to bring it up, but I'm not sure it's well supported by NSM - is that part of the concept?

  2. I have 2+ clusters of the same type of services chained. Each cluster has 1 node with 2 VF DPDK interfaces (in/out). The graph also runs in both directions (uplink/downlink), so every client is a service. I want to apply different connection config profiles.

Ex.

I use nse-remote-vlan and vpp forwarder to select between DPDK VFs. So I have such chain

|----> nse-remote-vlan-cluster1 .... nse-remote-vlan-cluster2 <---|
|                                                                 |
cli1-type1 <-> ... VF1 ... <-> ... VF1 ... <-> .......... cli1-type2
cli2-type1 .............................................. cli2-type2

I want to connect clients in a different manner dynamically, depending on load etc., using routing rules in the forwarders. Is this topology correct? If yes, how do nse-remote-vlans provide routes between clusters? Should anything additional be configured in forwarder-vpp? If no, what would you suggest?

  3. I connect my services using kernel but I want to use VCL. Is any additional configuration needed in the fwders? Are there plans to have a separate VCL mechanism for connection?
edwarnicke commented 1 year ago

@barby1138 At the time that it was disabled, DPDK tended to break catastrophically when run in Pods. It makes a huge number of presumptions about owning the whole box, and if any of them are not met it simply crashes (and takes VPP with it). We haven't revisited it lately, maybe it's better, but that's why we disabled it by default for things like the forwarder and our NSCs that are using memif.

It would be quite easy to re-enable it for a specific NSE that is binding to a NIC with DPDK if one could sort out the conflict between DPDK and K8s.

barby1138 commented 1 year ago

Hi Ed, I think that was a long time ago. I have used VPP/DPDK 22/23 in production in a k8s env over the last year and it works like a charm. I use 1G HP enabled. So it's worth considering bringing it back, as there is not much sense in using memif and then falling back to kernel to pass traffic between nodes / clusters. This is about forwarders only - in my opinion - there is no need to use it in the NSC or NSE. Regarding VCL - if we work at the socket level, not the packet level - it's better to use it rather than memif - and there is no need to bring VPP into application pods. IMHO this is definitely worth bringing in as well.

Thank you. Have a nice day!!!

edwarnicke commented 1 year ago

@barby1138 Question: are you hand setting the ulimits on locked memory for your cluster? I ask because I specifically remember one of the issues being that DPDK insisted on locking a certain amount of memory (even though most K8s clusters have no swap). You could hack around it by fixing the ulimits on all of your nodes... but when someone simply deployed in a vanilla K8s environment without tweaking, it had a very high probability of blowing up.
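As a quick sanity check, the locked-memory limit a forwarder pod would actually see can be inspected from inside the container; a minimal sketch (plain Linux, not NSM-specific):

```shell
#!/bin/sh
# Show the max locked memory limit - this is what DPDK's mlock calls
# run up against. On many vanilla nodes it defaults to a small value.
grep "Max locked memory" /proc/self/limits

# The shell builtin reports the same limit in kbytes
# ("unlimited" if unrestricted)
ulimit -l
```

If the reported limit is small, that matches the failure mode described above.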

edwarnicke commented 1 year ago

@barby1138 A good quick test would be: does DPDK work out of the box in a Kind env? That's about as vanilla as things get.

barby1138 commented 1 year ago

Hi Yes - sure. The only prerequisite is to have 1G HP configured at setup. And of course every setup has its specific DPDK device PCI address. I can share the patch-forwarder-vpp.yaml I use for it - if interested.

A good thing about fwder vpp is that if I put my startup.conf file into the mounted /etc/vpp/helper, it uses that instead of the default one. So generally, DPDK can be enabled if needed in the existing solution. I just wondered why it's disabled by default.

What about vcl?

edwarnicke commented 1 year ago

@barby1138 I'd love to see your patch-forwarder-vpp.yaml :)

And glad you found the 'stomping' feature for startup.conf useful :) We built it that way intentionally because we were certain folks would encounter times and places they needed to customize.

WRT vcl... I'm curious what you are thinking. I'm generally a vcl fan.

barby1138 commented 1 year ago

Hi Ed,

patch-forwarder-vpp.yaml is the following

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: forwarder-vpp
spec:
  template:
    spec:
      containers:
        - name: forwarder-vpp
          resources:
            requests:
              hugepages-1Gi: 2Gi
              memory: 2Gi
            limits:
              hugepages-1Gi: 2Gi
              memory: 2Gi
          volumeMounts:
            - name: etcvpp
              mountPath: /etc/vpp
            - name: hugepage
              mountPath: /dev/hugepages/
            - name: vpp
              mountPath: /var/run/vpp
      volumes:
        - name: etcvpp
          hostPath:
            path: /etc/vpp
            type: Directory
        - name: hugepage
          emptyDir:
            medium: HugePages
        - name: vpp
          hostPath:
            path: /var/run/vpp
            type: DirectoryOrCreate

Just make sure 1G hugepages are configured at setup. I do it via grub.
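For reference, reserving 1G hugepages via GRUB usually looks something like this (the page count is illustrative; adjust for your setup):

# /etc/default/grub -- illustrative values
GRUB_CMDLINE_LINUX="default_hugepagesz=1G hugepagesz=1G hugepages=4"
# then regenerate the grub config (e.g. update-grub on Debian/Ubuntu) and reboot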

Then add /etc/vpp/helper/vpp.conf with the DPDK plugin / device enabled.
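A minimal sketch of what such a vpp.conf might contain to enable the DPDK plugin (the PCI address below is a placeholder - as noted earlier, it is specific to each setup):

unix {
  nodaemon
}
plugins {
  plugin dpdk_plugin.so { enable }
}
dpdk {
  # the VF's PCI address, specific to your node
  dev 0000:3b:02.0
}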

Regarding VCL I'll write you back later - I need to gather my thoughts :)

Have a nice day!!!

edwarnicke commented 1 year ago

Just make sure 1G hugepages are configured at setup. I do it via grub

Yeah... we don't always have that much control over the environments we deploy into. While I completely concur you want things like Hugepages (and core pinning etc) for optimal performance... our out of the box configs are optimized for two things:

  1. They always work out of the box with little to no effort
  2. To make it easy for folks to customize them when they can do better to optimize performance.

Your config above is a great example of principle (2) :)

barby1138 commented 1 year ago

Hi Ed,

I took some holidays for a couple of days.

so VCL:

I will try to describe how I see the VCL feature being represented in NSM. I will describe the simplest setup, but with an eye toward scalability.

  1. I think it should be one more connection mechanism, like kernel and memif, supported by fwder-VPP.
  2. To have it enabled in fwder-VPP, it should be configured accordingly - with sessions enabled / etc.
  3. So in the client we request vcl://service-name. There is no specific interface in the client / NSE for VCL, but it's bound to an interface inside fwder-VPP (vxlan-tunnel). IPAM assigns IP addresses to the corresponding interfaces used for tunneling inside fwder-VPP, and I propose to inject this info into the client/NSE just like we inject a tap in the kernel case. It could be a config file in a mounted directory.

Ex. I'll add a reference bash snippet showing how I configure VCL manually for now:

# Locate the forwarder-vpp pod in each cluster
VPP_POD1=$(kubectl --kubeconfig=$KUBECONFIG1 get pod -l app=forwarder-vpp -n nsm-system -o jsonpath="{.items[0].metadata.name}")
echo $VPP_POD1

VPP_POD2=$(kubectl --kubeconfig=$KUBECONFIG2 get pod -l app=forwarder-vpp -n nsm-system -o jsonpath="{.items[0].metadata.name}")
echo $VPP_POD2

# vxlan tunnel interfaces inside each forwarder, and the VCL IPs to assign
PCIDEV1=vxlan_tunnel1
VCL_IP_ADDR1=172.17.1.8/16

PCIDEV2=vxlan_tunnel1
VCL_IP_ADDR2=172.17.1.9/16

# Assign the VCL IPs to the tunnel interfaces and verify
kubectl --kubeconfig=$KUBECONFIG1 exec -it $VPP_POD1 -n nsm-system -- vppctl set int ip addr $PCIDEV1 $VCL_IP_ADDR1
kubectl --kubeconfig=$KUBECONFIG1 exec -it $VPP_POD1 -n nsm-system -- vppctl sh int addr

kubectl --kubeconfig=$KUBECONFIG2 exec -it $VPP_POD2 -n nsm-system -- vppctl set int ip addr $PCIDEV2 $VCL_IP_ADDR2
kubectl --kubeconfig=$KUBECONFIG2 exec -it $VPP_POD2 -n nsm-system -- vppctl sh int addr

CLUSTER2_IP=<my.cluster2.IP>
SERVICE_NAME="vcl_data"

# Inject the peer's VCL IP into the client's config directory
APP_POD1=$(kubectl --kubeconfig=$KUBECONFIG1 get pod -l app=app-1 -n ns-floating-kernel2ethernet2kernel -o jsonpath="{.items[0].metadata.name}")
CONF_NAME1=$APP_POD1$SERVICE_NAME
echo 172.17.1.8/16 > $CONF_NAME1
cp $CONF_NAME1 /etc/vpp
rm -f $CONF_NAME1

# ...and the same for the NSE in cluster 2 (copied over ssh)
NSE_POD1=$(kubectl --kubeconfig=$KUBECONFIG2 get pod -l app=nse-kernel-1 -n ns-floating-kernel2ethernet2kernel -o jsonpath="{.items[0].metadata.name}")
CONF_NAME2=$NSE_POD1$SERVICE_NAME
echo 172.17.1.9/16 > $CONF_NAME2
scp $CONF_NAME2 root@$CLUSTER2_IP:/etc/vpp
rm -f $CONF_NAME2

Summary: the main idea is that a VCL connection is built just like kernel, but there are no taps. Instead, fwder-VPP interfaces are configured and this info is injected into the client/NSE.

Maybe you have better ideas on how to enable VCL with even less effort.

Have a nice day!!!

barby1138 commented 1 year ago

Hi guys,

Do we have any progress here? Should I open this as feature request?

Have a nice weekend!!!

edwarnicke commented 1 year ago

@barby1138 Bear with me while I try to swap my VCL knowledge back into my brain :) If memory serves, VCL is 'setup' by a user by sending messages over a unix file socket, correct?

In which case it would work very much like memif. I like your idea of a vcl mechanism type. So maybe something like:

vcl://${service-name}/${optional requested filename of unix file socket}

Thoughts?

barby1138 commented 1 year ago

Hi Ed, glad to hear from you :) No, ${optional requested filename of unix file socket} is not needed. The VCL actor (client) needs a VCL configuration with socket, queues, secrets, etc. - but that's client logic, not related to NSM - per my vision. Also, the control socket is shared with the VCL client via a folder mounted from the fwder. So the only things needed from the VPP fwder are to share the control socket and to enable sessions in startup.conf.

Described good here: https://www.envoyproxy.io/docs/envoy/latest/configuration/other_features/vcl refer to "Installing and running VPP/VCL"
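For context, the client-side VCL configuration mentioned above is typically a small vcl.conf pointing at the session API socket that the fwder shares; a hedged sketch (socket path and fifo sizes are illustrative and depend on the fwder's mount layout):

vcl {
  rx-fifo-size 4000000
  tx-fifo-size 4000000
  app-scope-global
  # the control socket shared from the forwarder via a mounted folder
  api-socket-name /var/run/vpp/app_ns_sockets/default
}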

barby1138 commented 1 year ago

Yes, technically it's like memif, but with VCL we work with sockets, not packets, and there is no need to bring VPP to clients - just some libs - but that's the client's responsibility. For a test nsc / nse we'll need it - I can help with that. I already have a working, manually configured setup.

denis-tingaikin commented 12 months ago

@barby1138 if you have something working feel free to open PR into deployments repo with your example ;)

denis-tingaikin commented 12 months ago

Another option is that you could also put your configurations here, and we will add examples on our side.

barby1138 commented 12 months ago

Hi Denis

I will reply in the newly opened Feature: VCL #10023

thanks