ovn-org / ovn-kubernetes

A robust Kubernetes networking platform
https://ovn-kubernetes.io/
Apache License 2.0

ovn-kubernetes with VMs in an undercloud scenario #4540

Open zlangi opened 1 month ago

zlangi commented 1 month ago

What would you like to know?

Hello people, first of all, thanks for this very cool project, it’s really amazing! We would like to implement it for our system, but since the ovn-kubernetes documentation is not that extensive, I would like to ask whether what we are trying to achieve is doable with ovn-kubernetes.

We have a setup that relies heavily on KubeVirt and virtual machines; we have 300+ VMs. We currently use Calico plus an in-house CNI that provides SNAT to the virtual machines and manages and sets up the bridges. We have a so-called undercloud/overcloud separation: the physical servers form an undercloud and the VMs form an overcloud. If you look at the attached diagram I created, you see a backup bridge, a Ceph bridge, a BGP peering bridge, and a bridge the VMs are wired into through veth pairs so they can reach each other and the Ceph mons. These bridges are placed on the host machines that form the undercloud. The routing table is also present on the host machines, so the VMs that belong to the overcloud and have their own CNI (Calico) can mount their PVCs through ceph-csi by using their default gateways.

Most of the VMs have a single interface if their role is a worker; in that case they only need to access the k8s cluster and the storage via ceph-csi. Selected VMs (ingress or backup) get a second interface attached to the backup bridge or the BGP peering bridge, so the VMs dedicated to those functions can reach their backups via the secondary interface or exchange BGP peering info via MetalLB on the peering bridge.

The setup works, but we don’t like the fact that all the ingress traffic coming from the internet hits the bridge on the host machine, so all of it has to be processed there before entering the dedicated VM in the overcloud to be processed once more. We were thinking that with ovn-kubernetes we could perhaps feed this traffic directly into Open vSwitch instead. I hope I described the setup well; now I can get to the part that I tried.

As far as I can see, your default setup for VMs is this (I couldn’t find any other docs, so I will refer to OpenShift): https://docs.openshift.com/container-platform/4.15/virt/vm_networking/virt-connecting-vm-to-default-pod-network.html What I don’t really like about this method is that KubeVirt will use SNAT, and since most of our VMs have only a single interface, the storage I/O would pass through SNAT, which again costs CPU cycles. One workaround could be a second, dedicated interface for storage on a secondary network, but it would be simpler to stick to one interface. I looked further into the OpenShift documentation and saw this: https://docs.openshift.com/container-platform/4.15/virt/vm_networking/virt-connecting-vm-to-ovn-secondary-network.html It’s also present in your documentation: https://ovn-kubernetes.io/features/multiple-networks/multi-homing/#configuring-secondary-networks
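For reference, the SNAT behaviour on the default pod network comes from KubeVirt's masquerade interface binding. A minimal sketch of the relevant VirtualMachineInstance fragment (the VM name is hypothetical):

```yaml
# Hypothetical VMI fragment: the masquerade binding NATs the VM's
# traffic behind the pod IP, which is the SNAT cost mentioned above.
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: worker-vm            # hypothetical name
spec:
  domain:
    devices:
      interfaces:
        - name: default
          masquerade: {}     # VM traffic is SNATed behind the pod IP
  networks:
    - name: default
      pod: {}                # the cluster default (pod) network
```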

Secondary networks come in two types: encapsulated (layer2), which we don’t need since we already have VXLAN encapsulation in our switching infra, and non-encapsulated (localnet). As I was browsing the OpenShift ovn-kubernetes documentation I realised that secondary networks are meant for east-west traffic only, with no north-south and no SNAT possible, so they are not a match; however, the concept of VLAN-tagged secondary networks shown here is good: https://kubevirt.io/2023/OVN-kubernetes-secondary-networks-localnet.html Besides the lack of SNAT functionality for secondary networks, I realised that special steps must be done on the OVS side, which, according to the docs, can be done via the nmstate operator: https://github.com/nmstate/kubernetes-nmstate That could work, but unfortunately we use Ubuntu as our platform, and NetworkManager is not used on Ubuntu, so the nmstate operator is out of the game. Is there any other way you think it is possible to make these OVS changes for the secondary network in an automated way?
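For what it's worth, the OVS-side steps that nmstate would perform can also be scripted directly with ovs-vsctl (e.g. from a privileged DaemonSet, Ansible, or cloud-init on the Ubuntu hosts). A rough sketch, assuming a physical NIC `eno2` and a logical network name `tenantblue` (both hypothetical):

```shell
# Create an OVS bridge for the physical network and attach the NIC
# (bridge and NIC names are hypothetical).
ovs-vsctl --may-exist add-br br-phys
ovs-vsctl --may-exist add-port br-phys eno2

# Map the logical network name referenced by the localnet
# NetworkAttachmentDefinition to that bridge. Note this overwrites any
# existing ovn-bridge-mappings value, so merge entries with care.
ovs-vsctl set Open_vSwitch . \
    external-ids:ovn-bridge-mappings=tenantblue:br-phys
```

This is only a sketch of the host configuration; it assumes OVS is already installed and that ovn-kubernetes picks up the bridge mapping from the Open_vSwitch table's external-ids.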

After all these details, do you think it is realistic to implement ovn-kubernetes with our setup, so we could benefit from Open vSwitch and its hardware offload capabilities, since we have Mellanox cards? Many thanks!

(attachment: setup diagram)

Anything else we need to know?

No response

tssurya commented 1 month ago

Hi @zlangi: Sorry about the bad docs; we are working on making them better :) This is a really nice setup you have here. If you are free, would you like to come to our upstream meeting and get support for this issue? Linking the agenda doc here: https://docs.google.com/document/d/1ciZS1CajH07THAiH_9j4-6uX4HAJFEoUzwyRdjsQK1k/edit#bookmark=id.d3ss46jks70k (It's 7 PM Berlin time on Monday, so it might be a bit late for you, but I hope you can make it!)

zlangi commented 1 month ago

Yes I will be there :)

tssurya commented 1 month ago

Link for joining the meeting: https://ovn-kubernetes.io/governance/MEETINGS/

tssurya commented 1 month ago

This was discussed in the upstream meeting. Either you can use localnet, but then you won't get access to any k8s features, or you can use pod IPs in the VMs directly, which avoids SNAT, like Nvidia does.
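For the localnet option, the secondary network is declared as a NetworkAttachmentDefinition. A minimal sketch following the kubevirt.io blog post linked earlier in the thread (the name, VLAN ID, and subnet are hypothetical and must match your physical network):

```yaml
# Hypothetical localnet secondary network: traffic egresses untunneled
# onto the mapped OVS bridge, VLAN-tagged with the given ID.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: tenantblue
  namespace: default
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "tenantblue",
      "type": "ovn-k8s-cni-overlay",
      "topology": "localnet",
      "netAttachDefName": "default/tenantblue",
      "vlanID": 200,
      "subnets": "192.168.200.0/24"
    }
```

The `"name"` in the JSON config is the logical network name that the host's ovn-bridge-mappings entry must map to a physical OVS bridge.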

tssurya commented 1 month ago

Assigning this issue to @girishmg; he will work with @zlangi, add him to the ovn-org Slack, and help set this up.