rhkdump / kdump-utils

Kernel crash dump collection utilities
GNU General Public License v2.0
3 stars 8 forks source link

Support setting up Openvswitch (Ovs) Bridge network #2

Open coiby opened 3 months ago

coiby commented 3 months ago

Resolves: https://issues.redhat.com/browse/RHEL-33465

This patch supports setting up an Ovs bridge in kdump initrd. An Ovs bridge is similar to a classic Linux bridge but we use ovs-vsctl to find out the Ethernet device (having the MAC address as the bridge) added to an Ovs bridge. Once we copy all the needed NetworkManager (NM) connection profiles to kdump initrd and all the necessary files, NM will create an Ovs bridge automatically in kdump initrd.

In the case of OpenShift Container Platform (OCP), ovs-configuration.service [1] is responsible for setting up an Ovs bridge. In theory, we can also try to bring up the original physical network interface before ovs-configuration.service. But this approach is cumbersome because it breaks our assumption that we should bring up the same network in kdump intrd as in 1st kernel (establishing the same network in kdump initrd only needs to copy the needed NM connection profiles thus we don't need to learn how different network setup work under the hood).

How to test this patch with the help of configure-ovs.sh?
=========================================================

1. Extract configure-ovs.sh from [2]

2. Install necessary packages for configure-ovs.sh
    dnf install openvswitch -yq
    dnf install NetworkManager-ovs nmap-ncat -yq

    systemctl enable --now openvswitch

    # restart NM so the ovs plugin can be activated
    systemctl restart NetworkManager

3. Assume the network interface used for creating an Ovs bridge is
   "ens2", use configure-ovs.sh to create an Ovs bridge,

    interface=ens2
    mkdir -p /etc/ovnk
    echo $interface > /etc/ovnk/iface_default_hint
    bash configure-ovs.sh OVNKubernetes

4. (Optional) If you want to make the created Ovs bridge survive a
   reboot, simply make the created NM connections created by
   configure-ovs.sh persist,

    cp /run/NetworkManager/system-connections/ovs-* /etc/NetworkManager/system-connections/

If you need to create an Ovs bridge on top of a bonding network, use the
following commands for step 3,

    nmcli con add type bond ifname bond0
    nmcli con add type ethernet ifname eth0 master bond0
    nmcli con add type ethernet ifname eth1 master bond0

    echo bond0 > /etc/ovnk/iface_default_hint
    bash configure-ovs.sh OVNKubernetes

Note

  1. For RHEL, openvswitch3.3 may be installed so we need to get the package name by "rpm -qf /usr/lib/systemd/system/openvswitch.service"
  2. For RHEL9, openvswitch package needs to installed from another repo, cat << 'EOF' > /etc/yum.repos.d/ovs.repo [rhosp-rhel-9-fdp-cdn] name=Red Hat Enterprise Linux Fast Datapath $releasever - $basearch cdn baseurl=http://rhsm-pulp.corp.redhat.com/content/dist/layered/rhel9/$basearch/fast-datapath/os/ enabled=1 gpgcheck=0 EOF

    dnf install openvswitch3.3 -yq

[1] https://github.com/openshift/machine-config-operator/blob/master/templates/common/_base/units/ovs-configuration.service.yaml [2] https://github.com/openshift/machine-config-operator/blob/master/templates/common/_base/files/configure-ovs-network.yaml

Signed-off-by: Coiby Xu coxu@redhat.com

coiby commented 3 months ago

Hi @jbtrystram, can you take a look at this patch set? Thank you! Btw, if this PR gets merged, https://github.com/openshift/machine-config-operator/pull/4213 will no longer be needed.

jbtrystram commented 2 months ago

Hi @jbtrystram, can you take a look at this patch set? Thank you!

TBH i don't know this code at all so a review would be unhelpful..

Btw, if this PR gets merged, openshift/machine-config-operator#4213 will no longer be needed.

Thanks for letting me know ! I'll update the MCO afterwards then

coiby commented 2 months ago

Hi Coiby,

Hi Phillipp,

personally I would merge the two commits as I don't see much value in having the first one. But I don't have a strong opinion on that.

Previously, I used two commits because it may be easier for the OpenShift team to review. But since configure-ovs.sh contains all the info I need, now I squash two commits together in the new version.

This new version switches to a new approach i.e. we now try to bring up the same network in kdump initrd as in the 1st kernel. The reason is I find a Ovs bridge can be created on top of another network like bonding network. And it's cumbersome to support this case using the previous approach.

If possible I would like to simplify the second commit. Having ovs_phy_if as global variable and the extra kdump_install_ovs_nmconnection function seems over complicated to me on first view.

Thanks Philipp

coiby commented 2 months ago

Hi @jbtrystram, can you take a look at this patch set? Thank you!

TBH i don't know this code at all so a review would be unhelpful..

configure-ovs.sh has all the info I need now, thanks anyways!

Btw, if this PR gets merged, openshift/machine-config-operator#4213 will no longer be needed.

Thanks for letting me know ! I'll update the MCO afterwards then

You are welcome!

jbtrystram commented 1 month ago

Anything blocking that PR ? We've have had reports of issues that could be fixed by this :)

kenneth-dsouza commented 4 weeks ago

Hello team, any update here?

coiby commented 1 week ago

Hi @jbtrystram and @kenneth-dsouza,

Currently, the progress is blocked by code review. It seems you Cores team asks someone else to review patches about Openvswitch network. Can I ask this person to help this PR as well?

jbtrystram commented 4 days ago

Currently, the progress is blocked by code review. It seems you Cores team asks someone else to review patches about Openvswitch network. Can I ask this person to help this PR as well?

I am not sure who you are referring to ? There is no harm in asking for a review ! I forwarded this to the openvswitch folks :)

JM1 commented 3 days ago

@coiby Could you please explain, how your code sets up the Open vSwitch bridges in the initrd? What bridges get created? What OCP release did you test your changes with?

What did nmcli -t -f device,filename connection show --active (in kdump_install_nmconnections()) return in your tests?

JM1 commented 3 days ago

Where is ovs-system (in kdump_collect_netif_usage()) coming from?

How does your code restore OCP's br-ex bridge?

coiby commented 16 hours ago

@coiby Could you please explain, how your code sets up the Open vSwitch bridges in the initrd? What bridges get created? What OCP release did you test your changes with?

Hi @JM1! Thanks for raising this question. The code simply copy the NetworkManager connection files and the supporting files to kdump initrd and then NM will set up the same Open vSwitch bdrige to the 1st kernel automatically. I forgot which OCP version I tested early. But later I realize this not limited to OCP so I test it in RHEL9 and Fedora40. If you are interested in more details, I've included the info about how I set up a testing environment in the commit message.

What did nmcli -t -f device,filename connection show --active (in kdump_install_nmconnections()) return in your tests?

ovs-if-br-ex        de17dd7e-dd41-4865-a99e-60077f0837ae  ovs-interface  br-ex  
br-ex               c5662992-05c0-4b05-9302-a554c705ef6a  ovs-bridge     br-ex  
ovs-if-phys0        ddcae60d-7d24-46b2-86bc-c326b6688cc9  ethernet       eno1   
ovs-port-br-ex      3ffa5d8b-5ae6-45c6-9fd6-04e5f4b2eeaa  ovs-port       br-ex  
ovs-port-phys0      aadfdac7-fb7f-4aa0-9180-0eb64263f722  ovs-port       eno1   

Btw, I've also tested the case of Open vSwitch bridge over a bonding network.

coiby commented 16 hours ago

Where is ovs-system (in kdump_collect_netif_usage()) coming from?

I can't find ovs-system. Or do you mean the kdump_is_ovs_bridge function?

How does your code restore OCP's br-ex bridge?

This is done by NM. I simply copy all the needed NetworkManager connection files and the supporting files to kdump initrd.

JM1 commented 14 hours ago

Where is ovs-system (in kdump_collect_netif_usage()) coming from?

I can't find ovs-system. Or do you mean the kdump_is_ovs_bridge function?

Your previous commit had this line:

_save_kdump_netifs "ovs-system"

But it is gone in later commits. Hence, solved by removal 😄

How does your code restore OCP's br-ex bridge?

This is done by NM. I simply copy all the needed NetworkManager connection files and the supporting files to kdump initrd.

Ok. If I understand your code correctly, in kdump_collect_netif_usage it will collect the netdev (br-ex) that would be used when accessing the remote fs (e.g. via ssh). It would also identify the corresponding physical NIC such as enp1s0 with the help of kdump_setup_ovs and ovs_find_phy_if and add both (br-ex and enp1s0) to unique_netifs. Later, in kdump_install_nmconnections it would retrieve all NM connections associated with those two devices. I did not test it but the approach seems reasonable.

However, I am wondering why you do not simply try to copy and restore all NetworkManager connections? Wouldn't that reduce the amount of code here dramatically?