openvswitch / ovs-issues

Issue tracker repo for Open vSwitch
10 stars 3 forks source link

ovs-kernel meter init error #337

Open wangjun0728 opened 4 weeks ago

wangjun0728 commented 4 weeks ago

I encountered an issue with meter initialization in the ovs-kernel. The root cause of the problem is that when we restart the user-space process, we didn't remove the dp (datapath), unload, and reload the kernel. However, the existing meter entries in the dp caused a hash collision, which resulted in the failure of meter initialization.

In lookup_meter, the search is performed based on the hash, but it also compares the meter ID. So, in the case of a conflict, it directly returns NULL without overwriting the existing meter, leading to errors when initially provisioning the two meters.

static struct dp_meter *lookup_meter(const struct dp_meter_table *tbl,
                     u32 meter_id)
{
    struct dp_meter_instance *ti = rcu_dereference_ovsl(tbl->ti);
    u32 hash = meter_hash(ti, meter_id);
    struct dp_meter *meter;
    meter = rcu_dereference_ovsl(ti->dp_meters[hash]);
    if (meter && likely(meter->id == meter_id))
        return meter;
    return NULL;
}

When probing from the user space, the meter conflict will be detected, causing the operation to fail.

static bool
probe_broken_meters__(struct dpif *dpif)
{
    /* This test is destructive if a probe occurs while ovs-vswitchd is
     * running (e.g., an ovs-dpctl meter command is called), so choose a
     * random high meter id to make this less likely to occur. */
    ofproto_meter_id id1 = { 54545401 };
    ofproto_meter_id id2 = { 54545402 };
    struct ofputil_meter_band band = {OFPMBT13_DROP, 0, 1, 0};
    struct ofputil_meter_config config1 = { 1, OFPMF13_KBPS, 1, &band};
    struct ofputil_meter_config config2 = { 2, OFPMF13_KBPS, 1, &band};

    /* Try adding two meters and make sure that they both come back with
     * the proper meter id.  Use the "__" version so that we don't cause
     * a recurve deadlock. */
    dpif_netlink_meter_set__(dpif, id1, &config1);
    dpif_netlink_meter_set__(dpif, id2, &config2);

    if (dpif_netlink_meter_get(dpif, id1, NULL, 0)
        || dpif_netlink_meter_get(dpif, id2, NULL, 0)) {
        VLOG_INFO("The kernel module has a broken meter implementation.");
        return true;
    }

    dpif_netlink_meter_del(dpif, id1, NULL, 0);
    dpif_netlink_meter_del(dpif, id2, NULL, 0);

    return false;
}

So, is this approach of not deleting the dp and not reloading the kernel incorrect? I couldn't find any official documentation on this on the website.

apconole commented 3 weeks ago

Yes, that is correct behavior for the userspace to not delete the DP on restart. The probe may need some enhancement to cover this case, it seems.

wangjun0728 commented 3 weeks ago

Yes, I understand that this probing is likely necessary to maintain compatibility with older kernel versions that don't support certain features. However, it appears that this probing can lead to the entire meter initialization failing when the dp is not deleted and only the user-space process is restarted, which in turn prevents the flow tables from being properly provisioned.

wangjun0728 commented 3 weeks ago

After constructing the meter entries using the script below, killing the OVS process and restarting it will result in the following error.

 #!/bin/bash
  start_meter=`ovs-ofctl dump-meter br-int -O Openflow13|grep meter=|wc -l`
  #ovs-ofctl dump-meter br-int -O Openflow13|grep meter=|wc -l
  start_meter=$((start_meter + 1))

  end_meter=1019

  for (( meter=$start_meter; meter<$end_meter; meter++ ))
  do
      ovs-ofctl add-meter br-int meter=$meter,kbps,band=type=drop,rate=1000000 -O Openflow13
      echo "Added meter $meter"
  done

log error:

   cat /var/log/openvswitch/ovs-vswitchd.log |grep meter -i
  2024-08-22T03:33:13.682Z|00036|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
  2024-08-22T03:33:13.682Z|00037|dpif_netlink|INFO|dpif_netlink_meter_transact get failed
  2024-08-22T03:33:13.682Z|00038|dpif_netlink|INFO|The kernel module has a broken meter implementation.
  2024-08-22T03:33:15.303Z|00235|connmgr|INFO|br-int<->unix#73: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
  2024-08-22T03:33:15.303Z|00236|connmgr|INFO|br-int<->unix#73: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
  2024-08-22T03:33:15.303Z|00237|connmgr|INFO|br-int<->unix#73: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
  2024-08-22T03:33:15.303Z|00238|connmgr|INFO|br-int<->unix#73: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
  2024-08-22T03:33:15.303Z|00239|connmgr|INFO|br-int<->unix#73: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
  2024-08-22T03:33:15.303Z|00240|connmgr|INFO|br-int<->unix#73: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
  2024-08-22T03:33:15.303Z|00241|connmgr|INFO|br-int<->unix#73: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
  2024-08-22T03:33:15.303Z|00242|connmgr|INFO|br-int<->unix#73: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
  2024-08-22T03:33:15.303Z|00243|connmgr|INFO|br-int<->unix#73: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
  2024-08-22T03:33:15.303Z|00244|connmgr|INFO|br-int<->unix#73: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
  2024-08-22T04:10:07.071Z|00036|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
  2024-08-22T04:10:07.071Z|00037|dpif_netlink|INFO|dpif_netlink_meter_transact get failed
  2024-08-22T04:10:07.071Z|00038|dpif_netlink|INFO|The kernel module has a broken meter implementation.