Open wangjun0728 opened 4 weeks ago
Yes, that is correct behavior for the userspace to not delete the DP on restart. The probe may need some enhancement to cover this case, it seems.
Yes, I understand that this probing is likely necessary to maintain compatibility with older kernel versions that don't support certain features. However, it appears that this probing can lead to the entire meter initialization failing when the dp is not deleted and only the user-space process is restarted, which in turn prevents the flow tables from being properly provisioned.
After constructing the meter entries using the script below, killing the OVS process and restarting it will result in the following error.
#!/bin/bash
start_meter=`ovs-ofctl dump-meter br-int -O Openflow13|grep meter=|wc -l`
#ovs-ofctl dump-meter br-int -O Openflow13|grep meter=|wc -l
start_meter=$((start_meter + 1))
end_meter=1019
for (( meter=$start_meter; meter<$end_meter; meter++ ))
do
ovs-ofctl add-meter br-int meter=$meter,kbps,band=type=drop,rate=1000000 -O Openflow13
echo "Added meter $meter"
done
log error:
cat /var/log/openvswitch/ovs-vswitchd.log |grep meter -i
2024-08-22T03:33:13.682Z|00036|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2024-08-22T03:33:13.682Z|00037|dpif_netlink|INFO|dpif_netlink_meter_transact get failed
2024-08-22T03:33:13.682Z|00038|dpif_netlink|INFO|The kernel module has a broken meter implementation.
2024-08-22T03:33:15.303Z|00235|connmgr|INFO|br-int<->unix#73: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2024-08-22T03:33:15.303Z|00236|connmgr|INFO|br-int<->unix#73: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2024-08-22T03:33:15.303Z|00237|connmgr|INFO|br-int<->unix#73: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2024-08-22T03:33:15.303Z|00238|connmgr|INFO|br-int<->unix#73: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2024-08-22T03:33:15.303Z|00239|connmgr|INFO|br-int<->unix#73: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2024-08-22T03:33:15.303Z|00240|connmgr|INFO|br-int<->unix#73: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2024-08-22T03:33:15.303Z|00241|connmgr|INFO|br-int<->unix#73: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2024-08-22T03:33:15.303Z|00242|connmgr|INFO|br-int<->unix#73: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2024-08-22T03:33:15.303Z|00243|connmgr|INFO|br-int<->unix#73: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2024-08-22T03:33:15.303Z|00244|connmgr|INFO|br-int<->unix#73: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2024-08-22T04:10:07.071Z|00036|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2024-08-22T04:10:07.071Z|00037|dpif_netlink|INFO|dpif_netlink_meter_transact get failed
2024-08-22T04:10:07.071Z|00038|dpif_netlink|INFO|The kernel module has a broken meter implementation.
I encountered an issue with meter initialization in the ovs-kernel. The root cause of the problem is that when we restart the user-space process, we didn't remove the dp (datapath), unload, and reload the kernel. However, the existing meter entries in the dp caused a hash collision, which resulted in the failure of meter initialization.
In lookup_meter, the search is performed based on the hash, but it also compares the meter ID. So, in the case of a conflict, it directly returns NULL without overwriting the existing meter, leading to errors when initially provisioning the two meters.
When probing from the user space, the meter conflict will be detected, causing the operation to fail.
So, is this approach of not deleting the dp and not reloading the kernel incorrect? I couldn't find any official documentation on this on the website.