Open frct1 opened 2 months ago
CC some folks who might have related experience with table sizes: @igsilya @dceara. Folks in the OpenStack community sent a link to this patchwork entry about the group and meter tables not being limited to 16 bits.
How many meters do you have configured in OVS? You may also run ovs-ofctl -OOpenFlow15 meter-features br-int
to see how many meters your datapath supports. For the kernel datapath, the value is dynamic and depends on how much RAM the system has and some other factors, IIRC, but it's capped at 200K. For userspace datapath it is limited to 256K.
Old hypervisor where the QoS and DHCP issues are present:
# ovs-ofctl -OOpenFlow15 meter-features br-int
OFPST_METER_FEATURES reply (OF1.5) (xid=0x2):
max_meter:0 max_bands:0 max_color:0
band_types: 0
capabilities:
# ovs-ofctl -O OpenFlow15 dump-meters br-int | grep "meter" | wc -l
0
Fresh provisioned hypervisor with OVN (no QoS or DHCP issue observed):
# ovs-ofctl -OOpenFlow15 meter-features br-int
OFPST_METER_FEATURES reply (OF1.5) (xid=0x2):
max_meter:200000 max_bands:1 max_color:0
band_types: drop
capabilities: kbps pktps burst stats
# ovs-ofctl -O OpenFlow15 dump-meters br-int | grep "meter" | wc -l
1544
1544 is close to twice the total number of ports created in OpenStack (774 * 2 = 1548), because QoS is configured for both ingress and egress.
Versions are the same.
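A quick sanity check of the arithmetic above, as a minimal sketch. It assumes one meter per direction (ingress and egress) per port, which is my reading of the QoS setup described here; `expected_meters` is a hypothetical helper name, not part of any OVS tool.

```shell
# Hypothetical helper: expected meter count for a given number of ports,
# assuming one meter per direction (ingress + egress) per port.
expected_meters() {
    echo $(( $1 * 2 ))
}

ports=774       # port count quoted in this thread
observed=1544   # dump-meters count quoted in this thread
expected=$(expected_meters "$ports")

# Prints: expected=1548 observed=1544 missing=4
echo "expected=$expected observed=$observed missing=$(( expected - observed ))"
```

A small gap like this would be consistent with a few ports not having both QoS directions configured, but that is only a guess.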
OK. So, your issue is max_meter:0. It means your datapath (kernel?) doesn't support meters, or for some reason ovs-vswitchd thinks that the datapath doesn't support meters. What is your kernel version? Also, what does ovs-appctl dpif/show-dp-features br-int show? Are there any errors/warnings related to meters in the ovs-vswitchd.log?
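For checking many hypervisors at once, the max_meter test can be scripted. This is only a sketch: `max_meter` and `meters_supported` are hypothetical helper names, and they simply parse the meter-features text in the format pasted in this thread.

```shell
# Hypothetical helper: print the max_meter value found in
# `ovs-ofctl meter-features` output read from stdin.
max_meter() {
    sed -n 's/.*max_meter:\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed only if stdin reports a non-zero max_meter.
meters_supported() {
    n=$(max_meter)
    [ -n "$n" ] && [ "$n" -gt 0 ]
}
```

On a hypervisor it could be used as `ovs-ofctl -OOpenFlow15 meter-features br-int | meters_supported || echo "meters unsupported"`.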
Kernel 5.15 is used across all hypervisors: 5.15.0-107-generic and 5.15.0-122-generic.
what does ovs-appctl dpif/show-dp-features br-int show
Fresh provisioned:
Masked set action: Yes
Tunnel push pop: No
Ufid: Yes
Truncate action: Yes
Clone action: Yes
Sample nesting: 10
Conntrack eventmask: Yes
Conntrack clear: Yes
Max dp_hash algorithm: 0
Check pkt length action: Yes
Conntrack timeout policy: Yes
Explicit Drop action: No
Optimized Balance TCP mode: No
Conntrack all-zero IP SNAT: Yes
MPLS Label add: Yes
Max VLAN headers: 2
Max MPLS depth: 3
Recirc: Yes
CT state: Yes
CT zone: Yes
CT mark: Yes
CT label: Yes
CT state NAT: Yes
CT orig tuple: Yes
CT orig tuple for IPv6: Yes
IPv6 ND Extension: No
Where the issue is observed:
Masked set action: Yes
Tunnel push pop: No
Ufid: Yes
Truncate action: Yes
Clone action: Yes
Sample nesting: 10
Conntrack eventmask: Yes
Conntrack clear: Yes
Max dp_hash algorithm: 0
Check pkt length action: Yes
Conntrack timeout policy: Yes
Explicit Drop action: No
Optimized Balance TCP mode: No
Conntrack all-zero IP SNAT: Yes
MPLS Label add: Yes
Max VLAN headers: 2
Max MPLS depth: 3
Recirc: Yes
CT state: Yes
CT zone: Yes
CT mark: Yes
CT label: Yes
CT state NAT: Yes
CT orig tuple: Yes
CT orig tuple for IPv6: Yes
IPv6 ND Extension: No
Are there any errors/warnings related to meters in the ovs-vswitchd.log ?
Yep, did some grep, there are.
First hypervisor with broken metering feature:
2024-09-13T14:11:21.894Z|379262|coverage|INFO|dpif_meter_set 0.0/sec 0.000/sec 0.0000/sec total: 9658
2024-09-13T14:11:21.894Z|379263|coverage|INFO|dpif_meter_del 0.0/sec 0.000/sec 0.0000/sec total: 8160
2024-09-13T14:19:31.684Z|00032|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2024-09-13T14:19:31.684Z|00033|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2024-09-13T14:19:31.684Z|00034|dpif_netlink|INFO|dpif_netlink_meter_transact get failed
2024-09-13T14:19:31.684Z|00035|dpif_netlink|INFO|The kernel module has a broken meter implementation.
2024-09-13T14:44:42.548Z|00032|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2024-09-13T14:44:42.548Z|00033|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2024-09-13T14:44:42.548Z|00034|dpif_netlink|INFO|dpif_netlink_meter_transact get failed
2024-09-13T14:44:42.548Z|00035|dpif_netlink|INFO|The kernel module has a broken meter implementation.
Second hypervisor with broken metering:
2024-09-16T20:28:46.386Z|00032|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2024-09-16T20:28:46.386Z|00033|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2024-09-16T20:28:46.386Z|00034|dpif_netlink|INFO|dpif_netlink_meter_transact get failed
2024-09-16T20:28:46.386Z|00035|dpif_netlink|INFO|The kernel module has a broken meter implementation.
September 13 is the first day the metering issue shows up in the logs, so that is probably when metering broke for some reason.
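To find the first occurrence on each host without scrolling through the log, something like the following could work. It is a sketch that assumes the pipe-separated ovs-vswitchd.log line format shown above; `first_broken_meter_ts` is a hypothetical helper name.

```shell
# Hypothetical helper: read ovs-vswitchd.log lines on stdin and print the
# timestamp (first pipe-separated field) of the first line reporting a
# broken kernel meter implementation. Prints nothing if there is no match.
first_broken_meter_ts() {
    grep -F 'The kernel module has a broken meter implementation' \
        | head -n 1 \
        | cut -d'|' -f1
}
```

It could be fed the log directly, for example `first_broken_meter_ts < ovs-vswitchd.log`, with the log path adjusted to wherever the deployment keeps it.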
Hello, we are running fairly big hypervisors (hundreds of small, short-lived VMs) based on OpenStack and have started to hit an issue where the DHCP offer is not delivered to the tap interface at all (even though the logs show that a DHCPOFFER has been sent). When starting ovn-controller we always see this error log line, which is probably related:
The really weird thing is that the ovn-controller version is current, and the issue should have been gone as of one of the 2023.* fall releases mentioned here, but it is not.
OpenStack is deployed using kolla-ansible (master version), in which ovn-controller is at version 2024.3.2 (info). Versions:
What could be the reason for this?