openvswitch / ovs-issues

Issue tracker repo for Open vSwitch
10 stars 3 forks source link

bond: bond entry statistics overflow #319

Closed WeiweiZhang0227 closed 4 months ago

WeiweiZhang0227 commented 5 months ago

Hi, I encountered an issue while using OVS bond in balance-tcp mode. After performing down and up operations on bond members, the bond entry statistics displayed by ovs-appctl bond/show occured overflow. In addition to the statistics values issue, this also led to longer load balancing time for bond members.

  1. information: ovs version:2.17.2 bond mode: balance-tcp openflow: cookie=0x0, duration=7027.270s, table=0, n_packets=15169077, n_bytes=9334457220, priority=0 actions=NORMAL

  2. ovs-appctl bond/show print:

    
    [root@localhost zzz]# ovs-appctl bond/show
    ---- eobond ----
    bond_mode: balance-tcp
    bond may use recirculation: yes, Recirc-ID : 1
    bond-hash-basis: 0
    lb_output action: disabled, bond-id: -1
    updelay: 0 ms
    downdelay: 0 ms
    next rebalance: 9673 ms
    lacp_status: negotiated
    lacp_fallback_ab: false
    active-backup primary: <none>
    active member mac: 98:a9:2d:c5:00:69(u0)

member u0: enabled active member may_enable: true hash 89: 9007199254740413 kB load hash 219: 9007199254740991 kB load

member u1: enabled may_enable: true hash 141: 9007199254520657 kB load


4. analysis:
After performing down and up operations on bond members, recirc rules are changed, `bond_entry_account( )` function updates bond entry statistics through recirc rules.  `rule_tx_bytes ` < ` entry->pr_tx_bytes` , so  `delta` occurs overflow.

static void bond_entry_account (struct bond_entry *entry, uint64_t rule_tx_bytes)
OVS_REQ_WRLOCK(rwlock) { if (entry->member) { uint64_t delta; delta = rule_tx_bytes - entry->pr_tx_bytes; // delta occurs overflow entry->tx_bytes += delta;
entry->pr_tx_bytes = rule_tx_bytes; } }

5. solution
I try to add `last_pr_rule` in `struct bond_entry` to solve this problem. When  then recirc rule changes, `delta = rule_tx_bytes`, and  `entry->tx_bytes += rule_tx_bytes`.  But  I’m not sure whether the value of `entry->tx_bytes` is correct after the modification. 

Index: ofproto/bond.c

--- ofproto/bond.c (revision 9492) +++ ofproto/bond.c (working copy) @@ -71,6 +71,7 @@

@@ -991,9 +992,12 @@ static void bond_entry_account(struct bond_entry *entry, uint64_t rule_tx_bytes) OVS_REQ_WRLOCK(rwlock) { if (entry->member) { uint64_t delta;

amorenoz commented 5 months ago

Thanks @zwwdbt for the detailed analysis.

You are right there is an overflow in the statistics. It's easy to reproduce.

solution I try to add last_pr_rule in struct bond_entry to solve this problem. When then recirc rule changes, delta = rule_tx_bytes, and entry->tx_bytes += rule_tx_bytes. But I’m not sure whether the value of entry->tx_bytes is correct after the modification.

I am not sure we actually need another field. We already have pr_tx_bytes that tracks the last seen tx_bytes. We just need to reset it to zero when we re-create the rule.

I'll prepare a patch and send it.

amorenoz commented 5 months ago

I've posted a patch that should fix this: https://patchwork.ozlabs.org/project/openvswitch/patch/20240209070642.2412417-1-amorenoz@redhat.com/

Would you kindly test it and report your feedback to the list?

igsilya commented 4 months ago

Should be fixed with commit https://github.com/openvswitch/ovs/commit/436aba68d52891fb5775ec7651282ccf9d04176b .