sonic-net / sonic-swss

SONiC Switch State Service (SwSS)
https://azure.github.io/SONiC

[202305][muxorch] Fixing bug with updateRoute and mux neighbors #3188

Closed Ndancejic closed 2 months ago

Ndancejic commented 2 months ago

Mux neighbors that were not the configured mux IP were being treated as active.

cherry-pick of: https://github.com/sonic-net/sonic-swss/pull/3187

What I did
Used isNeighborActive to determine whether a neighbor is active in the updateRoute logic.

Why I did it
The bug caused a crash when the neighbor was not the same as the mux-configured neighbor.
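
To make the failure mode concrete, here is a minimal model of the neighbor check. It is not the orchagent code (which is C++); it is a Python sketch under stated assumptions. The class names, the `learned_neighbors` field, the `Ethernet4` port, and the IP-to-port mapping are all hypothetical; only `Ethernet12`, the next-hop addresses, and the idea of an "is this neighbor active" check come from this PR.

```python
from dataclasses import dataclass, field

ACTIVE, STANDBY = "active", "standby"


@dataclass
class MuxPort:
    name: str
    state: str
    configured_ips: set  # server IPs configured for this mux port
    learned_neighbors: set = field(default_factory=set)  # other neighbors behind the port


class MuxModel:
    """Hypothetical stand-in for the mux bookkeeping, for illustration only."""

    def __init__(self, ports):
        self.ports = ports

    def is_neighbor_active_buggy(self, ip):
        # Only configured mux IPs are consulted, so any other neighbor
        # behind a standby port falls through and is reported as active.
        for port in self.ports:
            if ip in port.configured_ips:
                return port.state == ACTIVE
        return True

    def is_neighbor_active(self, ip):
        # Every neighbor behind a mux port follows that port's state.
        for port in self.ports:
            if ip in port.configured_ips or ip in port.learned_neighbors:
                return port.state == ACTIVE
        return True  # not behind any mux port


def pick_nexthop(nexthops, is_active):
    """Return the first active next hop, or 'tunnel' if none is active."""
    for nh in nexthops:
        if is_active(nh):
            return nh
    return "tunnel"


# Assumed layout: both ports are standby, and 192.168.100.4 is a neighbor
# behind Ethernet12 that is not the configured mux IP.
ports = [
    MuxPort("Ethernet4", STANDBY, {"192.168.0.2"}),
    MuxPort("Ethernet12", STANDBY, {"192.168.0.4"}, {"192.168.100.4"}),
]
model = MuxModel(ports)
nexthops = ["192.168.0.2", "192.168.100.4"]

print(pick_nexthop(nexthops, model.is_neighbor_active_buggy))  # 192.168.100.4
print(pick_nexthop(nexthops, model.is_neighbor_active))        # tunnel
```

In this model the buggy check points the route at a next hop that is about to be disabled, while the corrected check falls back to the tunnel, which matches the behavior reported in the logs in this PR.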

How I verified it
Added new vstests and ran them locally. Tests passed on the master PR. Tested on a 202305 dualtor testbed:

Without fix:
Jun  9 06:05:46.642266 svcstr-7050-acs-4 NOTICE swss#orchagent: :- setState: [Ethernet12] Set MUX state from active to standby
Jun  9 06:05:46.642559 svcstr-7050-acs-4 NOTICE swss#orchagent: :- nbrHandler: Processing neighbors for mux Ethernet12, enable 0, state 2
Jun  9 06:05:46.642719 svcstr-7050-acs-4 NOTICE swss#orchagent: :- updateRoute: Updating route 11.11.11.11/32 pointing to Mux nexthops 192.168.0.2@Vlan1000,192.168.100.4@Vlan1000
Jun  9 06:05:46.644840 svcstr-7050-acs-4 NOTICE swss#orchagent: :- updateRoute: setting route 11.11.11.11/32 with nexthop 192.168.100.4@Vlan1000 400000000081d
Jun  9 06:05:46.648190 svcstr-7050-acs-4 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.4/32
Jun  9 06:05:46.648287 svcstr-7050-acs-4 NOTICE swss#orchagent: :- disableNeighbor: Neighbor disable request for 192.168.0.4
Jun  9 06:05:46.650488 svcstr-7050-acs-4 NOTICE swss#orchagent: :- removeNeighbor: Removed next hop 192.168.0.4 on Vlan1000
Jun  9 06:05:46.654161 svcstr-7050-acs-4 NOTICE swss#orchagent: :- removeNeighbor: Removed neighbor aa:f4:6e:25:d3:03 on Vlan1000
Jun  9 06:05:46.662224 svcstr-7050-acs-4 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.100.4/32
Jun  9 06:05:46.662224 svcstr-7050-acs-4 NOTICE swss#orchagent: :- disableNeighbor: Neighbor disable request for 192.168.100.4
Jun  9 06:05:46.662224 svcstr-7050-acs-4 ERR swss#orchagent: :- meta_generic_validation_remove: object 0x400000000081d reference count is 1, can't remove
Jun  9 06:05:46.662224 svcstr-7050-acs-4 ERR swss#orchagent: :- removeNeighbor: Failed to remove next hop 192.168.100.4 on Vlan1000, rv:-17
Jun  9 06:05:46.662282 svcstr-7050-acs-4 ERR swss#orchagent: :- handleSaiRemoveStatus: Encountered failure in remove operation, exiting orchagent, SAI API: SAI_API_NEXT_HOP, status: SAI_STATUS_OBJECT_IN_USE
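
The last three lines of the log above show a remove being rejected because the next-hop object is still referenced (by the route that was just pointed at it). The sketch below models that rule in Python; it is only an illustration of reference counting, not the SAI metadata layer, and the `RefCountedStore` class and the object key are hypothetical.

```python
class ObjectInUse(Exception):
    """Raised when a still-referenced object is removed (models SAI_STATUS_OBJECT_IN_USE)."""


class RefCountedStore:
    """Hypothetical object store that refuses to remove referenced objects."""

    def __init__(self):
        self.refs = {}

    def create(self, oid):
        self.refs[oid] = 0

    def add_ref(self, oid):
        self.refs[oid] += 1

    def drop_ref(self, oid):
        self.refs[oid] -= 1

    def remove(self, oid):
        count = self.refs[oid]
        if count > 0:
            raise ObjectInUse(f"object {oid} reference count is {count}, can't remove")
        del self.refs[oid]


store = RefCountedStore()
nh = "nexthop:192.168.100.4@Vlan1000"  # illustrative key, not a real object ID
store.create(nh)

# Without the fix: the route is pointed at the next hop, then the next hop is removed.
store.add_ref(nh)
try:
    store.remove(nh)
except ObjectInUse as err:
    print("remove failed:", err)

# With the fix: the route points to the tunnel instead, so the reference is gone first.
store.drop_ref(nh)
store.remove(nh)
print("removed:", nh not in store.refs)  # removed: True
```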

With fix:
Jun  9 06:03:08.374440 svcstr-7050-acs-3 NOTICE swss#orchagent: :- setState: [Ethernet12] Set MUX state from active to standby
Jun  9 06:03:08.374785 svcstr-7050-acs-3 INFO swss#orchagent: :- stateStandby: Set state to Standby for Ethernet12
Jun  9 06:03:08.375067 svcstr-7050-acs-3 NOTICE swss#orchagent: :- nbrHandler: Processing neighbors for mux Ethernet12, enable 0, state 2
Jun  9 06:03:08.375210 svcstr-7050-acs-3 INFO swss#orchagent: :- updateRoutes: Updating routes pointing to multiple mux nexthops
Jun  9 06:03:08.375553 svcstr-7050-acs-3 NOTICE swss#orchagent: :- updateRoute: Updating route 11.11.11.11/32 pointing to Mux nexthops 192.168.0.2@Vlan1000,192.168.100.4@Vlan1000
Jun  9 06:03:08.375553 svcstr-7050-acs-3 INFO swss#orchagent: :- updateRoute: No Active neighbors found, setting route 11.11.11.11 to point to tun
Jun  9 06:03:08.377446 svcstr-7050-acs-3 INFO swss#orchagent: :- disable: Disabling neigh 192.168.100.4 on Vlan1000
Jun  9 06:03:08.377446 svcstr-7050-acs-3 INFO swss#orchagent: :- updateNextHopRoutes: Route 11.11.11.11/32 is mux multi nexthop route, skipping.
Jun  9 06:03:08.382395 svcstr-7050-acs-3 INFO swss#orchagent: :- addTunnelRoute: Add tunnel route DB 'Vlan1000:192.168.100.4/32'
Jun  9 06:03:08.389626 svcstr-7050-acs-3 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.100.4/32
Jun  9 06:03:08.389626 svcstr-7050-acs-3 NOTICE swss#orchagent: :- disableNeighbor: Neighbor disable request for 192.168.100.4
Jun  9 06:03:08.392097 svcstr-7050-acs-3 NOTICE swss#orchagent: :- removeNeighbor: Removed next hop 192.168.100.4 on Vlan1000
Jun  9 06:03:08.394777 svcstr-7050-acs-3 INFO swss#orchagent: :- decreaseRouterIntfsRefCount: Router interface Vlan1000 ref count is decreased to 93
Jun  9 06:03:08.395137 svcstr-7050-acs-3 INFO swss#orchagent: :- decreaseRouterIntfsRefCount: Router interface Vlan1000 ref count is decreased to 92
Jun  9 06:03:08.395313 svcstr-7050-acs-3 NOTICE swss#orchagent: :- removeNeighbor: Removed neighbor aa:f4:6e:25:d3:03 on Vlan1000
Jun  9 06:03:08.395491 svcstr-7050-acs-3 INFO swss#orchagent: :- disable: Disabling neigh fc02:1000:100::4 on Vlan1000
Jun  9 06:03:08.395645 svcstr-7050-acs-3 INFO swss#orchagent: :- updateNextHopRoutes: No routes found for NH fc02:1000:100::4
Jun  9 06:03:08.395793 svcstr-7050-acs-3 INFO swss#orchagent: :- addTunnelRoute: Add tunnel route DB 'Vlan1000:fc02:1000:100::4/128'
Jun  9 06:03:08.400434 svcstr-7050-acs-3 NOTICE swss#orchagent: :- create_route: Created tunnel route to fc02:1000:100::4/128
Jun  9 06:03:08.400908 svcstr-7050-acs-3 NOTICE swss#orchagent: :- disableNeighbor: Neighbor disable request for fc02:1000:100::4
Jun  9 06:03:08.403559 svcstr-7050-acs-3 NOTICE swss#orchagent: :- removeNeighbor: Removed next hop fc02:1000:100::4 on Vlan1000
Jun  9 06:03:08.406318 svcstr-7050-acs-3 INFO swss#orchagent: :- decreaseRouterIntfsRefCount: Router interface Vlan1000 ref count is decreased to 91
Jun  9 06:03:08.406318 svcstr-7050-acs-3 INFO swss#orchagent: :- decreaseRouterIntfsRefCount: Router interface Vlan1000 ref count is decreased to 90
Jun  9 06:03:08.406318 svcstr-7050-acs-3 NOTICE swss#orchagent: :- removeNeighbor: Removed neighbor aa:f4:6e:25:d3:03 on Vlan1000
Jun  9 06:03:08.409735 svcstr-7050-acs-3 INFO swss#orchagent: :- createMuxAclTable: ACL table IngressTableDrop exists, reuse the same
Jun  9 06:03:08.409735 svcstr-7050-acs-3 NOTICE swss#orchagent: :- MuxAclHandler: Binding port 1000000000005
Jun  9 06:03:08.411635 svcstr-7050-acs-3 INFO swss#orchagent: :- setState: Changed state to standby

prsunny commented 2 months ago

Please link the master PR in description

prsunny commented 2 months ago

Please test on 202305 image and confirm the results. @StormLiangMS for viz

Ndancejic commented 2 months ago

> Please test on 202305 image and confirm the results. @StormLiangMS for viz

Tested on 202305 dualtor testbed. The logs without and with the fix are identical to those in the PR description above.

prsunny commented 2 months ago

@StormLiangMS for viz