sonic-net / sonic-mgmt

Configuration management examples for SONiC

fix: support multi asic for config reload #13471

Closed cyw233 closed 5 days ago

cyw233 commented 6 days ago

Description of PR

Add multi-ASIC support when wait_for_bgp is True in tests/common/config_reload.py

Summary: Fixes # (issue) Microsoft ADO 28459397

Type of change

Back port request

Approach

What is the motivation for this PR?

Currently, if wait_for_bgp is True here in tests/common/config_reload.py, a multi-ASIC testbed might never satisfy the check_bgp_session_state check. This is due to how the show ip bgp command reports neighbors on multi-ASIC devices. For example, the show ip bgp summary -d all command produces the following output:

[Screenshot: show ip bgp summary -d all output in which neighbor 3.3.3.3 is listed once per ASIC]

We can see there are multiple 3.3.3.3 entries in the output, because 3.3.3.3 is a neighbor of each ASIC. However, when we build bgp_neighbors here from the get_bgp_neighbors() function, the dictionary is updated with the neighbors of all ASICs, so each neighbor IP ends up as a single key. Therefore, when check_bgp_session_state() compares the lengths of neigh_ips and neigh_ok, they are never equal on a multi-ASIC testbed: neigh_ips contains no duplicates, while neigh_ok does. The simplified sketch below illustrates the mismatch.
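To make the length mismatch concrete, here is a minimal, self-contained illustration; the neighbor data and variable shapes are made up for the example and are not the actual sonic-mgmt code:

```python
# Illustration of the length mismatch on a multi-ASIC device (hypothetical data,
# not the real sonic-mgmt implementation).

# get_bgp_neighbors() merges the per-ASIC neighbor tables into one dict, so the
# neighbor 3.3.3.3 appears only once even though it exists on every ASIC.
bgp_neighbors = {
    "3.3.3.3": {"state": "established"},
    "10.0.0.1": {"state": "established"},
}
neigh_ips = [ip.lower() for ip in bgp_neighbors]  # no duplicates -> length 2

# "show ip bgp summary -d all" lists 3.3.3.3 once per ASIC, so the parsed list of
# established neighbors contains duplicates -> length 3 on a two-ASIC device.
neigh_ok = ["3.3.3.3", "3.3.3.3", "10.0.0.1"]

# The equality that check_bgp_session_state() effectively relies on can therefore
# never hold on a multi-ASIC testbed.
print(len(neigh_ips) == len(neigh_ok))  # False
```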

How did you do it?

I updated the config_reload() function so that the if wait_for_bgp: branch supports multi-ASIC devices. A sketch of the idea is shown below; see the diff for the exact change.
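For illustration only, a minimal sketch of a per-ASIC-aware wait_for_bgp branch. The helper names get_bgp_neighbors_per_asic and check_bgp_session_state_all_asics are assumptions about the MultiAsicSonicHost API and may not match the actual diff:

```python
# Sketch of a per-ASIC-aware wait_for_bgp branch; the multi-ASIC helper names are
# assumptions, not necessarily the ones used in the PR -- refer to the diff.
from tests.common.helpers.assertions import pytest_assert
from tests.common.utilities import wait_until


def wait_for_bgp_sessions(duthost, wait=120):
    """Block until the DUT's BGP sessions are established after a config reload."""
    if duthost.is_multi_asic:
        # Keep the neighbors grouped per ASIC so cross-ASIC duplicates no longer
        # break the comparison between expected and established neighbors.
        bgp_neighbors = duthost.get_bgp_neighbors_per_asic(state="all")
        pytest_assert(
            wait_until(wait, 10, 0,
                       duthost.check_bgp_session_state_all_asics, bgp_neighbors),
            "Not all BGP sessions are established after config reload",
        )
    else:
        # Single-ASIC devices keep the original behaviour.
        bgp_neighbors = duthost.get_bgp_neighbors()
        pytest_assert(
            wait_until(wait, 10, 0,
                       duthost.check_bgp_session_state, list(bgp_neighbors.keys())),
            "Not all BGP sessions are established after config reload",
        )
```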

How did you verify/test it?

I manually added config_reload(..., wait_for_bgp=True) to a random test case and ran that test case against a multi-ASIC testbed (a T2 device). I can confirm that config_reload(..., wait_for_bgp=True) now passes. An illustrative version of that ad-hoc check is shown below.
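A sketch of the kind of ad-hoc verification described above; the test name and fixture usage are illustrative, not part of the PR:

```python
# Illustrative only: drop a config_reload(..., wait_for_bgp=True) call into a test
# and run it against a multi-ASIC (T2) testbed.
from tests.common.config_reload import config_reload


def test_config_reload_waits_for_bgp(duthosts, enum_frontend_dut_hostname):
    duthost = duthosts[enum_frontend_dut_hostname]
    # With the fix, this should return once every ASIC's BGP sessions are
    # established, instead of timing out on a multi-ASIC device.
    config_reload(duthost, config_source="config_db", wait_for_bgp=True)
```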

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

yejianquan commented 5 days ago

Nice catch, @cyw233! Just wondering why we didn't hit this issue on 202305...

mssonicbld commented 5 days ago

Cherry-pick PR to 202405: https://github.com/sonic-net/sonic-mgmt/pull/13487

cyw233 commented 5 days ago

> Nice catch, @cyw233! Just wondering why we didn't hit this issue on 202305...

Yeah... it's actually not easy to run into this problem, because we have to run the tests on a multi-ASIC testbed AND trigger the recovery process. I think we were very lucky that we didn't run into this in our 202305 T2 run.