sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

multi_asic_util:run_on_multi_asic opens too many redis connections to config_db #9448

Open · nathcohe opened this issue 2 years ago

nathcohe commented 2 years ago

#### Description

The multi-ASIC utility wrapper method `run_on_multi_asic()` (from `utilities_common/multi_asic.py`) creates too many connections to the Redis config DB and makes no attempt to close them. On a chassis with many ASICs this causes some operations to fail (e.g. `portstat -a`).
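A minimal sketch of a pattern that would bound the connection count: each namespace gets one connection, and that connection is closed in a `finally` block before the next namespace is processed, so at most one is open at a time. `FakeConfigDB`, `run_on_namespaces` and `peak_connections` are hypothetical stand-ins that only count connections; this is not the `utilities_common` implementation and does not call the swsscommon API.

```python
import functools


class FakeConfigDB:
    """Hypothetical stand-in for a per-namespace config DB connector.
    It only tracks how many connections are currently open."""
    open_connections = 0

    def __init__(self, namespace):
        self.namespace = namespace
        self.connected = False

    def connect(self):
        self.connected = True
        FakeConfigDB.open_connections += 1

    def close(self):
        # Idempotent: only decrement if this object actually holds a connection.
        if self.connected:
            self.connected = False
            FakeConfigDB.open_connections -= 1


def run_on_namespaces(namespaces):
    """Hypothetical decorator: call func once per namespace with a fresh
    connection, and always close that connection afterwards."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            results = {}
            for ns in namespaces:
                db = FakeConfigDB(ns)
                db.connect()
                try:
                    results[ns] = func(db, *args, **kwargs)
                finally:
                    db.close()  # released even if func raises
            return results
        return wrapper
    return decorator


@run_on_namespaces([f"asic{i}" for i in range(16)])
def peak_connections(db):
    # How many connections are open while this namespace is being processed.
    return FakeConfigDB.open_connections


if __name__ == "__main__":
    peaks = peak_connections()
    print(max(peaks.values()), FakeConfigDB.open_connections)  # prints "1 0"
```

With 16 namespaces, at most one connection is open at any moment and none remain afterwards, even if the wrapped function raises.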

The method `multi_asic.get_all_namespaces()` (from `sonic_py_common/multi_asic.py`) makes no attempt to close the connection after the data is retrieved.

Even if we tried to close it, `SONiCV2Connector.close(ns)` does not appear to close the connection properly either; the connection is only closed when the object is destroyed.
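The reported behaviour can be modelled with a hypothetical toy class, `LeakyConnector`, whose `close()` is deliberately a no-op and whose connection is released only in `__del__`. This illustrates the symptom only; it is not swsscommon's code. It also shows that in CPython, dropping the last reference (for example with `del`) runs `__del__` immediately, so in this model the count falls only when the object is destroyed.

```python
class LeakyConnector:
    """Hypothetical toy model of the reported symptom: close() does not
    release the connection; only object destruction does."""
    open_connections = 0

    def __init__(self):
        self.held = False

    def connect(self):
        self.held = True
        LeakyConnector.open_connections += 1

    def close(self, ns):
        # Models the reported bug: the connection stays open.
        pass

    def __del__(self):
        if self.held:
            LeakyConnector.open_connections -= 1


def demo():
    counts = []
    db = LeakyConnector()
    counts.append(LeakyConnector.open_connections)  # before connect
    db.connect()
    counts.append(LeakyConnector.open_connections)  # after connect
    db.close("asic0")
    counts.append(LeakyConnector.open_connections)  # after close(): unchanged
    del db  # CPython runs __del__ as soon as the last reference is dropped
    counts.append(LeakyConnector.open_connections)  # after destruction
    return counts


print(demo())  # [0, 1, 1, 0] on CPython
```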

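In summary, there are two issues:

1. `multi_asic.get_all_namespaces()` never closes the config DB connection it opens.
2. `SONiCV2Connector.close(ns)` does not release the connection; it is released only when the object is destroyed.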

#### Steps to reproduce the issue:

To show that `SONiCV2Connector.close(ns)` does not work on a multi-ASIC device, do the following:
1. On the RP, copy the following script:

   ```python
   #!/usr/bin/python3
   from swsscommon import swsscommon
   from utilities_common.general import load_db_config

   namespace = "asic0"

   if __name__ == "__main__":
       print("Open DB Connection test")
       load_db_config()  # load the SONiC DB configuration before creating connectors
       config_db = swsscommon.ConfigDBConnector(namespace=namespace)
       input(f"Before connect to {namespace}, hit enter to continue...")  # call 1
       config_db.connect()
       input(f"Database opened for {namespace}, hit enter to continue...")  # call 2
       config_db.close(namespace)
       input(f"Config DB Closed for {namespace}, but object not destroyed, hit enter to continue")  # call 3
       del config_db  # drop the last reference so the connector object is destroyed
       print("config_db object has been destroyed")
   ```

2. Before running the script, open a `redis-cli` connection to the namespace of choice (`asic0` here).
3. At each `input` prompt, run the following command in that `redis-cli` session:

   ```
   INFO Clients
   ```


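Reading `connected_clients` by eye at each prompt is error-prone; a small helper can extract it from the text printed by `INFO Clients` and compute the change between samples. `connected_clients` and `deltas` are hypothetical helpers, and the sample text is made up for illustration.

```python
from typing import List


def connected_clients(info_text: str) -> int:
    """Return the connected_clients value from `INFO Clients` output."""
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "connected_clients":
            return int(value)
    raise ValueError("connected_clients not found in INFO output")


def deltas(samples: List[int]) -> List[int]:
    """Change in connected_clients between consecutive samples."""
    return [after - before for before, after in zip(samples, samples[1:])]


if __name__ == "__main__":
    # Made-up output for illustration; real output contains more fields.
    sample = "# Clients\r\nconnected_clients:5\r\nblocked_clients:0\r\n"
    print(connected_clients(sample))  # prints 5
    print(deltas([5, 6, 6]))  # prints [1, 0]
```

With samples taken at calls 1, 2 and 3, the results reported in this issue correspond to deltas of `[1, 0]`; a `close()` that released the connection would give `[1, -1]`.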
#### Describe the results you received:
The number of `connected_clients` increases by 1 between `input` calls 1 and 2, and stays at that value until __after call 3__.

#### Describe the results you expected:
`connected_clients` should decrease by 1 after _call 2_, before _call 3_.

#### Output of `show version`:

```
SONiC Software Version: SONiC.azure_cisco_master.310-dirty-20211203.003623
Distribution: Debian 10.11
Kernel: 4.19.0-12-2-amd64
Build commit: 218b6ef58
Build date: Fri Dec 3 09:29:34 UTC 2021
Built by: jenkins@mb-podb0-ch2b5

Platform: x86_64-8800_rp_o-r0
HwSKU: 8800-RP-O
ASIC: cisco-8000
ASIC Count: 16
Serial Number: FOC2149NALF; FOC2149NALF
Model Number: 8800-RP-O; 8800-RP-O
Hardware Revision: 0.12; 0.12
Uptime: 14:53:59 up 2 days, 23:12, 2 users, load average: 1.31, 0.99, 0.99

Docker images:
REPOSITORY                    TAG                                           IMAGE ID       SIZE
docker-dhcp-relay             latest                                        d70ad26bee6e   435MB
docker-gbsyncd-cisco          azure_cisco_master.310-dirty-20211203.003623  e98f91ecd0b0   443MB
docker-gbsyncd-cisco          latest                                        e98f91ecd0b0   443MB
docker-syncd                  azure_cisco_master.310-dirty-20211203.003623  8ab77bc0901b   943MB
docker-syncd                  latest                                        8ab77bc0901b   943MB
docker-snmp                   azure_cisco_master.310-dirty-20211203.003623  22252d99c23d   464MB
docker-snmp                   latest                                        22252d99c23d   464MB
docker-teamd                  azure_cisco_master.310-dirty-20211203.003623  1ed080b40cb1   434MB
docker-teamd                  latest                                        1ed080b40cb1   434MB
docker-nat                    azure_cisco_master.310-dirty-20211203.003623  6a843803f3de   437MB
docker-nat                    latest                                        6a843803f3de   437MB
docker-sonic-mgmt-framework   azure_cisco_master.310-dirty-20211203.003623  aa6da4bd0262   577MB
docker-sonic-mgmt-framework   latest                                        aa6da4bd0262   577MB
docker-platform-monitor       azure_cisco_master.310-dirty-20211203.003623  9555f6f83916   687MB
docker-platform-monitor       latest                                        9555f6f83916   687MB
docker-router-advertiser      azure_cisco_master.310-dirty-20211203.003623  8ae6bda1d977   422MB
docker-router-advertiser      latest                                        8ae6bda1d977   422MB
docker-lldp                   azure_cisco_master.310-dirty-20211203.003623  7c5e808ef5f5   462MB
docker-lldp                   latest                                        7c5e808ef5f5   462MB
docker-database               azure_cisco_master.310-dirty-20211203.003623  2ae50f8d2097   422MB
docker-database               latest                                        2ae50f8d2097   422MB
docker-orchagent              azure_cisco_master.310-dirty-20211203.003623  c15fe35e3ff8   453MB
docker-orchagent              latest                                        c15fe35e3ff8   453MB
docker-macsec                 azure_cisco_master.310-dirty-20211203.003623  39b302d7f85e   437MB
docker-macsec                 latest                                        39b302d7f85e   437MB
docker-sonic-telemetry        azure_cisco_master.310-dirty-20211203.003623  74a066aef073   510MB
docker-sonic-telemetry        latest                                        74a066aef073   510MB
docker-mux                    azure_cisco_master.310-dirty-20211203.003623  89650cd4fa54   474MB
docker-mux                    latest                                        89650cd4fa54   474MB
docker-fpm-frr                azure_cisco_master.310-dirty-20211203.003623  708fb31f440b   452MB
docker-fpm-frr                latest                                        708fb31f440b   452MB
docker-sflow                  azure_cisco_master.310-dirty-20211203.003623  45f1ed1015d7   435MB
docker-sflow                  latest                                        45f1ed1015d7   435MB
```



#### Additional information you deem important (e.g. issue happens only occasionally):

anamehra commented 2 years ago

@abdosi This affects `show interface counter` on the multi-ASIC supervisor.

zhangyanzhao commented 2 years ago

Abhishek, please take a look.