sonic-net / sonic-mgmt

Configuration management examples for SONiC

test_stress_acl.py fails on Broadcom platforms #7396

Closed xwjiang-ms closed 1 year ago

xwjiang-ms commented 1 year ago

Description:

In test_stress_acl.py, we create an ACL table and then repeatedly add and delete 100 ACL rules. When running test_stress_acl.py on Broadcom platforms, syncd logs a SAI error: `ERR syncd#syncd: [none] SAI_API_ACL:brcm_sai_get_acl_counter_attribute:5682 Invalid acl table`. The ACL table is created successfully at the beginning; the error appears only while ACL rules are being added and deleted.

Steps to reproduce the issue:

  1. Run test_stress_acl.py. I can usually reproduce the failure at the debug completeness level (`--completeness_level debug`).
  2. The error message appears in the 4th or 5th add/delete round.

Describe the results you received:

The SAI error message appears: `ERR syncd#syncd: [none] SAI_API_ACL:brcm_sai_get_acl_counter_attribute:5682 Invalid acl table`.

Debugging further, and taking one round of the test as an example: `add: Successfully created ACL rule` occurs 500 times, but `remove: Successfully deleted ACL rule` occurs only 400 times, and a message like `removeAclRule: ACL rule [ RULE_*] in table [STRESS_ACL] already deleted` occurs 100 times. It seems the rules are not being deleted correctly. I also added `show acl rule` to the bash file after every add/delete command. It showed 100 rules after the last add, but the bash file stopped at the last delete.

Describe the results you expected:

The test case finishes successfully with no error messages.
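
For anyone trying to reproduce the counts above, here is a rough sketch of how the messages could be tallied from a saved log. The message fragments are the ones quoted in this report; the function name `tally_acl_messages`, the sample lines, and the surrounding log layout are assumptions for illustration and are not taken from the test or from SONiC.

```python
import re
from collections import Counter

# Fragments quoted in this report. Everything around them in a real
# syslog line (timestamps, hostnames, prefixes) is not assumed here.
PATTERNS = {
    "created": re.compile(r"Successfully created ACL rule"),
    "deleted": re.compile(r"Successfully deleted ACL rule"),
    "already_deleted": re.compile(
        r"ACL rule \[.*?\] in table \[STRESS_ACL\] already deleted"
    ),
    "invalid_table": re.compile(
        r"brcm_sai_get_acl_counter_attribute:\d+ Invalid acl table"
    ),
}


def tally_acl_messages(lines):
    """Count how many lines match each message fragment (hypothetical helper)."""
    counts = Counter({name: 0 for name in PATTERNS})
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return dict(counts)


# Made-up sample lines, only to show the shape of the result.
sample = [
    "add: Successfully created ACL rule RULE_1 in table STRESS_ACL",
    "add: Successfully created ACL rule RULE_2 in table STRESS_ACL",
    "remove: Successfully deleted ACL rule RULE_1 in table STRESS_ACL",
    "removeAclRule: ACL rule [RULE_2] in table [STRESS_ACL] already deleted",
    "ERR syncd#syncd: [none] SAI_API_ACL:"
    "brcm_sai_get_acl_counter_attribute:5682 Invalid acl table",
]

result = tally_acl_messages(sample)
print(result)
```

Any iterable of lines works as input, so an open file handle for a saved log can be passed directly. If the "created" count is higher than the "deleted" count and the difference matches the "already_deleted" count, that is the same pattern described in this report.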

Additional information you deem important:

**Output of `show version`:**

```
SONiC Software Version: SONiC.20220531.16
Distribution: Debian 11.6
Kernel: 5.10.0-18-2-amd64
Build commit: d8e7212f0a
Build date: Wed Jan 25 23:30:42 UTC 2023
Built by: cloudtest@3d24c432c000001

Platform: x86_64-cel_seastone-r0
HwSKU: Celestica-DX010-C32
ASIC: broadcom
ASIC Count: 1
Serial Number: DX010F2B118711MS100005
Model Number: R0872-F0009-01
Hardware Revision: N/A
Uptime: 02:29:50 up  2:33,  1 user,  load average: 1.52, 1.43, 1.27
Date: Mon 06 Feb 2023 02:29:50

Docker images:
REPOSITORY                 TAG           IMAGE ID       SIZE
docker-mux                 20220531.16   e321c6120053   532MB
docker-mux                 latest        e321c6120053   532MB
docker-macsec              20220531.16   1d87173e1288   501MB
docker-acms                20220531.16   cdb25bed3d48   530MB
docker-acms                latest        cdb25bed3d48   530MB
docker-orchagent           20220531.16   3f927e1824a1   518MB
docker-orchagent           latest        3f927e1824a1   518MB
docker-fpm-frr             20220531.16   92933aa28fee   529MB
docker-fpm-frr             latest        92933aa28fee   529MB
docker-teamd               20220531.16   caa21256fa26   499MB
docker-teamd               latest        caa21256fa26   499MB
docker-syncd-brcm          20220531.16   97e90bb5dcbd   825MB
docker-syncd-brcm          latest        97e90bb5dcbd   825MB
docker-gbsyncd-broncos     20220531.16   26473f47f4b7   530MB
docker-gbsyncd-broncos     latest        26473f47f4b7   530MB
docker-gbsyncd-credo       20220531.16   a9bab554f4fa   501MB
docker-gbsyncd-credo       latest        a9bab554f4fa   501MB
docker-dhcp-relay          20220531.16   acf0859e6af1   496MB
docker-snmp                20220531.16   8b3fdc9ce073   528MB
docker-snmp                latest        8b3fdc9ce073   528MB
docker-sonic-telemetry     20220531.16   fb81884399b8   564MB
docker-sonic-telemetry     latest        fb81884399b8   564MB
docker-router-advertiser   20220531.16   01d63554d2d3   483MB
docker-router-advertiser   latest        01d63554d2d3   483MB
docker-platform-monitor    20220531.16   968429d09262   608MB
docker-platform-monitor    latest        968429d09262   608MB
docker-lldp                20220531.16   2991721a5959   526MB
docker-lldp                latest        2991721a5959   526MB
docker-database            20220531.16   1e07236ca5ef   483MB
docker-database            latest        1e07236ca5ef   483MB
k8s.gcr.io/pause           3.5           ed210e3e4a5b   683kB
```

**Attach error log file:**

[error logs.txt](https://github.com/sonic-net/sonic-mgmt/files/10639744/error.logs.txt)

bingwang-ms commented 1 year ago

It looks like a timing issue. The counter thread is attempting to read a counter of a rule or table that is being deleted. Can you please check if the ACL is still working?
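
To make the suspected interleaving concrete, here is a toy, single-threaded sketch of a poller that picks its list of objects first and reads them only after a deletion has landed. It is not SONiC, syncd, or SAI code. `ToyAclStore`, `StaleObjectError`, and `poll_with_interleaved_delete` are hypothetical names, and whether the real counter thread behaves this way is an assumption.

```python
class StaleObjectError(Exception):
    """Raised when a counter is read for an object that no longer exists."""


class ToyAclStore:
    """Hypothetical stand-in for ACL state; each rule has one counter."""

    def __init__(self):
        self.counters = {}

    def add_rule(self, rule):
        self.counters[rule] = 0

    def delete_rule(self, rule):
        self.counters.pop(rule, None)

    def read_counter(self, rule):
        if rule not in self.counters:
            raise StaleObjectError(rule)
        return self.counters[rule]


def poll_with_interleaved_delete(store, deleted_between):
    """Snapshot the rule list, apply deletions, then read the snapshot.

    Returns the rules whose counter read failed because the rule was
    removed between the snapshot and the read.
    """
    snapshot = list(store.counters)      # poller decides what to read
    for rule in deleted_between:         # deletion lands before the reads
        store.delete_rule(rule)
    failed = []
    for rule in snapshot:
        try:
            store.read_counter(rule)
        except StaleObjectError:
            failed.append(rule)
    return failed


store = ToyAclStore()
store.add_rule("RULE_1")
store.add_rule("RULE_2")
failed_reads = poll_with_interleaved_delete(store, ["RULE_2"])
print(failed_reads)
```

In this toy model the failed read is harmless to the remaining rules, which is why checking whether the ACL still works helps tell a noisy log message apart from a functional problem.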

xwjiang-ms commented 1 year ago

Fixed in https://github.com/sonic-net/sonic-mgmt/pull/7549