sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[DropCounters] Unable to configure Drop Counters on BRCM platforms #8538

Open gechiang opened 3 years ago

gechiang commented 3 years ago

Description

DropCounter configuration on all BRCM platforms stopped working as of the 202012 build of 07/31/21.

admin@str2-7050cx3-acs-11:~$ sonic-db-cli STATE_DB keys "DEBUG_COUNTER_CAPABILITIES|*"

admin@str2-7050cx3-acs-11:~$ show dropcounters capabilities
Current device does not support drop counters
admin@str2-7050cx3-acs-11:~$ sudo config dropcounters install TEST PORT_INGRESS_DROPS L3_EGRESS_LINK_DOWN
Encountered error trying to install counter: Counter type not supported on this device
admin@str2-7050cx3-acs-11:~$

Steps to reproduce the issue:

  1. Install the 202012 image built on 07/31/21.
  2. Run 'sonic-db-cli STATE_DB keys "DEBUG_COUNTER_CAPABILITIES|*"'; no keys are returned, because the capabilities are no longer present in STATE_DB.
  3. Run "show dropcounters capabilities"; it reports that the device does not support drop counters. Attempting to configure a drop counter also fails, with "Counter type not supported on this device".

Both failures occur because STATE_DB is missing the DEBUG_COUNTER_CAPABILITIES entries on BRCM platforms.
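To make the check in step 2 concrete, here is a minimal C++ sketch of how the key listing can be interpreted. It is not SONiC code: `parseCapabilityKeys` and `dropCountersAvailable` are hypothetical helpers, and the only assumption taken from this report is the `DEBUG_COUNTER_CAPABILITIES|<counter type>` key format shown in the command output.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: extract counter types from newline-separated key output,
// e.g. "DEBUG_COUNTER_CAPABILITIES|PORT_INGRESS_DROPS" -> "PORT_INGRESS_DROPS".
std::vector<std::string> parseCapabilityKeys(const std::string &output)
{
    static const std::string prefix = "DEBUG_COUNTER_CAPABILITIES|";
    std::vector<std::string> types;
    std::istringstream stream(output);
    std::string line;
    while (std::getline(stream, line))
    {
        // rfind with position 0 only matches when the line starts with the prefix.
        if (line.rfind(prefix, 0) == 0 && line.size() > prefix.size())
        {
            types.push_back(line.substr(prefix.size()));
        }
    }
    return types;
}

// Hypothetical helper: an empty key listing corresponds to the failing case in this report.
bool dropCountersAvailable(const std::string &output)
{
    return !parseCapabilityKeys(output).empty();
}
```

With the empty output from the 07/31/21 build, `dropCountersAvailable` returns false; with the single key listed by a working image, it returns true.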

Describe the results you expected:

On a working image (202012 image built on 07/30/21 or earlier), you see the following:

admin@str2-7050cx3-acs-11:~$ sonic-db-cli STATE_DB keys "DEBUG_COUNTER_CAPABILITIES|*"
DEBUG_COUNTER_CAPABILITIES|PORT_INGRESS_DROPS
admin@str2-7050cx3-acs-11:~$
admin@str2-7050cx3-acs-11:~$ show dropcounters capabilities
Counter Type          Total
------------------  -------
PORT_INGRESS_DROPS        2

PORT_INGRESS_DROPS
        IP_HEADER_ERROR
        FDB_AND_BLACKHOLE_DISCARDS
        SMAC_EQUALS_DMAC
        ACL_ANY
        SIP_LINK_LOCAL
        DIP_LINK_LOCAL
        L3_EGRESS_LINK_DOWN
        EXCEEDS_L3_MTU
admin@str2-7050cx3-acs-11:~$ 
admin@str2-7050cx3-acs-11:~$ sudo config dropcounters install TEST PORT_INGRESS_DROPS L3_EGRESS_LINK_DOWN
admin@str2-7050cx3-acs-11:~$

Output of show version:

admin@str2-7050cx3-acs-11:~$ show vers

SONiC Software Version: SONiC.202012.26516-cb49c6522
Distribution: Debian 10.10
Kernel: 4.19.0-12-2-amd64
Build commit: cb49c6522
Build date: Sat Jul 31 13:57:41 UTC 2021
Built by: AzDevOps@sonic-build-workers-000JY3

Platform: x86_64-arista_7050cx3_32s
HwSKU: Arista-7050CX3-32S-D48C8
ASIC: broadcom
ASIC Count: 1
Serial Number: JPE20432360
Uptime: 06:06:56 up 4 min,  1 user,  load average: 0.74, 0.97, 0.48

Docker images:
REPOSITORY                    TAG                      IMAGE ID            SIZE
docker-sonic-mgmt-framework   202012.26516-cb49c6522   d3bc9e35f298        621MB
docker-sonic-mgmt-framework   latest                   d3bc9e35f298        621MB
docker-sonic-telemetry        202012.26516-cb49c6522   18a9cf9ff6d0        491MB
docker-sonic-telemetry        latest                   18a9cf9ff6d0        491MB
docker-orchagent              202012.26516-cb49c6522   a9aac5a7c8b9        430MB
docker-orchagent              latest                   a9aac5a7c8b9        430MB
docker-fpm-frr                202012.26516-cb49c6522   fa906d900a45        430MB
docker-fpm-frr                latest                   fa906d900a45        430MB
docker-sflow                  202012.26516-cb49c6522   1253199aa124        413MB
docker-sflow                  latest                   1253199aa124        413MB
docker-teamd                  202012.26516-cb49c6522   85f33d243ae7        412MB
docker-teamd                  latest                   85f33d243ae7        412MB
docker-nat                    202012.26516-cb49c6522   1278a16a5736        415MB
docker-nat                    latest                   1278a16a5736        415MB
docker-platform-monitor       202012.26516-cb49c6522   228dea6653e2        609MB
docker-platform-monitor       latest                   228dea6653e2        609MB
docker-syncd-brcm             202012.26516-cb49c6522   fc4204eb5a48        694MB
docker-syncd-brcm             latest                   fc4204eb5a48        694MB
docker-snmp                   202012.26516-cb49c6522   fcad501d668a        443MB
docker-snmp                   latest                   fcad501d668a        443MB
docker-dhcp-relay             202012.26516-cb49c6522   bf6849987817        408MB
docker-dhcp-relay             latest                   bf6849987817        408MB
docker-router-advertiser      202012.26516-cb49c6522   c0238b7205bb        401MB
docker-router-advertiser      latest                   c0238b7205bb        401MB
docker-lldp                   202012.26516-cb49c6522   c1e305695ee0        441MB
docker-lldp                   latest                   c1e305695ee0        441MB
docker-database               202012.26516-cb49c6522   83ef64e74d95        401MB
docker-database               latest                   83ef64e74d95        401MB

admin@str2-7050cx3-acs-11:~$

Here are the changes that went into the build of 07/31/21:

https://dev.azure.com/mssonic/build/_traceability/runview/changes?currentRunId=26516
That build contains the following 3 changes:
https://github.com/Azure/sonic-buildimage/commit/ada56abe6efa17d3bac0c88661876b6da213b003
https://github.com/Azure/sonic-buildimage/commit/cb49c6522fca696326f3ebe70e876c097d52a7be
https://github.com/Azure/sonic-buildimage/commit/f1c8a6ab96d4188e650964aa820baf87dc5c5f17

Of the 3, I believe the [debugcounterorch] change in the SWSS submodule update is the most likely culprit:

swss:
*[portsorch] fix errors when moving port from one lag to anoth… a67d8af
*[debugcounterorch] check if counter type is supported before querying… ( 04105a4
*Td2: Reclaim buffer from unused ports (#1830) ac7f5cf
*[Dynamic Buffer Calc][202012]Bug fix: Don't create lossless buffer pr… f54b7d0 

Additional information you deem important (e.g. issue happens only occasionally):

gechiang commented 3 years ago

Confirmed that the change "[debugcounterorch] check if counter type is supported before querying… ( 04105a4" exposes a BRCM SAI issue that causes the count check below to fail.

void DebugCounterOrch::publishDropCounterCapabilities()
{
    supported_ingress_drop_reasons = DropCounter::getSupportedDropReasons(SAI_DEBUG_COUNTER_ATTR_IN_DROP_REASON_LIST);
    supported_egress_drop_reasons  = DropCounter::getSupportedDropReasons(SAI_DEBUG_COUNTER_ATTR_OUT_DROP_REASON_LIST);
    supported_counter_types        = DropCounter::getSupportedCounterTypes();

    string ingress_drop_reason_str = DropCounter::serializeSupportedDropReasons(supported_ingress_drop_reasons);
    string egress_drop_reason_str = DropCounter::serializeSupportedDropReasons(supported_egress_drop_reasons);

    for (auto const &counter_type : DebugCounter::getDebugCounterTypeLookup())
    {
        string drop_reasons;

        if (!supported_counter_types.count(counter_type.first))  // <== This check leaves STATE_DB without the capabilities on BRCM platforms
        {
            continue;
        }
        ...
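The effect of that check can be reproduced in isolation. The following is a simplified sketch, not the orchagent implementation: `capabilityKeysToPublish` is a hypothetical function, the counter type names are illustrative, and the assumption is that the set of supported counter types comes back empty on BRCM.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the publish loop: returns the STATE_DB keys that
// would be written for the given known and supported counter types.
std::vector<std::string> capabilityKeysToPublish(
    const std::vector<std::string> &known_types,
    const std::set<std::string> &supported_types)
{
    std::vector<std::string> keys;
    for (const auto &type : known_types)
    {
        // Same shape as the check above: unsupported types are skipped entirely.
        if (!supported_types.count(type))
        {
            continue;
        }
        keys.push_back("DEBUG_COUNTER_CAPABILITIES|" + type);
    }
    return keys;
}
```

If `supported_types` is empty, every iteration hits `continue` and nothing is published, which matches the empty key listing in the report. If it contains `PORT_INGRESS_DROPS`, the single key seen on the working image is produced.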

gechiang commented 3 years ago

BRCM case CS00012205138 filed

prsunny commented 3 years ago

Revert commit on the 202012 branch: https://github.com/Azure/sonic-swss/pull/1884. The issue is still present on master.

zhangyanzhao commented 3 years ago

Discussed with BRCM, and Genhwa will take a look.