sonic-net / sonic-mgmt

Configuration management examples for SONiC
Other
201 stars 728 forks source link

test_snmp_queue fails due to unavailable queue counters #8934

Closed kartik-arista closed 1 year ago

kartik-arista commented 1 year ago

Description

snmp/test_snmp_queue.py fails deterministically due to missing 'queues' information in SNMP facts.

This was observed on a VOQ chassis platform but likely affects any non Cisco device.

Steps to reproduce the issue:

Describe the results you received:

run snmp/test_snmp_queue.py on any VOQ chassis device.

Describe the results you expected:

The tests should pass

Additional information you deem important:

Failed: port Ethernet39/1 does not have queue counters
duthosts = [<MultiAsicSonicHost str2-7804-lc5-1>, <MultiAsicSonicHost str2-7804-lc3-1>, <MultiAsicSonicHost str2-7804-lc7-1>, <MultiAsicSonicHost str2-7804-sup-1>]
enum_rand_one_per_hwsku_hostname = 'str2-7804-lc7-1'
localhost = <tests.common.devices.local.Localhost object at 0x7f87614397d0>
creds_all_duts = {'str2-7804-lc3-1': {'ad_domain': 'GME', 'ad_integration_enabled': True, 'ansible_become_pass': "{{ secret_group_vars[...vars['str']['ansible_become_pass'] }}", 'ansible_ssh_pas\
s': "{{ secret_group_vars['str']['ansible_ssh_pass'] }}", ...}}
collect_techsupport_all_duts = None

    def test_snmp_queues(duthosts, enum_rand_one_per_hwsku_hostname, localhost, creds_all_duts, collect_techsupport_all_duts):
        duthost = duthosts[enum_rand_one_per_hwsku_hostname]
        hostip = duthost.host.options['inventory_manager'].get_host(duthost.hostname).vars['ansible_host']

        snmp_facts = get_snmp_facts(localhost, host=hostip, version="v2c", community=creds_all_duts[duthost.hostname]["snmp_rocommunity"], wait=True)['ansible_facts']

        for k, v in snmp_facts['snmp_interfaces'].items():
            if "Ethernet" in v['description']:
                if not v.has_key('queues'):
>                   pytest.fail("port %s does not have queue counters" % v['name'])
E                   Failed: port Ethernet39/1 does not have queue counters

Investigating the code, it seems like this field comes from a private Cisco MIB (ansible/library/snmp_facts.py)

     # From Cisco private MIB (PFC and queue counters)
        self.cpfcIfRequests         = dp + "1.3.6.1.4.1.9.9.813.1.1.1.1" # + .ifindex
        self.cpfcIfIndications      = dp + "1.3.6.1.4.1.9.9.813.1.1.1.2" # + .ifindex
        self.requestsPerPriority    = dp + "1.3.6.1.4.1.9.9.813.1.2.1.2" # + .ifindex.prio
        self.indicationsPerPriority = dp + "1.3.6.1.4.1.9.9.813.1.2.1.3" # + .ifindex.prio
        self.csqIfQosGroupStats     = dp + "1.3.6.1.4.1.9.9.580.1.5.5.1.4" # + .ifindex.IfDirection.QueueID

which is then used to populate SNMP facts

   for varBinds in varTable:
        for oid, val in varBinds:
            current_oid = oid.prettyPrint()
            current_val = val.prettyPrint()
            if v.csqIfQosGroupStats in current_oid:
                ifIndex = int(current_oid.split('.')[-4])
                ifDirection = int(current_oid.split('.')[-3])
                queueId = int(current_oid.split('.')[-2])
                counterId = int(current_oid.split('.')[-1])
                results['snmp_interfaces'][ifIndex]['queues'][ifDirection][queueId][counterId] = current_val

which is not going to be present

kartik-arista commented 1 year ago

This is currently noticed in 202205. I have not confirmed the behavior in master.

Please note https://github.com/sonic-net/sonic-mgmt/pull/6744

which in turn is tied to  https://github.com/sonic-net/sonic-swss/pull/2432

These changes don't seem to be cherry picked into 202205, so the most pertinent short term question is whether all these changes need to be present in 202205 or not. If the answer is no, then there we need to decide whether it is appropriate to skip these tests unless the platform specifically supports the relevant MIB.

kartik-arista commented 1 year ago

@vmittal-msft @arlakshm for viz.

yxieca commented 1 year ago

@kartik-arista, https://github.com/sonic-net/sonic-swss/pull/2432 is contributed directly into 202205 branch. Are you saying this PR is the culprit or the fix?

Transferring this issue to sonic-buildimage repo as this issue seems to be an image issue.

kartik-arista commented 1 year ago

@yxieca Yes - originallly it was not clear if this issue was test related or product function related, and if the change required was to skip the relevant tests (or cherrypick additional functionality into 202205).

Also adding @arlakshm for visibility since he was following up with @prsunny and others who may be aware of the background.

rlhui commented 1 year ago

Discussed in chassis subgroup that test needs to be enhanced for multi-asic

kenneth-arista commented 1 year ago

Discussed in chassis subgroup that test needs to be enhanced for multi-asic

Here's a code reference to show that the test does not account for the namespace on multi-ASIC platforms: https://github.com/sonic-net/sonic-mgmt/blob/63f7512936391fea76f04109dadaab40066f9aec/tests/snmp/test_snmp_queue.py#L19

SuvarnaMeenakshi commented 1 year ago

https://github.com/sonic-net/sonic-mgmt/pull/9115 should fix this issue