sonic-net / sonic-mgmt

Configuration management examples for SONiC

test_bgp_queue failed because of failing to check uc0 counter #11109

Closed ysmanman closed 2 weeks ago

ysmanman commented 9 months ago

Description

We noticed the following failure in recent 202205 testing on T2:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> traceback >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>                                                                                                                                                                                                                     

duthosts = [<MultiAsicSonicHost cmp227-4>, <MultiAsicSonicHost cmp227-5>, <MultiAsicSonicHost cmp227-6>, <MultiAsicSonicHost cmp227>]                                                                                                                                                                                                                                                      
enum_frontend_dut_hostname = 'cmp227-4', enum_asic_index = 0                                                                                                                                                                                                                                                                                                                               
tbinfo = {'comment': 'Tests Arista Arista-7804R3-FM', 'conf-name': 'ardut', 'duts': ['cmp227-4', 'cmp227-5', 'cmp227-6', 'cmp227'], 'duts_map': {'cmp227': 3, 'cmp227-4': 0, 'cmp227-5': 1, 'cmp227-6': 2}, ...}                                                                                                                                                                           

    def test_bgp_queues(duthosts, enum_frontend_dut_hostname, enum_asic_index, tbinfo):                                                                                                                                                                                                                                                                                                    
        duthost = duthosts[enum_frontend_dut_hostname]                                                                                                                                                                                                                                                                                                                                     
        asichost = duthost.asic_instance(enum_asic_index)                                                                                                                                                                                                                                                                                                                                  
        clear_queue_counters(asichost)                                                                                                                                                                                                                                                                                                                                                     
        time.sleep(10)                                                                                                                                                                                                                                                                                                                                                                     
        bgp_facts = duthost.bgp_facts(instance_id=enum_asic_index)['ansible_facts']                                                                                                                                                                                                                                                                                                        
        mg_facts = asichost.get_extended_minigraph_facts(tbinfo)                                                                                                                                                                                                                                                                                                                           

        arp_dict = {}                                                                                                                                                                                                                                                                                                                                                                      
        ndp_dict = {}                                                                                                                                                                                                                                                                                                                                                                      
        processed_intfs = set()                                                                                                                                                                                                                                                                                                                                                            
        show_arp = asichost.command('show arp')                                                                                                                                                                                                                                                                            
        show_ndp = asichost.command('show ndp')                                                                                                                                                                                                                                                                            
        for arp_entry in show_arp['stdout_lines']:                                                                                                                                                                                                                                                                         
            items = arp_entry.split()                                                                                                                                                                                                                                                                                      
            if (len(items) != 4):                                                                                                                                                                                                                                                                                          
                continue                                                                                                                                                                                                                                                                                                   
            ip = items[0]                                                                                                                                                                                                                                                                                                  
            iface = items[2]                                                                                                                                                                                                                                                                                               
            arp_dict[ip] = iface                                                                                                                                                                                                                                                                                           
        for ndp_entry in show_ndp['stdout_lines']:                                                                                                                                                                                                                                                                         
            items = ndp_entry.split()                                                                                                                                                                                                                                                                                      
            if (len(items) != 5):                                                                                                                                                                                                                                                                                          
                continue                                                                                                                                                                                                                                                                                                   
            ip = items[0]                                                                                                                                                                                                                                                                                                  
            iface = items[2]                                                                                                                                                                                                                                                                                               
            ndp_dict[ip] = iface         

        for k, v in list(bgp_facts['bgp_neighbors'].items()):                                      
            # Only consider established bgp sessions                                               
            if v['state'] == 'established':                                                        
                assert (k in arp_dict.keys() or k in ndp_dict.keys())                         
                if k in arp_dict:                                                             
                    ifname = arp_dict[k].split('.', 1)[0]                                     
                else:                                                                         
                    ifname = ndp_dict[k].split('.', 1)[0]                                     
                if ifname in processed_intfs:                                                 
                    continue                                                                  
                if (ifname.startswith("PortChannel")):                                        
                    for port in mg_facts['minigraph_portchannels'][ifname]['members']:                                                                                                       
                        logger.info("PortChannel '{}' : port {}".format(ifname, port))                                                                                                       
                        for q in range(0, 7):                                                 
>                           assert(get_queue_counters(asichost, port, q) == 0)                
E                           AssertionError                                                    

arp_dict   = {'-----------': '---------------', '10.0.0.1': 'PortChannel102', '10.0.0.101': 'Ethernet-IB0', '10.0.0.103': 'Ethernet-IB0', ...}                                               
arp_entry  = u'Total number of entries 77 '                                                   
asichost   = <SonicAsic 0>                                                                    
bgp_facts  = {'bgp_neighbors': {'10.0.0.1': {'accepted prefixes': 9, 'admin': u'up', 'capabilities': {'peer restart timer': 300}, '...ections dropped': 0, ...}, ...}, 'bgp_statistics': {'ipv4': 13, 'ipv4_admin_down': 0, 'ipv4_idle': 0, 'ipv6': 13, ...}}
duthost    = <MultiAsicSonicHost cmp227-4>                                                    
duthosts   = [<MultiAsicSonicHost cmp227-4>, <MultiAsicSonicHost cmp227-5>, <MultiAsicSonicHost cmp227-6>, <MultiAsicSonicHost cmp227>]                                                      
enum_asic_index = 0                                                                           
enum_frontend_dut_hostname = 'cmp227-4'                                                       
iface      = 'of'                                                                             
ifname     = 'PortChannel108'                                                                 
ip         = 'Total'                                                                          
items      = ['Total', 'number', 'of', 'entries', '95']                                       
k          = '10.0.0.13'                                                                      
mg_facts   = {'deployment_id': None, 'dhcp_servers': [], 'dhcpv6_servers': [], 'forced_mgmt_routes': [], ...}                                                                                
ndp_dict   = {'-------------------------': '---------------', 'Address': 'Iface', 'Total': 'of', 'fc00:3000::3': 'Ethernet-IB0', ...}                                                        
ndp_entry  = u'Total number of entries 95 '                                                   
port       = u'Ethernet56'                                                                    
processed_intfs = set(['PortChannel1010', 'PortChannel106'])                                  
q          = 0                                                                                
show_arp   = {'stderr_lines': [], u'cmd': [u'sudo', u'ip', u'netns', u'exec', u'asic0', u's...'become_method', and 'become_user' rather than running sudo"], 'failed': False}                                                                                                                                                                                                              
show_ndp   = {'stderr_lines': [], u'cmd': [u'sudo', u'ip', u'netns', u'exec', u'asic0', u's...'become_method', and 'become_user' rather than running sudo"], 'failed': False}                                                                                                                                                                                                              
tbinfo     = {'comment': 'Tests Arista Arista-7804R3-FM', 'conf-name': 'ardut', 'duts': ['cmp227-4', 'cmp227-5', 'cmp227-6', 'cmp227'], 'duts_map': {'cmp227': 3, 'cmp227-4': 0, 'cmp227-5': 1, 'cmp227-6': 2}, ...}
v          = {'accepted prefixes': 9, 'admin': u'up', 'capabilities': {'peer restart timer': 300}, 'connections dropped': 0, ...}                                                            

The failure occurred because the uc0 queue counter of Ethernet56, a member of PortChannel108, was incremented unexpectedly.
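Because the check is a bare `assert`, the report shows only `AssertionError`, and the port, queue, and counter value have to be dug out of the locals. A minimal sketch of a more descriptive check follows. `find_nonzero_queues` and the `read_counter` callable are hypothetical names (in the real test the read would go through `get_queue_counters(asichost, port, q)`), and the port list and counter values are invented for illustration only.

```python
def find_nonzero_queues(ports, read_counter, queues=range(0, 7)):
    """Return (port, queue, count) for every checked queue that saw traffic.

    read_counter is a hypothetical callable (port, queue) -> int standing in
    for the test's get_queue_counters(asichost, port, q).
    """
    hits = []
    for port in ports:
        for q in queues:
            count = read_counter(port, q)
            if count != 0:
                hits.append((port, q, count))
    return hits


# Illustrative data only: pretend four packets left Ethernet56 on queue 0.
fake_counters = {("Ethernet56", 0): 4}


def fake_read(port, q):
    """Stand-in counter source backed by the dictionary above."""
    return fake_counters.get((port, q), 0)


hits = find_nonzero_queues(["Ethernet56", "Ethernet60"], fake_read)
message = "unexpected packets on queues 0-6: {}".format(hits)
print(message)
# In a test this would become: assert not hits, message
```

Collecting every offending queue before asserting would also show whether the stray traffic is confined to one member port or spread across the PortChannel.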

ysmanman commented 9 months ago

Adding @arlakshm and @kenneth-arista for visibility.

ysmanman commented 9 months ago

The unexpected uc0 packets egressing from Ethernet56 were rsyslog packets:

10:20:32.333208 IP (tos 0x0, ttl 63, id 7669, offset 0, flags [DF], proto UDP (17), length 113)                                                                       
    10.1.0.3.57625 > 10.0.0.6.syslog: [udp sum ok] [|syslog]
        0x0000:  06a4 09c9 230e 948e d35e 8504 0800 4500                                                                                                              
        0x0010:  0071 1df5 4000 3f11 097e 0a01 0003 0a00
        0x0020:  0006 e119 0202 005d 3691 4465 6320 3230
        0x0030:  2031 383a 3230 3a33 322e 3333 3632 3439
        0x0040:  2063 6d70 3232 372d 3620 494e 464f 2073                                                                                                              
        0x0050:  7973 7465 6d64 5b31 5d3a 2073 7973 7374
        0x0060:  6174 2d63 6f6c 6c65 6374 2e73 6572 7669
        0x0070:  6365 3a20 5375 6363 6565 6465 642e 0a                                                                                                                
10:20:32.333990 IP (tos 0x0, ttl 63, id 7670, offset 0, flags [DF], proto UDP (17), length 119)                                                                       
    10.1.0.3.57625 > 10.0.0.6.syslog: [udp sum ok] [|syslog]                                                                                                          
        0x0000:  06a4 09c9 230e 948e d35e 8504 0800 4500
        0x0010:  0077 1df6 4000 3f11 0977 0a01 0003 0a00
        0x0020:  0006 e119 0202 0063 ab9f 4465 6320 3230
        0x0030:  2031 383a 3230 3a33 322e 3333 3731 3131
        0x0040:  2063 6d70 3232 372d 3620 494e 464f 2073
        0x0050:  7973 7465 6d64 5b31 5d3a 2046 696e 6973
        0x0060:  6865 6420 7379 7374 656d 2061 6374 6976
        0x0070:  6974 7920 6163 636f 756e 7469 6e67 2074                                                                                                              
        0x0080:  6f6f 6c2e 0a                                                                                                                                         
10:20:32.406469 IP (tos 0x0, ttl 63, id 7677, offset 0, flags [DF], proto UDP (17), length 107)                                                                       
    10.1.0.3.57625 > 10.0.0.6.syslog: [udp sum ok] [|syslog]                                                                                                          
        0x0000:  06a4 09c9 230e 948e d35e 8504 0800 4500                                                                                                              
        0x0010:  006b 1dfd 4000 3f11 097c 0a01 0003 0a00                                                                                                              
        0x0020:  0006 e119 0202 0057 2a0a 4465 6320 3230                                                                                                              
        0x0030:  2031 383a 3230 3a33 322e 3338 3035 3733
        0x0040:  2063 6d70 3232 372d 3620 494e 464f 2073                                                                                                              
        0x0050:  7973 7465 6d64 5b31 5d3a 206c 6f67 726f                                                                                                              
        0x0060:  7461 7465 2e73 6572 7669 6365 3a20 5375                                                                                                              
        0x0070:  6363 6565 6465 642e 0a
10:20:32.406518 IP (tos 0x0, ttl 63, id 7678, offset 0, flags [DF], proto UDP (17), length 104)                                                                       
    10.1.0.3.57625 > 10.0.0.6.syslog: [udp sum ok] [|syslog]                                                                                                          
        0x0000:  06a4 09c9 230e 948e d35e 8504 0800 4500                                                                                                              
        0x0010:  0068 1dfe 4000 3f11 097e 0a01 0003 0a00
        0x0020:  0006 e119 0202 0054 02bd 4465 6320 3230                                                                                                              
        0x0030:  2031 383a 3230 3a33 322e 3338 3133 3731                                                                                                              
        0x0040:  2063 6d70 3232 372d 3620 494e 464f 2073
        0x0050:  7973 7465 6d64 5b31 5d3a 2046 696e 6973                                                                                                              
        0x0060:  6865 6420 526f 7461 7465 206c 6f67 2066                                                                                                              
        0x0070:  696c 6573 2e0a                                       

An rsyslog server is configured by default in sonic-mgmt. These packets can fail the test if they are sent while the test is running and checking the tx counters.
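The hex rows of the first captured packet can be decoded to confirm what it carries. The Python 3 sketch below assumes an untagged Ethernet II frame carrying IPv4 and UDP, which matches the dump. `parse_hex_dump` and `decode_udp_frame` are illustrative helper names, not part of sonic-mgmt.

```python
import struct

# First packet from the capture, copied from the hex rows in this comment.
HEX_DUMP = """
0x0000:  06a4 09c9 230e 948e d35e 8504 0800 4500
0x0010:  0071 1df5 4000 3f11 097e 0a01 0003 0a00
0x0020:  0006 e119 0202 005d 3691 4465 6320 3230
0x0030:  2031 383a 3230 3a33 322e 3333 3632 3439
0x0040:  2063 6d70 3232 372d 3620 494e 464f 2073
0x0050:  7973 7465 6d64 5b31 5d3a 2073 7973 7374
0x0060:  6174 2d63 6f6c 6c65 6374 2e73 6572 7669
0x0070:  6365 3a20 5375 6363 6565 6465 642e 0a
"""

ETH_HEADER_LEN = 14  # assumption: no VLAN tag on the frame
UDP_HEADER_LEN = 8


def parse_hex_dump(text):
    """Turn 'offset:  hex words' rows into the raw frame bytes."""
    data = bytearray()
    for line in text.strip().splitlines():
        words = line.partition(":")[2]  # drop the leading offset column
        data += bytes.fromhex(words.replace(" ", ""))
    return bytes(data)


def decode_udp_frame(frame):
    """Return (src_port, dst_port, payload text) of an Ethernet/IPv4/UDP frame."""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != 0x0800:
        raise ValueError("not an IPv4 frame")
    # Low nibble of the first IP byte is the header length in 32-bit words.
    ip_header_len = (frame[ETH_HEADER_LEN] & 0x0F) * 4
    if frame[ETH_HEADER_LEN + 9] != 17:  # IP protocol field, 17 = UDP
        raise ValueError("not a UDP packet")
    udp_start = ETH_HEADER_LEN + ip_header_len
    src_port, dst_port, udp_len, _checksum = struct.unpack(
        "!HHHH", frame[udp_start:udp_start + UDP_HEADER_LEN])
    # The UDP length field covers the 8-byte header plus the payload.
    payload = frame[udp_start + UDP_HEADER_LEN:udp_start + udp_len]
    return src_port, dst_port, payload.decode("ascii")


src_port, dst_port, message = decode_udp_frame(parse_hex_dump(HEX_DUMP))
print(src_port, dst_port, repr(message))
```

For this packet the decode gives source port 57625, destination port 514 (the syslog port), and the message `Dec 20 18:20:32.336249 cmp227-6 INFO systemd[1]: sysstat-collect.service: Succeeded.`, which is ordinary systemd logging rather than BGP traffic.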

ysmanman commented 2 weeks ago

We no longer see the issue after removing the rsyslog server config.