sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[warm-reboot] Orchagent crashes during startup when warm-reboot is executed after pfcwd detects a storm #2888

Open leoli-nps opened 5 years ago

leoli-nps commented 5 years ago

**Description**

  1. Topology:
    (SW1)Ethernet120 ---- Ixia
  2. Enable PFC on queue 0 of Ethernet120:
    "PORT_QOS_MAP": {
        "Ethernet120": {
            "pfc_to_queue_map": "[MAP_PFC_PRIORITY_TO_QUEUE|AZURE]",
            "pfc_enable": "0"
        }
    }
  3. Configure pfcwd on Ethernet120:
    "PFC_WD_TABLE": {
        "Ethernet120": {
            "action": "drop",
            "detection_time": "500",
            "restoration_time": "5000"
        }
    }
  4. Send PFC frames from Ixia to Ethernet120 to trigger a PFC storm, then perform a warm-reboot on SW1. After the reboot, orchagent crashes, and the log shows the following messages:
    
    May 10 11:01:20.856475 sonic NOTICE swss#orchagent: :- syncd_apply_view: Notify syncd APPLY_VIEW
    May 10 11:01:20.856490 sonic NOTICE swss#orchagent: :- sai_redis_notify_syncd: sending syncd APPLY view
    May 10 11:01:20.857630 sonic NOTICE swss#orchagent: :- sai_redis_internal_notify_syncd: wait for notify response
    May 10 11:01:20.859381 sonic WARNING syncd#syncd: :- notifySyncd: syncd received APPLY VIEW, will translate
    May 10 11:01:20.884649 sonic NOTICE syncd#syncd: :- dump: getting took 0.018019 sec
    May 10 11:01:20.885501 sonic ERR syncd#syncd: :- sai_deserialize_attr_id: invalid attr id: SAI_INGRESS_PRIORITY_GROUP_STAT_DROPPED_PACKETS
    May 10 11:01:20.886200 sonic NOTICE syncd#syncd: :- redisGetAsicView: get asic view from ASIC_STATE took 0.020048 sec
    May 10 11:01:20.886316 sonic ERR syncd#syncd: :- syncdApplyView: Exception: :- sai_deserialize_attr_id: invalid attr id: SAI_INGRESS_PRIORITY_GROUP_STAT_DROPPED_PACKETS
    May 10 11:01:20.886843 sonic NOTICE syncd#syncd: :- syncdApplyView: apply took 0.027272 sec
    May 10 11:01:20.886843 sonic NOTICE syncd#syncd: :- sendNotifyResponse: sending response: SAI_STATUS_FAILURE
    May 10 11:01:20.887118 sonic NOTICE swss#orchagent: :- sai_redis_internal_notify_syncd: notify response: SAI_STATUS_FAILURE
    May 10 11:01:20.887118 sonic ERR swss#orchagent: :- sai_redis_notify_syncd: notify syncd failed: SAI_STATUS_FAILURE
    May 10 11:01:20.887129 sonic ERR swss#orchagent: :- syncd_apply_view: Failed to notify syncd APPLY_VIEW -1
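When triaging similar warm-reboot failures, the attr IDs that syncd rejected can be pulled out of the syslog mechanically. The following is a minimal, illustrative sketch: the helper names are hypothetical (not part of SONiC), and it only matches the two message formats visible in the log above. The same ID is logged twice (once by `sai_deserialize_attr_id` and again in the `syncdApplyView` exception), so the helper de-duplicates.

```python
import re
from typing import Iterable, List

# Matches the syncd error format shown in the log above, e.g.
# "... sai_deserialize_attr_id: invalid attr id: SAI_INGRESS_PRIORITY_GROUP_STAT_DROPPED_PACKETS"
INVALID_ATTR_RE = re.compile(r"sai_deserialize_attr_id: invalid attr id: (\S+)")

# Substring orchagent logs when syncd rejects APPLY_VIEW (copied from the log above).
APPLY_VIEW_FAILED = "Failed to notify syncd APPLY_VIEW"


def find_invalid_attr_ids(lines: Iterable[str]) -> List[str]:
    """Return each attr ID syncd failed to deserialize, once, in order of first appearance."""
    ids: List[str] = []
    for line in lines:
        match = INVALID_ATTR_RE.search(line)
        # The same ID is logged twice (error line + syncdApplyView exception), so de-duplicate.
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids


def apply_view_failed(lines: Iterable[str]) -> bool:
    """True if orchagent reported that the APPLY_VIEW notification to syncd failed."""
    return any(APPLY_VIEW_FAILED in line for line in lines)
```

For example, read the switch's syslog into a list (the path `/var/log/syslog` is an assumption) and pass the same list to both helpers. On the log lines above, `find_invalid_attr_ids` yields only `SAI_INGRESS_PRIORITY_GROUP_STAT_DROPPED_PACKETS` and `apply_view_failed` returns `True`.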
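The following was also found in COUNTERS_DB (DB 2) and ASIC_DB (DB 1) for priority group 0 of Ethernet120:

    admin@sonic:~$ redis-cli -n 2 hget "COUNTERS_PG_NAME_MAP" "Ethernet120:0"
    "oid:0x1a000000000552"
    admin@sonic:~$ redis-cli -n 1 hgetall "ASIC_STATE:SAI_OBJECT_TYPE_INGRESS_PRIORITY_GROUP:oid:0x1a000000000552"
    1) "NULL"
    2) "NULL"
    3) "SAI_INGRESS_PRIORITY_GROUP_ATTR_BUFFER_PROFILE"
    4) "oid:0x19000000000665"
    5) "SAI_INGRESS_PRIORITY_GROUP_STAT_PACKETS"
    6) "0"
    7) "SAI_INGRESS_PRIORITY_GROUP_STAT_DROPPED_PACKETS"
    8) "0"
    admin@sonic:~$

The ASIC_STATE entry for this ingress priority group holds two `SAI_INGRESS_PRIORITY_GROUP_STAT_*` counter IDs next to its `SAI_INGRESS_PRIORITY_GROUP_ATTR_BUFFER_PROFILE` attribute; `SAI_INGRESS_PRIORITY_GROUP_STAT_DROPPED_PACKETS` is the ID that syncd rejected as an invalid attr id in the log above.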
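The ASIC_DB dump can be checked mechanically for fields that are not attribute IDs. This is a minimal sketch under stated assumptions, not syncd's own validation: the helper names are hypothetical, the expected `SAI_<TYPE>_ATTR_` prefix is derived from the SAI naming convention rather than from SAI metadata, and the `"NULL"` field is treated as a placeholder rather than a bad entry.

```python
from typing import Dict, List

OBJECT_TYPE_PREFIX = "SAI_OBJECT_TYPE_"


def pairs_from_hgetall(flat: List[str]) -> Dict[str, str]:
    """Turn a flat HGETALL reply (field, value, field, value, ...) into a dict, keeping order."""
    if len(flat) % 2:
        raise ValueError("HGETALL reply must contain an even number of items")
    return dict(zip(flat[0::2], flat[1::2]))


def attr_prefix_from_key(key: str) -> str:
    """Map 'ASIC_STATE:SAI_OBJECT_TYPE_X:oid:...' to 'SAI_X_ATTR_'.

    Heuristic: relies on the SAI naming convention, not on SAI metadata.
    """
    object_type = key.split(":")[1]
    if not object_type.startswith(OBJECT_TYPE_PREFIX):
        raise ValueError(f"unexpected ASIC_STATE key: {key}")
    return "SAI_" + object_type[len(OBJECT_TYPE_PREFIX):] + "_ATTR_"


def non_attr_fields(key: str, flat: List[str]) -> List[str]:
    """List hash fields that do not look like attribute IDs of the key's object type."""
    prefix = attr_prefix_from_key(key)
    fields = pairs_from_hgetall(flat)
    # "NULL" is treated as a placeholder field (assumption), not as a bad entry.
    return [name for name in fields if name != "NULL" and not name.startswith(prefix)]
```

Given the ASIC_STATE key and the eight reply items from the `hgetall` output, `non_attr_fields` returns `SAI_INGRESS_PRIORITY_GROUP_STAT_PACKETS` and `SAI_INGRESS_PRIORITY_GROUP_STAT_DROPPED_PACKETS`, i.e. the counter IDs that ended up in the object's attribute hash.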


**Steps to reproduce the issue:**
1. As described in the **Description**

**Describe the results you received:**
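Orchagent crashes during startup after warm-reboot: syncd fails to apply the view and responds with `SAI_STATUS_FAILURE` (see the log in the **Description**).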

**Describe the results you expected:**
Warm-reboot completes normally and orchagent starts without errors.

**Additional information you deem important (e.g. issue happens only occasionally):**

**Output of `show version`:**

    admin@sonic:~$ show version

    SONiC Software Version: SONiC.origin_201811.0-dirty-20190418.223441
    Distribution: Debian 9.8
    Kernel: 4.9.0-8-amd64
    Build commit: 051bb23
    Build date: Fri Apr 19 06:33:08 UTC 2019
    Built by: simon@nps65

    Docker images:
    REPOSITORY                TAG                                    IMAGE ID      SIZE
    docker-syncd-nephos       latest                                 1c3500846360  326MB
    docker-syncd-nephos       origin_201811.0-dirty-20190418.223441  1c3500846360  326MB
    docker-orchagent-nephos   latest                                 f9c367fb5fc5  368MB
    docker-orchagent-nephos   origin_201811.0-dirty-20190418.223441  f9c367fb5fc5  368MB
    docker-teamd              latest                                 8a6898e1dfa7  353MB
    docker-teamd              origin_201811.0-dirty-20190418.223441  8a6898e1dfa7  353MB
    docker-fpm-quagga         latest                                 de4a2a321623  372MB
    docker-fpm-quagga         origin_201811.0-dirty-20190418.223441  de4a2a321623  372MB
    docker-lldp-sv2           latest                                 7c53844507f0  294MB
    docker-lldp-sv2           origin_201811.0-dirty-20190418.223441  7c53844507f0  294MB
    docker-dhcp-relay         latest                                 903f08df67cf  258MB
    docker-dhcp-relay         origin_201811.0-dirty-20190418.223441  903f08df67cf  258MB
    docker-database           latest                                 2b048aa0fe97  255MB
    docker-database           origin_201811.0-dirty-20190418.223441  2b048aa0fe97  255MB
    docker-snmp-sv2           latest                                 b42a83fc56f8  330MB
    docker-snmp-sv2           origin_201811.0-dirty-20190418.223441  b42a83fc56f8  330MB
    docker-router-advertiser  latest                                 b6b8150e559a  254MB
    docker-router-advertiser  origin_201811.0-dirty-20190418.223441  b6b8150e559a  254MB
    docker-platform-monitor   latest                                 f8442c4d55a8  297MB
    docker-platform-monitor   origin_201811.0-dirty-20190418.223441  f8442c4d55a8  297MB

    admin@sonic:~$



**Attach debug file `sudo generate_dump`:**
[sonic_dump_sonic_20190510_110308.tar.gz](https://github.com/Azure/sonic-buildimage/files/3171742/sonic_dump_sonic_20190510_110308.tar.gz)

Signed-off-by: leo.li <leo.li@nephosinc.com>

yxieca commented 5 years ago

@wendani can you take a look and see if this issue has been addressed?