sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

5232/PFC/201911/Orchagent crashes while testing PFC/PFCWD #5170

Open mini-nair-dell opened 4 years ago

mini-nair-dell commented 4 years ago

Description
===========

While testing PFCWD, orchagent crashes randomly. The exception is thrown in SAI when setting buffer pool attributes. The crash cannot be reproduced at will.

SAI: 3.7.5.1
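The syslog excerpt in this report shows two different outcomes for buffer updates. Sets on attributes flagged "create only" (`SAI_BUFFER_POOL_ATTR_TYPE`, `SAI_BUFFER_PROFILE_ATTR_THRESHOLD_MODE`) are rejected inside orchagent with status -5 (`SAI_STATUS_INVALID_PARAMETER`) and the task is dropped, while the set of `SAI_BUFFER_POOL_ATTR_XOFF_SIZE` reaches syncd, fails in the vendor SAI with `SAI_STATUS_ATTR_NOT_IMPLEMENTED_0`, and escalates to a `switch_shutdown_request`. The toy Python model below only restates that distinction; it is not the sairedis or syncd code, and `set_attr`, `CREATE_ONLY` and `VENDOR_UNIMPLEMENTED` are hypothetical names populated solely from the log lines in this report.

```python
from enum import Enum
from typing import Tuple


class Status(Enum):
    SUCCESS = "SAI_STATUS_SUCCESS"
    INVALID_PARAMETER = "SAI_STATUS_INVALID_PARAMETER"  # orchagent logs this as status:-5
    ATTR_NOT_IMPLEMENTED = "SAI_STATUS_ATTR_NOT_IMPLEMENTED_0"


# Attributes the syslog reports as "create only and cannot be modified".
CREATE_ONLY = {
    "SAI_BUFFER_POOL_ATTR_TYPE",
    "SAI_BUFFER_PROFILE_ATTR_THRESHOLD_MODE",
}

# Attribute the vendor SAI (3.7.5.1 here) rejected on set, per the syncd log lines.
VENDOR_UNIMPLEMENTED = {"SAI_BUFFER_POOL_ATTR_XOFF_SIZE"}


def set_attr(attr: str) -> Tuple[Status, str]:
    """Return (status, outcome) for one set() call in this simplified two-stage model."""
    if attr in CREATE_ONLY:
        # Stage 1: metadata validation inside the orchagent process rejects the set;
        # orchagent logs "Failed to process buffer task, drop it" and keeps running.
        return Status.INVALID_PARAMETER, "orchagent drops the task"
    if attr in VENDOR_UNIMPLEMENTED:
        # Stage 2: the set reaches syncd, the vendor call fails, and syncd turns
        # that into a runtime error plus a switch_shutdown_request to orchagent.
        return Status.ATTR_NOT_IMPLEMENTED, "syncd sends switch_shutdown_request"
    return Status.SUCCESS, "applied"


if __name__ == "__main__":
    for name in ("SAI_BUFFER_POOL_ATTR_TYPE", "SAI_BUFFER_POOL_ATTR_XOFF_SIZE"):
        status, outcome = set_attr(name)
        print(f"{name}: {status.value} -> {outcome}")
```

In this reading, only the second path is fatal; the create-only rejections are logged and dropped without stopping orchagent.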

Sequence of events seen in syslog that triggered the orchagent crash
==========================================================================

Aug 3 09:54:08.709174 sonic ERR swss#orchagent: :- meta_generic_validation_set: SAI_BUFFER_POOL_ATTR_TYPE:SAI_ATTR_VALUE_TYPE_INT32 attr is create only and cannot be modified
Aug 3 09:54:08.709174 sonic ERR swss#orchagent: :- processBufferPool: Failed to modify buffer pool, name:egress_pool, sai object:18000000000604, status:-5
Aug 3 09:54:08.709174 sonic ERR swss#orchagent: :- doTask: Failed to process buffer task, drop it
Aug 3 09:54:08.709947 sonic ERR syncd#syncd: [none] brcm_sai_set_buffer_pool_attr:816 Unknown attribute 5 passed
Aug 3 09:54:08.709947 sonic ERR syncd#syncd: :- processEvent: VID: oid:0x18000000000605 RID: oid:0x1800000001
Aug 3 09:54:08.710023 sonic ERR syncd#syncd: :- processEvent: attr: SAI_BUFFER_POOL_ATTR_XOFF_SIZE: 4194112
Aug 3 09:54:08.710144 sonic ERR syncd#syncd: :- processEvent: failed to execute api: set, key: SAI_OBJECT_TYPE_BUFFER_POOL:oid:0x18000000000605, status: SAI_STATUS_ATTR_NOT_IMPLEMENTED_0
Aug 3 09:54:08.710357 sonic ERR syncd#syncd: :- syncd_main: Runtime error: :- processEvent: failed to execute api: set, key: SAI_OBJECT_TYPE_BUFFER_POOL:oid:0x18000000000605, status: SAI_STATUS_ATTR_NOT_IMPLEMENTED_0
Aug 3 09:54:08.710357 sonic NOTICE syncd#syncd: :- notify_OA_about_syncd_exception: sending switch_shutdown_request notification to OA
Aug 3 09:54:08.710759 sonic NOTICE syncd#syncd: :- notify_OA_about_syncd_exception: notification send successfull
Aug 3 09:54:08.711027 sonic NOTICE swss#orchagent: :- handle_switch_shutdown_request: switch shutdown request
Aug 3 09:54:08.711457 sonic ERR swss#orchagent: :- processBufferProfile: Unknown buffer profile field specified:mode, ignoring
Aug 3 09:54:08.711457 sonic ERR swss#orchagent: :- meta_generic_validation_set: SAI_BUFFER_PROFILE_ATTR_THRESHOLD_MODE:SAI_ATTR_VALUE_TYPE_INT32 attr is create only and cannot be modified
Aug 3 09:54:08.711457 sonic ERR swss#orchagent: :- processBufferProfile: Failed to modify buffer profile, name:egress_lossless_profile, sai object:19000000000606, status:-5
Aug 3 09:54:08.711457 sonic ERR swss#orchagent: :- doTask: Failed to process buffer task, drop it
Aug 3 09:54:08.711457 sonic ERR swss#orchagent: :- processBufferProfile: Unknown buffer profile field specified:mode, ignoring
Aug 3 09:54:08.711457 sonic ERR swss#orchagent: :- meta_generic_validation_set: SAI_BUFFER_PROFILE_ATTR_THRESHOLD_MODE:SAI_ATTR_VALUE_TYPE_INT32 attr is create only and cannot be modified
Aug 3 09:54:08.711457 sonic ERR swss#orchagent: :- processBufferProfile: Failed to modify buffer profile, name:egress_lossy_profile, sai object:19000000000607, status:-5
Aug 3 09:54:08.711457 sonic ERR swss#orchagent: :- doTask: Failed to process buffer task, drop it
Aug 3 09:54:08.711457 sonic ERR swss#orchagent: :- meta_generic_validation_set: SAI_BUFFER_PROFILE_ATTR_THRESHOLD_MODE:SAI_ATTR_VALUE_TYPE_INT32 attr is create only and cannot be modified
Aug 3 09:54:08.711457 sonic ERR swss#orchagent: :- processBufferProfile: Failed to modify buffer profile, name:ingress_lossy_profile, sai object:19000000000608, status:-5
Aug 3 09:54:08.711457 sonic ERR swss#orchagent: :- doTask: Failed to process buffer task, drop it
Aug 3 09:54:08.712462 sonic INFO swss#supervisord: orchagent terminate called after throwing an instance of 'std::invalid_argument'
Aug 3 09:54:08.712462 sonic INFO swss#supervisord: orchagent what(): parse error - unexpected end of input
Aug 3 09:54:16.606994 sonic INFO swss#supervisord 2020-08-03 09:54:08,918 INFO exited: orchagent (terminated by SIGABRT (core dumped); not expected)
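Because the crash could not be reproduced at will, it may help to pull only the escalation steps out of each rotated syslog (for example the attached syslog.4.gz) and compare runs. The Python sketch below is a hypothetical helper, not a SONiC tool: the regular expression assumes the `<Mon> <day> <time> <host> <SEVERITY> <process>[:] <message>` layout of the lines quoted above, and the marker strings are simply phrases copied from those lines.

```python
import gzip
import re
import sys
from typing import Iterable, List, Tuple

# Assumed layout: "<Mon> <day> <time> <host> <SEVERITY> <process>[:] <message>".
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d+\s+[\d:.]+)\s+(?P<host>\S+)\s+"
    r"(?P<sev>[A-Z]+)\s+(?P<proc>[^:\s]+):?\s+(?P<msg>.*)$"
)

# Phrases copied from the log excerpt that mark each step of the escalation.
MARKERS = (
    "failed to execute api",     # syncd: vendor SAI call failed
    "switch_shutdown_request",   # syncd -> orchagent shutdown notification
    "terminate called",          # uncaught C++ exception in orchagent
    "SIGABRT",                   # supervisord reports the abort
)


def crash_chain(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, process, message) for every line containing a marker, in file order."""
    chain = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and any(marker in m.group("msg") for marker in MARKERS):
            chain.append((m.group("ts"), m.group("proc"), m.group("msg")))
    return chain


def open_log(path: str):
    # Rotated syslogs such as syslog.4.gz are gzip-compressed text.
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, errors="replace")


if __name__ == "__main__":
    with open_log(sys.argv[1]) as f:
        for ts, proc, msg in crash_chain(f):
            print(ts, proc, msg)
```

Run against each rotated file, this would show whether every crash is preceded by the same syncd set failure or whether different attributes are involved.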

SAI redis log:
===================

sairedis.rec.6.gz:2020-08-03.09:54:08.708969|s|SAI_OBJECT_TYPE_BUFFER_POOL:oid:0x18000000000605|SAI_BUFFER_POOL_ATTR_XOFF_SIZE=4194112
sairedis.rec.6.gz:2020-08-03.09:54:08.710710|n|switch_shutdown_request||
sairedis.rec.6.gz:2020-08-03.09:54:08.711586|s|SAI_OBJECT_TYPE_INGRESS_PRIORITY_GROUP:oid:0x1a0000000001a8|SAI_INGRESS_PRIORITY_GROUP_ATTR_BUFFER_PROFILE=oid:0x19000000000608
sairedis.rec.6.gz:2020-08-03.09:54:08.712046|s|SAI_OBJECT_TYPE_QUEUE:oid:0x15000000000575|SAI_QUEUE_ATTR_BUFFER_PROFILE_ID=oid:0x19000000000607
sairedis.rec.6.gz:2020-08-03.09:54:08.712112|s|SAI_OBJECT_TYPE_QUEUE:oid:0x15000000000576|SAI_QUEUE_ATTR_BUFFER_PROFILE_ID=oid:0x19000000000607
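The sairedis recording gives the orchagent-side view of the same moment: the last `s` (set) before the `n|switch_shutdown_request` notification is the `SAI_BUFFER_POOL_ATTR_XOFF_SIZE` update on `oid:0x18000000000605` that syncd reported as failed. A minimal Python sketch for locating that record in a longer recording, assuming the `timestamp|op|key|attr=value|...` layout of the lines above; the op-letter mapping beyond `s` and `n`, and all function names, are assumptions rather than part of sairedis.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional

# Op letters as they appear in sairedis.rec (assumed mapping; only "s" and "n" occur above).
OPS = {"c": "create", "r": "remove", "s": "set", "g": "get", "n": "notify"}

# The timestamp anchors the record; a "file.gz:" prefix from zgrep is skipped by search().
REC_RE = re.compile(r"(?P<ts>\d{4}-\d{2}-\d{2}\.\d{2}:\d{2}:\d{2}\.\d+)\|(?P<rest>.*)$")


@dataclass
class Record:
    ts: str
    op: str
    key: str
    attrs: Dict[str, str] = field(default_factory=dict)


def parse_rec(line: str) -> Optional[Record]:
    """Parse "<ts>|<op>|<key>|<attr>=<value>|..." into a Record, or None if it does not match."""
    m = REC_RE.search(line.strip())
    if not m:
        return None
    op, _, tail = m.group("rest").partition("|")
    key, _, attr_str = tail.partition("|")
    attrs = {}
    for item in attr_str.split("|"):
        if item:  # notifications such as "n|switch_shutdown_request||" leave empty fields
            name, _, value = item.partition("=")
            attrs[name] = value
    return Record(m.group("ts"), OPS.get(op, op), key, attrs)


def last_set_before_shutdown(records: Iterable[Record]) -> Optional[Record]:
    """Return the most recent set() recorded before the first switch_shutdown_request."""
    last = None
    for rec in records:
        if rec.op == "notify" and rec.key == "switch_shutdown_request":
            return last
        if rec.op == "set":
            last = rec
    return None
```

Note that the sets recorded after the notification (the priority-group and queue profile updates) are still issued by orchagent while it is shutting down, so only records before the notification are considered.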

Core trace:
==============

Reading symbols from /usr/bin/orchagent...(no debugging symbols found)...done.
[New LWP 63]
[New LWP 62]
[New LWP 43]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/orchagent -d /var/log/swss -b 8192 -m 3c:2c:30:68:3d:00'.
Program terminated with signal SIGABRT, Aborted.

#0  0x00007f59cc45afff in raise () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7f59cba0d700 (LWP 63))]
(gdb) bt
#0  0x00007f59cc45afff in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f59cc45c42a in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f59ccd730ad in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007f59ccd71066 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007f59ccd710b1 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007f59ccd9be9e in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007f59cdf334a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#7  0x00007f59cc510d0f in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb)

root@sonic:~# show ver

SONiC Software Version: SONiC.HEAD.110-69584419
Distribution: Debian 9.12
Kernel: 4.9.0-11-2-amd64
Build commit: 69584419
Build date: Mon Jul 6 03:04:55 UTC 2020
Built by: johnar@jenkins-worker-7

Platform: x86_64-dellemc_s5232f_c3538-r0
HwSKU: DellEMC-S5232f-C32
ASIC: broadcom
Serial Number: CN01WJVTCES0094Q0020
Uptime: 12:24:35 up 1 day, 13:18, 1 user, load average: 0.29, 0.24, 0.29

The syslogs are attached.

Thanks,
Mini

mini-nair-dell commented 4 years ago

syslog.4.gz

xinliu-seattle commented 4 years ago

Please add the CLI commands used for this test and the procedure it went through.

mini-nair-dell commented 4 years ago

We have seen the issue while testing PFCWD as part of RDMA testing. The test plan was provided by Microsoft.

A snippet of the test plan is included below:

Test Case 6: PFC WD on Server-Facing Ports

Goal: To verify whether PFC WD functions as expected on server-facing ports.

Topology: Figure 2, 1 sender and 1 receiver

Test Steps:

  1. Start bi-directional lossless traffic on priority 3 between Server 1 and Server 2
  2. Start bi-directional lossy traffic on remaining traffic classes
  3. Start pause storm from Server 1 port 1 on priority 3, and observe for a few minutes
  4. Stop pause storm from Server 1 port 1 on priority 3, and observe for a few minutes
  5. Repeat steps 3–4 multiple times and observe
  6. Start pause storm from Server 2 port 2 on priority 3, and observe for a few minutes
  7. Stop pause storm from Server 2 port 2 on priority 3, and observe for a few minutes
  8. Repeat steps 6–7 multiple times and observe
  9. Repeat steps 5 and 8
  10. Repeat steps 1 – 9 for priority 4 traffic and pause storm
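Because the crash is intermittent, recording exactly which storm start or stop preceded it would make it easier to line the test up against the syslog timestamps. One possible reading of the steps above as an ordered action list, in Python. The test plan only says "multiple times", so `repeats` is a hypothetical parameter; step 9 ("Repeat steps 5 and 8") is interpreted as one further pass of both storm loops; and no traffic-generator API is assumed, the function only produces the sequence.

```python
from typing import List, Sequence

# The two pause-storm sources named in Test Case 6.
STORM_SOURCES = ("Server 1 port 1", "Server 2 port 2")


def pfcwd_schedule(priorities: Sequence[int] = (3, 4), repeats: int = 3) -> List[str]:
    """Expand Test Case 6 into an ordered action list (repeats is a hypothetical choice)."""
    actions: List[str] = []
    for prio in priorities:  # step 10: the whole sequence is redone for priority 4
        actions.append(f"start lossless traffic on priority {prio}")         # step 1
        actions.append("start lossy traffic on remaining traffic classes")   # step 2
        # Steps 3-8: storm start/stop loop per source; the outer pass of 2
        # reflects step 9 re-running both loops once more.
        for _ in range(2):
            for src in STORM_SOURCES:
                for _ in range(repeats):
                    actions.append(f"start pause storm from {src} on priority {prio}")
                    actions.append(f"stop pause storm from {src} on priority {prio}")
    return actions
```

Logging each action with a timestamp as it is executed would let the action immediately before an orchagent crash be matched against the syslog and sairedis recording.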

The orchagent crashes randomly; we couldn't reproduce it at will.

The complete test plan is also attached.

Thanks,
Mini

mini-nair-dell commented 4 years ago

MSFT RDMA Qualification Test Cases v1.4.docx