sonic-net / sonic-sairedis

SAI object interface to Redis database, as used in the SONiC project

Duplicate zero buffer profiles in 'temp' view after Warmboot while storm is ongoing. #1030

Open svsivm opened 2 years ago

svsivm commented 2 years ago

Description: The original issue (https://github.com/Azure/sonic-sairedis/issues/899) addressed the problem in the comparison logic. However, the root cause appears to be in the OA (orchagent) 'temp' view construction logic, which is why this issue is being raised.

Please refer to the original issue where @kcudnik has already done significant analysis.

Two extra zero buffer profiles are created in the 'temp' ASIC view if warmboot is executed while some queues have zero buffer profiles attached to them. Two of the temp-view zero buffer profiles have VIDs that match the zero buffer profiles in the 'current' ASIC view, but their attribute lists in the temp view are empty. As a result, the comparison logic during warmboot reconciliation ends up 'creating' 2 new zero buffer profiles, even though equivalent profiles already exist on the ASIC. We ran into this issue while running the PFC WD warmboot pytest, specifically the second sub-test (https://github.com/Azure/sonic-mgmt/blob/master/tests/pfcwd/test_pfcwd_warm_reboot.py#L25).
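To make the failure mode concrete, here is a deliberately simplified model of reconciliation. It is not syncd's comparison algorithm: the two-pass rule (pair by VID first, then by identical attributes) and the function name `plan_creates` are assumptions for illustration, the attribute lists are trimmed, and the pool OIDs are replaced by the hypothetical labels `pool_q` and `pool_pg`. It shows how temp-view objects that reuse the current VIDs with empty attribute lists leave the fully populated profiles with no unmatched counterpart, so they end up as creates.

```python
from typing import Dict, List

Attrs = Dict[str, str]


def plan_creates(current: Dict[str, Attrs], temp: Dict[str, Attrs]) -> List[str]:
    """Return temp-view VIDs that this toy model would create on the ASIC.

    Simplified two-pass model (an assumption, not syncd's real logic).
    """
    matched = set()
    pending = []
    # Pass 1: a temp object whose VID exists in the current view is paired
    # with that object by VID, even if its attribute list is empty.
    for vid in temp:
        if vid in current:
            matched.add(vid)
        else:
            pending.append(vid)
    creates = []
    # Pass 2: a remaining temp object may only pair with a current object
    # that is still unmatched and has identical attributes.
    for vid in pending:
        candidate = next(
            (cvid for cvid, cattrs in current.items()
             if cvid not in matched and cattrs == temp[vid]),
            None,
        )
        if candidate is None:
            creates.append(vid)  # no counterpart left: schedule a create
        else:
            matched.add(candidate)
    return creates


# Zero profiles from this issue; pool OIDs replaced by hypothetical labels.
ZERO_Q = {"SAI_BUFFER_PROFILE_ATTR_POOL_ID": "pool_q",
          "SAI_BUFFER_PROFILE_ATTR_RESERVED_BUFFER_SIZE": "0"}
ZERO_PG = {"SAI_BUFFER_PROFILE_ATTR_POOL_ID": "pool_pg",
           "SAI_BUFFER_PROFILE_ATTR_RESERVED_BUFFER_SIZE": "0"}

current = {"oid:0x19000000000510": ZERO_Q, "oid:0x19000000000512": ZERO_PG}
temp = {"oid:0x19000000000510": {}, "oid:0x19000000000512": {},
        "oid:0x190000000005e6": ZERO_Q, "oid:0x190000000005e8": ZERO_PG}

print(plan_creates(current, temp))
# ['oid:0x190000000005e6', 'oid:0x190000000005e8']
```

In this model, dropping the two empty placeholder entries from `temp` lets both populated profiles pair with the existing ones by attributes, and `plan_creates` returns an empty list.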

Could you let us know why the 2 zero buffer profiles are created again after warmboot? Is this by design? These duplicate creates subsequently cause problems in the testcase's storm restoration path.

Steps to reproduce: Execute the second scenario of the PFC watchdog warmboot test on a platform that uses the 'zero buffer profile' model to handle PFC storms.

To reproduce manually, perform the following steps:
(a) Enable PFC WD on all target ports/queues.
(b) Send a PFC storm to the target port/queue and verify that the PFC storm is detected and the mitigation action is executed.
(c) While the PFC storm is still being sent, perform a warmboot.
(d) Compare the temp view and the current ASIC view for BUFFER_PROFILE keys; the temp view contains 2 extra buffer profiles.
(e) The NOS 'creates' the 2 zero buffer profiles again.
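Step (d) can be scripted. The sketch below takes the key names that `keys` prints in redis-cli (collecting them from Redis, for example with redis-py's `scan_iter`, is left out so the example stays self-contained) and groups the VIDs per view prefix. The function name `vids_by_view` is hypothetical, and the key list is a subset of the AFTER WARMBOOT listing in this issue.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def vids_by_view(keys: Iterable[str],
                 object_type: str = "SAI_OBJECT_TYPE_BUFFER_PROFILE") -> Dict[str, Set[str]]:
    """Group object VIDs by view prefix (e.g. ASIC_STATE, TEMP_ASIC_STATE)."""
    views: Dict[str, Set[str]] = defaultdict(set)
    marker = object_type + ":"
    for key in keys:
        view, sep, rest = key.partition(":")
        if sep and rest.startswith(marker):
            views[view].add(rest[len(marker):])  # keep the "oid:0x..." part
    return views


keys = [
    "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000510",
    "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000512",
    "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000510",
    "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000512",
    "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e6",
    "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e8",
]
views = vids_by_view(keys)
counts = {view: len(vids) for view, vids in views.items()}
shared = sorted(views["ASIC_STATE"] & views["TEMP_ASIC_STATE"])
print(counts)  # {'ASIC_STATE': 2, 'TEMP_ASIC_STATE': 4}
print(shared)  # ['oid:0x19000000000510', 'oid:0x19000000000512']
```

Applied to the full AFTER WARMBOOT listing, this grouping gives 8 ASIC_STATE and 10 TEMP_ASIC_STATE keys, which agrees with the object counts in the syslog warning.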

Mar 30 02:47:42.655739 sonic-wistron3-dut WARNING syncd#syncd: :- logViewObjectCount: object count for SAI_OBJECT_TYPE_BUFFER_PROFILE on current view 8 is different than on temporary view: 10

Mar 30 02:47:42.766455 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: create: SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e6 <<<<<<< DUPLICATE ZERO BUFFER PROFILE
Mar 30 02:47:42.766455 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_BUFFER_PROFILE_ATTR_RESERVED_BUFFER_SIZE 0
Mar 30 02:47:42.766455 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_BUFFER_PROFILE_ATTR_POOL_ID oid:0x1800000000050f
Mar 30 02:47:42.766571 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_BUFFER_PROFILE_ATTR_THRESHOLD_MODE SAI_BUFFER_PROFILE_THRESHOLD_MODE_DYNAMIC
Mar 30 02:47:42.766571 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH -8
Mar 30 02:47:42.766571 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: set: SAI_OBJECT_TYPE_QUEUE:oid:0x150000000001d5
Mar 30 02:47:42.766571 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_QUEUE_ATTR_BUFFER_PROFILE_ID oid:0x190000000005e6 (current: oid:0x19000000000510)
Mar 30 02:47:42.766571 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: set: SAI_OBJECT_TYPE_QUEUE:oid:0x150000000001e5
Mar 30 02:47:42.766571 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_QUEUE_ATTR_BUFFER_PROFILE_ID oid:0x190000000005e6 (current: oid:0x19000000000510)
Mar 30 02:47:42.766623 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: create: SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e8 <<<<<<< DUPLICATE ZERO BUFFER PROFILE
Mar 30 02:47:42.766623 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_BUFFER_PROFILE_ATTR_RESERVED_BUFFER_SIZE 0
Mar 30 02:47:42.766623 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_BUFFER_PROFILE_ATTR_POOL_ID oid:0x18000000000511
Mar 30 02:47:42.766623 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_BUFFER_PROFILE_ATTR_THRESHOLD_MODE SAI_BUFFER_PROFILE_THRESHOLD_MODE_DYNAMIC
Mar 30 02:47:42.766661 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH -8
Mar 30 02:47:42.766661 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: set: SAI_OBJECT_TYPE_INGRESS_PRIORITY_GROUP:oid:0x1a0000000000ae
Mar 30 02:47:42.766661 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_INGRESS_PRIORITY_GROUP_ATTR_BUFFER_PROFILE oid:0x190000000005e8 (current: oid:0x19000000000512)
Mar 30 02:47:42.766661 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: set: SAI_OBJECT_TYPE_INGRESS_PRIORITY_GROUP:oid:0x1a0000000000be
Mar 30 02:47:42.766694 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_INGRESS_PRIORITY_GROUP_ATTR_BUFFER_PROFILE oid:0x190000000005e8 (current: oid:0x19000000000512)
Mar 30 02:47:42.767998 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: create: SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e6
Mar 30 02:47:42.767998 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: set: SAI_OBJECT_TYPE_QUEUE:oid:0x150000000001d5
Mar 30 02:47:42.768036 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: set: SAI_OBJECT_TYPE_QUEUE:oid:0x150000000001e5
Mar 30 02:47:42.768036 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: create: SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e8
Mar 30 02:47:42.768036 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: set: SAI_OBJECT_TYPE_INGRESS_PRIORITY_GROUP:oid:0x1a0000000000ae
Mar 30 02:47:42.768036 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: set: SAI_OBJECT_TYPE_INGRESS_PRIORITY_GROUP:oid:0x1a0000000000be
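The duplicate creates can also be pulled out of a syslog mechanically. The regular expressions below are written only against the `executeOperationsOnAsic` lines quoted in this issue; they are an assumption about the log format, not a documented interface, and the helper names are hypothetical. The second helper maps each newly set profile to the profile it replaces, which here pairs 0x...5e6 with 0x...510 and 0x...5e8 with 0x...512.

```python
import re
from typing import Dict, Iterable, List, Set

CREATE_RE = re.compile(
    r"executeOperationsOnAsic: create: SAI_OBJECT_TYPE_BUFFER_PROFILE:(oid:0x[0-9a-f]+)")
# Matches "SAI_QUEUE_ATTR_BUFFER_PROFILE_ID oid:X (current: oid:Y)" and
# "SAI_INGRESS_PRIORITY_GROUP_ATTR_BUFFER_PROFILE oid:X (current: oid:Y)".
REPLACE_RE = re.compile(
    r"BUFFER_PROFILE(?:_ID)? (oid:0x[0-9a-f]+) \(current: (oid:0x[0-9a-f]+)\)")


def buffer_profile_creates(lines: Iterable[str]) -> List[str]:
    """Return created BUFFER_PROFILE OIDs in first-seen order."""
    created: List[str] = []
    for line in lines:
        m = CREATE_RE.search(line)
        if m and m.group(1) not in created:
            created.append(m.group(1))
    return created


def replaced_profiles(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each newly set profile OID to the profile OID(s) it replaces."""
    mapping: Dict[str, Set[str]] = {}
    for line in lines:
        m = REPLACE_RE.search(line)
        if m:
            mapping.setdefault(m.group(1), set()).add(m.group(2))
    return mapping


log = [
    "syncd: :- executeOperationsOnAsic: create: SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e6",
    "syncd: :- executeOperationsOnAsic: - SAI_QUEUE_ATTR_BUFFER_PROFILE_ID oid:0x190000000005e6 (current: oid:0x19000000000510)",
    "syncd: :- executeOperationsOnAsic: create: SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e8",
    "syncd: :- executeOperationsOnAsic: - SAI_INGRESS_PRIORITY_GROUP_ATTR_BUFFER_PROFILE oid:0x190000000005e8 (current: oid:0x19000000000512)",
    "syncd: :- executeOperationsOnAsic: create: SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e6",
]
print(buffer_profile_creates(log))  # ['oid:0x190000000005e6', 'oid:0x190000000005e8']
print(replaced_profiles(log))
```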

BEFORE WARMBOOT:
root@sonic--dut:~# redis-cli -n 1
127.0.0.1:6379[1]> keys BUFFER_PROFILE
1) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003c2"
2) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003c1"
3) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000510" >>>>>>>> ZERO BUFFER PROFILE
4) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003bf"
5) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003be"
6) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003bd"
7) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003c0"
8) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000512" >>>>>>>>> ZERO BUFFER PROFILE
127.0.0.1:6379[1]> hgetall ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000510
1) "SAI_BUFFER_PROFILE_ATTR_POOL_ID"
2) "oid:0x1800000000050f"
3) "SAI_BUFFER_PROFILE_ATTR_THRESHOLD_MODE"
4) "SAI_BUFFER_PROFILE_THRESHOLD_MODE_DYNAMIC"
5) "SAI_BUFFER_PROFILE_ATTR_RESERVED_BUFFER_SIZE"
6) "0"
7) "SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH"
8) "-8"
127.0.0.1:6379[1]> hgetall ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x1900000000051
(empty array)
127.0.0.1:6379[1]> hgetall ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000512
1) "SAI_BUFFER_PROFILE_ATTR_POOL_ID"
2) "oid:0x18000000000511"
3) "SAI_BUFFER_PROFILE_ATTR_THRESHOLD_MODE"
4) "SAI_BUFFER_PROFILE_THRESHOLD_MODE_DYNAMIC"
5) "SAI_BUFFER_PROFILE_ATTR_RESERVED_BUFFER_SIZE"
6) "0"
7) "SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH"
8) "-8"

AFTER WARMBOOT:
root@sonic--dut:~# redis-cli -n 1
127.0.0.1:6379[1]> keys BUFFER_PROFILE
1) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003c1"
2) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003c0"
3) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003bd"
4) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003c2"
5) "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x1900000000053d"
6) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003bf"
7) "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x1900000000053f"
8) "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x1900000000053c"
9) "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000510" >>>>>> Matching VID with current view, but empty attr list.
10) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003be"
11) "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e8" >>>>>> Extra ‘zero buffer profiles’ with appropriate attribute values.
12) "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x1900000000053e"
13) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000510"
14) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000512"
15) "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e6" >>>>>> Extra ‘zero buffer profiles’ with appropriate attribute values.
16) "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x1900000000053b"
17) "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000512" >>>>>> Matching VID with current view, but empty attr list.
18) "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x1900000000053a"
127.0.0.1:6379[1]> hgetall TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000510
1) "NULL"
2) "NULL"
127.0.0.1:6379[1]> hgetall TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000512
1) "NULL"
2) "NULL"
127.0.0.1:6379[1]> hgetall TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e8
1) "SAI_BUFFER_PROFILE_ATTR_POOL_ID"
2) "oid:0x180000000005e7"
3) "SAI_BUFFER_PROFILE_ATTR_THRESHOLD_MODE"
4) "SAI_BUFFER_PROFILE_THRESHOLD_MODE_DYNAMIC"
5) "SAI_BUFFER_PROFILE_ATTR_RESERVED_BUFFER_SIZE"
6) "0"
7) "SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH"
8) "-8"
127.0.0.1:6379[1]> hgetall TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e6
1) "SAI_BUFFER_PROFILE_ATTR_POOL_ID"
2) "oid:0x180000000005e5"
3) "SAI_BUFFER_PROFILE_ATTR_THRESHOLD_MODE"
4) "SAI_BUFFER_PROFILE_THRESHOLD_MODE_DYNAMIC"
5) "SAI_BUFFER_PROFILE_ATTR_RESERVED_BUFFER_SIZE"
6) "0"
7) "SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH"
8) "-8"

pfcwd_warmboot.zip
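The `NULL`/`NULL` pair on the two matching VIDs appears to act as a placeholder for an object with no attributes (a Redis hash cannot exist without at least one field). The sketch below, whose helper names are hypothetical and not part of sairedis, converts a flat HGETALL reply into an attribute dict, treating `NULL` as that placeholder (an assumption based on the dump), and lists the VIDs that exist in both views but are empty in the temp view. The attribute lists are trimmed for brevity.

```python
from typing import Dict, List, Sequence

Attrs = Dict[str, str]


def reply_to_attrs(reply: Sequence[str]) -> Attrs:
    """Turn a flat HGETALL reply [f1, v1, f2, v2, ...] into a dict.

    The "NULL" field is dropped, treating it as an empty-object placeholder
    (an assumption based on the dump in this issue).
    """
    if len(reply) % 2:
        raise ValueError("HGETALL reply must have an even number of items")
    attrs = dict(zip(reply[0::2], reply[1::2]))
    attrs.pop("NULL", None)
    return attrs


def empty_in_temp(current: Dict[str, Attrs], temp: Dict[str, Attrs]) -> List[str]:
    """VIDs present in both views whose temp-view attribute list is empty."""
    return sorted(vid for vid in current.keys() & temp.keys() if not temp[vid])


current = {
    "oid:0x19000000000510": reply_to_attrs([
        "SAI_BUFFER_PROFILE_ATTR_POOL_ID", "oid:0x1800000000050f",
        "SAI_BUFFER_PROFILE_ATTR_RESERVED_BUFFER_SIZE", "0"]),
}
temp = {
    "oid:0x19000000000510": reply_to_attrs(["NULL", "NULL"]),
    "oid:0x190000000005e6": reply_to_attrs([
        "SAI_BUFFER_PROFILE_ATTR_POOL_ID", "oid:0x180000000005e5",
        "SAI_BUFFER_PROFILE_ATTR_RESERVED_BUFFER_SIZE", "0"]),
}
print(empty_in_temp(current, temp))  # ['oid:0x19000000000510']
```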

svsivm commented 2 years ago

I have attached the syslog and sairedis log from before and after warmboot in pfcwd_warmboot.zip, which is attached to the previous post.

kcudnik commented 2 years ago

OK, I know where the problem is. It happens when SAI_QUEUE_ATTR_BUFFER_PROFILE_ID is queried (GET operation) before APPLY_VIEW is issued. The problem lies in the scopes of the current view and the temporary view; we had this issue before. Imagine this situation:
• On cold boot you create one buffer profile (let's name it A) and set it on the queue via SAI_QUEUE_ATTR_BUFFER_PROFILE_ID.
• Then you do a warm boot and issue init view.
• Now you create one buffer profile (name it B) and set it on SAI_QUEUE_ATTR_BUFFER_PROFILE_ID, but this is built in the temporary view; no ASIC operation is performed yet.
• Now you query SAI_QUEUE_ATTR_BUFFER_PROFILE_ID on the existing queue. This operation returns buffer profile A (even though you assigned buffer profile B), since no APPLY_VIEW was issued.
• Now you issue the APPLY_VIEW command, and the temporary view contains two buffer profiles (A and B): A because you queried it, so the A OID was brought into the temporary view and OA has knowledge of it (syncd cannot remove it, because that would violate OID consistency in OA), and B because you just created it in the temporary view.
• This query happens twice, on SAI_QUEUE_ATTR_BUFFER_PROFILE_ID and on SAI_INGRESS_PRIORITY_GROUP_ATTR_BUFFER_PROFILE, so it brings back 2 OIDs, hence 10 buffer profiles instead of 8.
If you remove those 2 queries before the APPLY_VIEW command, there will be no ASIC operations on buffer profiles and everything will work fine.
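The sequence above can be replayed with a toy model. Profiles are named A and B instead of OIDs, each view is a plain dataclass, and `View` and `run_warmboot` are hypothetical names. The model captures only the rule described in this comment (a GET before APPLY_VIEW is answered from the ASIC, and the returned OID must then exist in the temporary view); it is not syncd's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class View:
    """One view: buffer profiles by name, plus the queue's attributes."""
    profiles: Dict[str, Dict[str, str]] = field(default_factory=dict)
    queue: Dict[str, str] = field(default_factory=dict)


def run_warmboot(query_before_apply: bool) -> View:
    """Return the temporary view as it looks when APPLY_VIEW is issued."""
    # Cold boot: profile A exists on the ASIC and is attached to the queue.
    current = View(profiles={"A": {"RESERVED_BUFFER_SIZE": "0"}},
                   queue={"BUFFER_PROFILE_ID": "A"})
    # Warm boot + init view: OA rebuilds its objects in a temporary view.
    temp = View()
    # OA creates B and sets it on the queue; only the temp view changes.
    temp.profiles["B"] = {"RESERVED_BUFFER_SIZE": "0"}
    temp.queue["BUFFER_PROFILE_ID"] = "B"
    if query_before_apply:
        # The GET is answered from the ASIC, which still has A attached ...
        returned = current.queue["BUFFER_PROFILE_ID"]
        # ... so OA now holds A's OID, and A must stay in the temp view
        # (with no attributes) to keep OIDs consistent for OA.
        temp.profiles.setdefault(returned, {})
    return temp


print(sorted(run_warmboot(query_before_apply=True).profiles))   # ['A', 'B']
print(sorted(run_warmboot(query_before_apply=False).profiles))  # ['B']
```

With one such query on the queue and one on the priority group, the model's count grows by two per query pair, which mirrors the 8 versus 10 buffer profiles reported in this issue.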

It is not recommended to query attributes that you will eventually SET, since this leads to problems like this one. It cannot be solved easily; it needs to be addressed in the OA logic.
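Following that recommendation, one way to catch the pattern early is to scan a recorded operation sequence for GETs of attributes that are also SET between init view and APPLY_VIEW. This is a sketch under assumptions: the `(operation, object, attribute)` tuple format, the helper `risky_gets`, and the object names `queue_1` and `pg_1` are all hypothetical, and checking a real sairedis recording would need a parser for its actual format.

```python
from typing import Iterable, List, Tuple

# Hypothetical simplified record: (operation, object id, attribute name).
Op = Tuple[str, str, str]


def risky_gets(ops: Iterable[Op]) -> List[Tuple[str, str]]:
    """Return (object, attribute) pairs that are both GET and SET inside the
    same init view ... APPLY_VIEW window."""
    in_window = False
    gets: List[Tuple[str, str]] = []
    sets = set()
    flagged: List[Tuple[str, str]] = []
    for op, obj, attr in ops:
        if op == "INIT_VIEW":
            in_window, gets, sets = True, [], set()
        elif op == "APPLY_VIEW" and in_window:
            # dict.fromkeys de-duplicates the GETs while preserving order.
            flagged.extend(g for g in dict.fromkeys(gets) if g in sets)
            in_window = False
        elif in_window and op == "GET":
            gets.append((obj, attr))
        elif in_window and op == "SET":
            sets.add((obj, attr))
    return flagged


ops = [
    ("INIT_VIEW", "", ""),
    ("SET", "queue_1", "SAI_QUEUE_ATTR_BUFFER_PROFILE_ID"),
    ("GET", "queue_1", "SAI_QUEUE_ATTR_BUFFER_PROFILE_ID"),
    ("GET", "pg_1", "SAI_INGRESS_PRIORITY_GROUP_ATTR_BUFFER_PROFILE"),
    ("APPLY_VIEW", "", ""),
]
print(risky_gets(ops))  # [('queue_1', 'SAI_QUEUE_ATTR_BUFFER_PROFILE_ID')]
```

The checker flags a GET whether it comes before or after the SET inside the window, since either order exposes an OID that the temporary view must then keep.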