sonic-net / sonic-swss

SONiC Switch State Service (SwSS)
https://azure.github.io/SONiC

[COPP] COPP table entries are always deleted and recreated during warm-reboot #2217

Open maulik-marvell opened 2 years ago

maulik-marvell commented 2 years ago

Description

In SONiC 202012, we observe that COPP table entries are removed from APPL_DB by https://github.com/Azure/sonic-utilities/blob/2a982a1fe084b334fb99c372d171273931a0851b/scripts/db_migrator.py#L215, but they are not removed from ASIC_DB. This causes a discrepancy between the current view and the temporary view later, during warm-boot reconciliation. When the COPP manager reads /etc/sonic/copp_cfg.json after the warm reboot, it recreates all of these table entries. The result is an unnecessary delete->create sequence, which should be avoided at least when warm-rebooting into the same release.
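As a rough illustration of the same-release suggestion above, the following sketch skips the COPP cleanup when the release does not change. It is not the actual db_migrator.py logic: the function name `migrate_copp`, the plain dict standing in for APPL_DB, the `COPP_TABLE:` key prefix, and the version strings are all assumptions made for the example.

```python
from typing import Dict, List

# Assumed key prefix for COPP entries; used here for illustration only.
COPP_PREFIX = "COPP_TABLE:"


def migrate_copp(appl_db: Dict[str, dict], old_version: str, new_version: str) -> List[str]:
    """Remove COPP entries from a dict standing in for APPL_DB.

    Hypothetical helper: entries are left untouched when the release is
    unchanged, so no delete->create sequence would be triggered for them.
    Returns the list of keys that were removed.
    """
    if old_version == new_version:
        return []
    removed = [key for key in appl_db if key.startswith(COPP_PREFIX)]
    for key in removed:
        del appl_db[key]
    return removed


if __name__ == "__main__":
    db = {"COPP_TABLE:default": {"queue": "0"}, "PORT_TABLE:Ethernet0": {}}
    print(migrate_copp(dict(db), "202012", "202012"))  # same release: nothing removed
    print(migrate_copp(dict(db), "202012", "202106"))  # different release: COPP keys removed
```

Whether a plain version comparison is sufficient depends on how releases are identified in practice, which this sketch does not attempt to model.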

Mar 16 15:38:29.800927 sonic-dut WARNING syncd#syncd: :- logViewObjectCount: object count for SAI_OBJECT_TYPE_HOSTIF on current view 33 is different than on temporary view: 32
Mar 16 15:38:29.800927 sonic-dut WARNING syncd#syncd: :- logViewObjectCount: object count for SAI_OBJECT_TYPE_HOSTIF_TRAP_GROUP on current view 6 is different than on temporary view: 1
Mar 16 15:38:29.800927 sonic-dut WARNING syncd#syncd: :- logViewObjectCount: object count for SAI_OBJECT_TYPE_POLICER on current view 4 is different than on temporary view: 0
Mar 16 15:38:29.801055 sonic-dut WARNING syncd#syncd: :- logViewObjectCount: object count for SAI_OBJECT_TYPE_FDB_ENTRY on current view 1 is different than on temporary view: 0
Mar 16 15:38:29.801055 sonic-dut WARNING syncd#syncd: :- logViewObjectCount: object count for SAI_OBJECT_TYPE_HOSTIF_TRAP on current view 13 is different than on temporary view: 1
Mar 16 15:38:29.801055 sonic-dut WARNING syncd#syncd: :- logViewObjectCount: object count for SAI_OBJECT_TYPE_HOSTIF_TABLE_ENTRY on current view 2 is different than on temporary view: 1
Mar 16 15:38:29.804960 sonic-dut WARNING syncd#syncd: :- logViewObjectCount: object count is different on both view, there will be ASIC OPERATIONS!
…
Mar 16 15:38:29.933816 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: remove: SAI_OBJECT_TYPE_POLICER:oid:0x120000000003e6
Mar 16 15:38:29.933816 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: remove: SAI_OBJECT_TYPE_HOSTIF_TRAP:oid:0x220000000003e9
Mar 16 15:38:29.933816 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: remove: SAI_OBJECT_TYPE_HOSTIF_TRAP:oid:0x220000000003f0
Mar 16 15:38:29.933816 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: remove: SAI_OBJECT_TYPE_HOSTIF_TRAP:oid:0x220000000003f1
Mar 16 15:38:29.933816 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: remove: SAI_OBJECT_TYPE_HOSTIF_TRAP:oid:0x220000000003f2
Mar 16 15:38:29.933816 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: remove: SAI_OBJECT_TYPE_HOSTIF_TRAP:oid:0x220000000003f5
Mar 16 15:38:29.933816 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: remove: SAI_OBJECT_TYPE_HOSTIF_TRAP:oid:0x220000000003f6
Mar 16 15:38:29.933816 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: remove: SAI_OBJECT_TYPE_HOSTIF_TRAP:oid:0x220000000003f7
Mar 16 15:38:29.933816 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: remove: SAI_OBJECT_TYPE_HOSTIF_TRAP:oid:0x220000000003f9
Mar 16 15:38:29.933816 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: remove: SAI_OBJECT_TYPE_HOSTIF_TRAP:oid:0x220000000003fa
Mar 16 15:38:29.933816 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: remove: SAI_OBJECT_TYPE_HOSTIF_TRAP:oid:0x220000000003fb
Mar 16 15:38:29.934023 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: remove: SAI_OBJECT_TYPE_HOSTIF_TRAP:oid:0x220000000003fc
Mar 16 15:38:29.934023 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: remove: SAI_OBJECT_TYPE_HOSTIF_TABLE_ENTRY:oid:0x230000000003ee
Mar 16 15:38:29.934023 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: remove: SAI_OBJECT_TYPE_HOSTIF:oid:0xd0000000003ec
Mar 16 15:38:29.934023 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: remove: SAI_OBJECT_TYPE_HOSTIF_TRAP_GROUP:oid:0x110000000003e7
Mar 16 15:38:29.934023 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: remove: SAI_OBJECT_TYPE_HOSTIF_TRAP_GROUP:oid:0x110000000003ef
Mar 16 15:38:29.934023 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: remove: SAI_OBJECT_TYPE_HOSTIF_TRAP_GROUP:oid:0x110000000003f3
Mar 16 15:38:29.934023 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: remove: SAI_OBJECT_TYPE_HOSTIF_TRAP_GROUP:oid:0x110000000003f8
Mar 16 15:38:29.934023 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: remove: SAI_OBJECT_TYPE_POLICER:oid:0x120000000003e8
Mar 16 15:38:29.934023 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: remove: SAI_OBJECT_TYPE_POLICER:oid:0x120000000003f4
Mar 16 15:38:29.934023 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: remove: SAI_OBJECT_TYPE_HOSTIF_TRAP:oid:0x220000000003ed
Mar 16 15:38:29.934023 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: remove: SAI_OBJECT_TYPE_HOSTIF_TRAP_GROUP:oid:0x110000000003ea
Mar 16 15:38:29.934023 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: remove: SAI_OBJECT_TYPE_POLICER:oid:0x120000000003eb
Mar 16 15:38:29.934023 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: optimized operations!
…
Mar 16 15:38:31.744901 sonic-dut NOTICE swss#orchagent: :- setWarmStartState: orchagent warm start state changed to reconciled
…
Mar 16 15:38:31.779414 sonic-dut NOTICE swss#orchagent: :- processCoppRule: Set trap group default to host interface
Mar 16 15:38:31.779414 sonic-dut WARNING swss#orchagent: :- trapGroupUpdatePolicer: Creating policer for existing Trap group: 11000000000122 (name:default).
Mar 16 15:38:31.780481 sonic-dut NOTICE swss#orchagent: :- createPolicer: Create policer for trap group default
Mar 16 15:38:31.819299 sonic-dut NOTICE swss#orchagent: :- createPolicer: Bind policer to trap group default:
Mar 16 15:38:31.821924 sonic-dut NOTICE swss#orchagent: :- processCoppRule: Create host interface trap group queue1_group1
Mar 16 15:38:31.821924 sonic-dut WARNING swss#orchagent: :- trapGroupUpdatePolicer: Creating policer for existing Trap group: 1100000000057d (name:queue1_group1).
Mar 16 15:38:31.824512 sonic-dut NOTICE swss#orchagent: :- createPolicer: Create policer for trap group queue1_group1
Mar 16 15:38:31.828835 sonic-dut NOTICE swss#orchagent: :- createPolicer: Bind policer to trap group queue1_group1:
Mar 16 15:38:31.839782 sonic-dut NOTICE swss#orchagent: :- processCoppRule: Create host interface trap group queue2_group1
Mar 16 15:38:31.839782 sonic-dut WARNING swss#orchagent: :- trapGroupUpdatePolicer: Creating policer for existing Trap group: 11000000000580 (name:queue2_group1).
Mar 16 15:38:31.841982 sonic-dut NOTICE swss#orchagent: :- createPolicer: Create policer for trap group queue2_group1
Mar 16 15:38:31.847839 sonic-dut NOTICE swss#orchagent: :- createPolicer: Bind policer to trap group queue2_group1:
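To quantify the churn from a syslog capture like the one above, a small script along these lines could summarize the logViewObjectCount warnings and tally the executeOperationsOnAsic removals per object type. It is only a sketch: the regular expressions are written against the message text quoted in this issue and may need adjusting for other releases, and the function names are made up for the example.

```python
import re
from collections import Counter
from typing import Dict, Tuple

# Patterns derived from the log excerpts quoted in this issue (assumption:
# other releases may word these messages differently).
COUNT_RE = re.compile(
    r"object count for (SAI_OBJECT_TYPE_\w+) on current view (\d+) "
    r"is different than on temporary view: (\d+)"
)
REMOVE_RE = re.compile(r"executeOperationsOnAsic: remove: (SAI_OBJECT_TYPE_\w+):oid:")


def view_counts(log_text: str) -> Dict[str, Tuple[int, int]]:
    """Map each object type to (current view count, temporary view count)."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)))
        for m in COUNT_RE.finditer(log_text)
    }


def removals_by_type(log_text: str) -> Counter:
    """Count executeOperationsOnAsic remove operations per object type."""
    return Counter(m.group(1) for m in REMOVE_RE.finditer(log_text))


if __name__ == "__main__":
    sample = (
        "Mar 16 15:38:29.800927 sonic-dut WARNING syncd#syncd: :- logViewObjectCount: "
        "object count for SAI_OBJECT_TYPE_POLICER on current view 4 is different than "
        "on temporary view: 0\n"
        "Mar 16 15:38:29.933816 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: "
        "remove: SAI_OBJECT_TYPE_POLICER:oid:0x120000000003e6\n"
    )
    print(view_counts(sample))
    print(removals_by_type(sample))
```

Because both functions scan with `finditer`, they work whether the log entries are on separate lines or run together in one string.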

Steps to reproduce the issue:

Run the sflow/test_sflow.py::TestReboot::testWarmreboot pytest on a t0 topology with ‘--enable_sflow_feature’. The delete->create steps are logged in the syslog file.

Or

Run the ‘warm-reboot’ command on the DUT and observe the syslog.

Describe the results you received: The COPP entries are deleted and then recreated, which causes ASIC operations during warm-boot reconciliation. Ideally, there should be no deletion->recreation during warm-reboot; all COPP entries should be recovered after warm-reboot without being recreated.

Additional information

root@sonic-device1-dut:~# show version
SONiC Software Version: SONiC.202012.Innovium.2.0.0.20220208.095204
Distribution: Debian 10.11
Kernel: 4.19.0-12-2-amd64
Build commit: 743561321
Build date: Tue Feb 8 20:58:58 UTC 2022
Built by: admin@sonic

prsunny commented 2 years ago

This behavior is by design, as described in the HLD. The expectation is that traps are removed and reinstalled during warm boot. Traps can differ across image versions, and implementing logic to identify the diff and apply only the newly added traps would introduce unnecessary complexity.
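For context, the diff logic referred to here would need, at a minimum, something like the following comparison of the old and new trap configuration. This is only a sketch of the idea, not code from SwSS: the function name `diff_traps`, the dict layout, and the attribute names in the example are hypothetical, and it leaves out anything beyond a flat key-by-key comparison.

```python
from typing import Dict, List, Tuple


def diff_traps(
    old: Dict[str, dict], new: Dict[str, dict]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare two trap configurations keyed by trap group name.

    Hypothetical helper returning (to_remove, to_add, to_update):
    groups only in the old config, groups only in the new config, and
    groups present in both whose attributes differ.
    """
    to_remove = sorted(old.keys() - new.keys())
    to_add = sorted(new.keys() - old.keys())
    to_update = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return to_remove, to_add, to_update


if __name__ == "__main__":
    # Attribute names and values below are made up for illustration.
    old_cfg = {"default": {"queue": "0"}, "queue1_group1": {"queue": "1"}}
    new_cfg = {"default": {"queue": "0"}, "queue1_group1": {"queue": "2"},
               "queue2_group1": {"queue": "2"}}
    print(diff_traps(old_cfg, new_cfg))
```

The remove-and-reinstall design avoids having to maintain this kind of comparison at all.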

maulik-marvell commented 2 years ago

Hi,

Who is supposed to remove these traps? We observe that they are not removed from ASIC_DB when they are deleted from APPL_DB by https://github.com/Azure/sonic-utilities/blob/2a982a1fe084b334fb99c372d171273931a0851b/scripts/db_migrator.py#L215. This causes the discrepancy between the current view and the temporary view later, during warm-boot reconciliation. Shouldn't they be removed from ASIC_DB as well, so that no discrepancy occurs?