sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[BRCM Th3 Z9332]: When SER is injected into a memory, the correction syslog event shows a different address #6392

Open chitra-raghavan opened 3 years ago

chitra-raghavan commented 3 years ago

Description

Platform : Z9332 ASIC : Th3

root@sonic-z9332-10429:~# show platform summary
Platform: x86_64-dellemc_z9332f_d1508-r0
HwSKU: DellEMC-Z9332f-O32
ASIC: broadcom
ASIC Count: 1
root@sonic-z9332-10429:~#

Script: https://github.com/Azure/sonic-mgmt/blob/master/tests/platform_tests/broadcom/test_ser.py

When SER is injected into a memory, the correction syslog event shows a different address. The script therefore fails, because the two memory addresses do not match.

Command :

  root@sonic-z9332-10429:~# bcmcmd "ser inject memory=L3_DEFIP_LEVEL1.ipipe0"
  ser inject memory=L3_DEFIP_LEVEL1.ipipe0
  Error injected on L3_DEFIP_LEVEL1.ipipe0 at index 0 pipe_x
  drivshell>

Memory address for L3_DEFIP_LEVEL1.ipipe0: 0x0e980000

root@sonic-z9332-10429:~# bcmcmd "list L3_DEFIP_LEVEL1.ipipe0"
  list L3_DEFIP_LEVEL1.ipipe0
  Memory: L3_DEFIP_LEVEL1.ipipe0 address 0x0e980000
  Flags: valid cbp cachable(on) bist-epic
  Blocks:  ipipe0 (1 copy)
  Entries: 1024 with indices 0-1023 (0x0-0x3ff), each 54 bytes 14 words
  Entry mask: -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0x0000ffff
  Description: L3_DEFIP_LEVEL1 table.

syslog: the event is logged with a different address (Address: 0x0e880400):

Jan  8 11:01:42.168573 sonic-z9332-10429 INFO syncd#/supervisord: syncd 0:soc_tomahawk_process_ser_fifo: #015
Jan  8 11:01:42.168979 sonic-z9332-10429 INFO syncd#/supervisord: syncd Unit: 0 #015
Jan  8 11:01:42.169224 sonic-z9332-10429 INFO syncd#/supervisord: syncd 0:soc_tomahawk_process_ser_fifo: Multiple: 0:soc_tomahawk_process_ser_fifo: Mem: 0:soc_tomahawk_process_ser_fifo: Parity error..#015
Jan  8 11:01:42.169224 sonic-z9332-10429 INFO syncd#/supervisord: syncd 0:_soc_tomahawk_print_ser_fifo_details: Error in: transaction - refresh, aging etc.#015
Jan  8 11:01:42.169224 sonic-z9332-10429 INFO syncd#/supervisord: syncd 0:_soc_tomahawk_print_ser_fifo_details: Blk: 1, Pipe: 0, Address: 0x0e880400, base: 0xa2, stage: 3, index: 1024#015
Jan  8 11:01:42.169224 sonic-z9332-10429 INFO syncd#/supervisord: syncd 0:soc_ser_correction: SER_CORRECTION: reg/mem:9281 btype:17 sblk:1 at:-1 stage:3 addr:0x0e880000 port: 0 index: 1024#015
Jan  8 11:01:42.169224 sonic-z9332-10429 INFO syncd#/supervisord: syncd 0:_soc_ser_mem_correction: mem: 9281=L3_DEFIP_TCAM_LEVEL1 blkoffset:101#015
Jan  8 11:01:42.169224 sonic-z9332-10429 INFO syncd#/supervisord: syncd 0:_soc_ser_recovery_hw_cache: RESTORE pipe 0 [from pipe 1]: L3_DEFIP_TCAM_LEVEL1[9281] blk: ipipe0 index: 1024#015
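The mismatch above can be reproduced offline from the two captured outputs. The following is a minimal sketch, not the logic used by test_ser.py: the regular expressions and the helper names `listed_address` and `corrected_addresses` are assumptions, written only against the `list` output and SER_CORRECTION line quoted in this report.

```python
import re

# Assumed patterns, derived only from the output quoted in this issue.
LIST_ADDR_RE = re.compile(
    r"^\s*Memory:.*\baddress\s+(0x[0-9a-fA-F]+)", re.MULTILINE
)
SER_ADDR_RE = re.compile(r"SER_CORRECTION:.*\baddr:(0x[0-9a-fA-F]+)")


def listed_address(list_output):
    """Return the address printed by `bcmcmd "list <mem>"`, or None."""
    match = LIST_ADDR_RE.search(list_output)
    return int(match.group(1), 16) if match else None


def corrected_addresses(syslog_text):
    """Return every address reported in SER_CORRECTION syslog lines."""
    return [int(addr, 16) for addr in SER_ADDR_RE.findall(syslog_text)]


# Excerpts copied from the outputs above.
list_output = """list L3_DEFIP_LEVEL1.ipipe0
  Memory: L3_DEFIP_LEVEL1.ipipe0 address 0x0e980000
"""
syslog_text = (
    "syncd 0:soc_ser_correction: SER_CORRECTION: reg/mem:9281 btype:17 "
    "sblk:1 at:-1 stage:3 addr:0x0e880000 port: 0 index: 1024#015"
)

expected = listed_address(list_output)
reported = corrected_addresses(syslog_text)
print(f"listed=0x{expected:08x}",
      "reported=" + ",".join(f"0x{a:08x}" for a in reported),
      "match=" + str(expected in reported))
```

For the excerpts above, the listed address is not among the reported ones, which is the condition that makes an exact-address comparison fail.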

Steps to reproduce the issue:

  1. Run the Ser test

Describe the results you received:

Describe the results you expected: When SER is injected into a memory, the correction syslog event shows the same address.

Additional information you deem important (e.g. issue happens only occasionally):

Output of show version:

root@sonic-z9332-10429:~# show ver

SONiC Software Version: SONiC.master.546-dirty-20210107.085958
Distribution: Debian 10.7
Kernel: 4.19.0-9-2-amd64
Build commit: f99dbff7
Build date: Thu Jan  7 09:08:07 UTC 2021
Built by: johnar@jenkins-worker-8

Platform: x86_64-dellemc_z9332f_d1508-r0
HwSKU: DellEMC-Z9332f-O32
ASIC: broadcom
ASIC Count: 1
Serial Number: TH04CN21CET009BR0023
Uptime: 11:40:46 up  3:03,  2 users,  load average: 1.44, 2.43, 2.74

Docker images:
REPOSITORY                    TAG                                IMAGE ID            SIZE
docker-syncd-brcm             latest                             df4e1d5abdb7        637MB
docker-syncd-brcm             master.546-dirty-20210107.085958   df4e1d5abdb7        637MB
docker-snmp                   latest                             29f6b69c23ea        431MB
docker-snmp                   master.546-dirty-20210107.085958   29f6b69c23ea        431MB
docker-teamd                  latest                             577d29033cae        450MB
docker-teamd                  master.546-dirty-20210107.085958   577d29033cae        450MB
docker-sonic-mgmt-framework   latest                             9544756347b3        604MB
docker-sonic-mgmt-framework   master.546-dirty-20210107.085958   9544756347b3        604MB
docker-router-advertiser      latest                             57004077bd40        390MB
docker-router-advertiser      master.546-dirty-20210107.085958   57004077bd40        390MB
docker-platform-monitor       latest                             e566055ebb0b        588MB
docker-platform-monitor       master.546-dirty-20210107.085958   e566055ebb0b        588MB
docker-lldp                   latest                             04a645e13c98        430MB
docker-lldp                   master.546-dirty-20210107.085958   04a645e13c98        430MB
docker-dhcp-relay             latest                             b2f634faf899        397MB
docker-dhcp-relay             master.546-dirty-20210107.085958   b2f634faf899        397MB
docker-database               latest                             ae7fcc4306e8        390MB
docker-database               master.546-dirty-20210107.085958   ae7fcc4306e8        390MB
docker-orchagent              latest                             ff1749455c0d        468MB
docker-orchagent              master.546-dirty-20210107.085958   ff1749455c0d        468MB
docker-nat                    latest                             9278ce50c3ce        453MB
docker-nat                    master.546-dirty-20210107.085958   9278ce50c3ce        453MB
docker-sonic-telemetry        latest                             357000a95471        465MB
docker-sonic-telemetry        master.546-dirty-20210107.085958   357000a95471        465MB
docker-fpm-frr                latest                             dd2e65dc4456        468MB
docker-fpm-frr                master.546-dirty-20210107.085958   dd2e65dc4456        468MB
docker-sflow                  latest                             7a938081e95e        451MB
docker-sflow                  master.546-dirty-20210107.085958   7a938081e95e        451MB

root@sonic-z9332-10429:~#

Attach debug file sudo generate_dump: Unable to upload; show techsupport fails with the following error:

  root@sonic-z9332-10429:~# show techsupport --since 1day
  /usr/local/bin/generate_dump: line 879: OPTARG: unbound variable
  root@sonic-z9332-10429:~#

daall commented 3 years ago

@yxieca fyi

yxieca commented 3 years ago

This is a known issue in the Broadcom SDK.

gechiang commented 3 years ago

@yxieca If this is a known issue with the BRCM SDK, should we pursue it with BRCM to get a fix? Or, if BRCM cannot fix it, should we go ahead and change the test case so that it does not expect the exact address to be reported in syslog?
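One way the second option could look is sketched below. This is a hypothetical illustration, not code from sonic-mgmt: the pattern and the names `correction_events` and `correction_seen` are assumptions based only on the SER_CORRECTION lines quoted in this issue. With no expected address the check only requires that some correction event was logged; with an address it behaves like a strict comparison.

```python
import re

# Assumed pattern for the SER_CORRECTION lines quoted in this issue.
SER_EVENT_RE = re.compile(
    r"SER_CORRECTION: reg/mem:(?P<mem_id>\d+).*"
    r"\baddr:(?P<addr>0x[0-9a-fA-F]+).*\bindex: (?P<index>\d+)"
)


def correction_events(syslog_lines):
    """Parse SER_CORRECTION events out of an iterable of syslog lines."""
    events = []
    for line in syslog_lines:
        match = SER_EVENT_RE.search(line)
        if match:
            events.append({
                "mem_id": int(match.group("mem_id")),
                "addr": int(match.group("addr"), 16),
                "index": int(match.group("index")),
            })
    return events


def correction_seen(syslog_lines, expected_addr=None):
    """Relaxed check when expected_addr is None, strict check otherwise."""
    events = correction_events(syslog_lines)
    if expected_addr is None:
        return bool(events)
    return any(event["addr"] == expected_addr for event in events)


lines = [
    "syncd 0:soc_ser_correction: SER_CORRECTION: reg/mem:9281 btype:17 "
    "sblk:1 at:-1 stage:3 addr:0x0e880000 port: 0 index: 1024#015",
]
print("relaxed:", correction_seen(lines))
print("strict: ", correction_seen(lines, expected_addr=0x0E980000))
```

The trade-off is that the relaxed form would also pass on a correction event for an unrelated memory, so a real test would probably need some other way to tie the event to the injection.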

gechiang commented 3 years ago

@chitra-raghavan I just checked the test run results on TH3 for test_ser and they are passing for us. Perhaps some recent changes already addressed the issue? Can you please try again and see if the issue is still present? Thanks!

chitra-raghavan commented 3 years ago

@gechiang, the issue is still seen. These entries are mapped into the "cannot pass" list: https://github.com/Azure/sonic-mgmt/blob/master/tests/platform_tests/broadcom/files/ser_injector.py#L274

root@sonic-10429:~# show ver

SONiC Software Version: SONiC.202012.27910-ada56abe6
Distribution: Debian 10.10
Kernel: 4.19.0-12-2-amd64
Build commit: ada56abe6
Build date: Sun Aug  8 16:53:21 UTC 2021
Built by: AzDevOps@sonic-build-workers-000KUH

Platform: x86_64-dellemc_z9332f_d1508-r0
HwSKU: DellEMC-Z9332f-M-O16C64
ASIC: broadcom
ASIC Count: 1

Sample entry 1:

root@sonic-10429:~# bcmcmd "ser inject memory=L2_ENTRY.ipipe0"
ser inject memory=L2_ENTRY.ipipe0
Error injected on L2_ENTRY.ipipe0 at index 0 pipe_x
drivshell>
root@sonic-10429:~#  bcmcmd "list L2_ENTRY.ipipe0" | grep list -A5
list L2_ENTRY.ipipe0
Memory: L2X.ipipe0 aka L2_ENTRY alias L2X address 0x0e6c0000
Flags: valid cachable(on) hashed multiview
Blocks:  ipipe0/dma/slam (1 copy, 1 dmaable, 1 slamable)
Entries: 8192 with indices 0-8191 (0x0-0x1fff), each 13 bytes 4 words
Entry mask: -1 -1 -1 0x00000007
root@sonic-10429:~#

syslog:

Aug 11 07:51:00.338319 sonic-10429 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_switch_event_cb:498 5902592 Received switch event 2 on unit 0: 6 4 0
Aug 11 07:51:00.338364 sonic-10429 INFO syncd#/supervisord: syncd 0:soc_ser_correction:
Aug 11 07:51:00.338398 sonic-10429 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_switch_event_cb:498 5902592 Received switch event 2 on unit 0: 2 4000001 e680000
Aug 11 07:51:00.338434 sonic-10429 INFO syncd#/supervisord: syncd SER_CORRECTION: reg/mem:9141 btype:17 sblk:1 at:-1 stage:3 addr:0x0e680000 port: 0 index: 0#015
Aug 11 07:51:00.338468 sonic-10429 INFO syncd#/supervisord: syncd 0:_soc_ser_mem_correction: mem: 9141=L2_ENTRY_ECC blkoffset:101#015
Aug 11 07:51:00.338504 sonic-10429 INFO syncd#/supervisord: syncd 0:_soc_ser_sram_correction: CLEAR_RESTORE: L2_ENTRY_ECC[9141] start_index: 0#015
Aug 11 07:51:00.338547 sonic-10429 INFO syncd#/supervisord: syncd 0:soc_tomahawk_process_ser_fifo: #015
Aug 11 07:51:00.338580 sonic-10429 INFO syncd#/supervisord: syncd Unit: 0 #015
Aug 11 07:51:00.338614 sonic-10429 INFO syncd#/supervisord: syncd 0:soc_tomahawk_process_ser_fifo: Mem: 0:soc_tomahawk_process_ser_fifo: Double or Multiple bit ECC error..#015
Aug 11 07:51:00.338676 sonic-10429 INFO syncd#/supervisord: syncd 0:_soc_tomahawk_print_ser_fifo_details: Error in: SBUS transaction.#015
Aug 11 07:51:00.338710 sonic-10429 INFO syncd#/supervisord: syncd 0:_soc_tomahawk_print_ser_fifo_details: Blk: 1, Pipe: 0, Address: 0x0e680000, base: 0x9a, stage: 3, index: 0#015
Aug 11 07:51:00.338743 sonic-10429 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_switch_event_cb:498 5902592 Received switch event 2 on unit 0: 5 23a5 0
Aug 11 07:51:00.338776 sonic-10429 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_switch_event_cb:498 5902592 Received switch event 2 on unit 0: 6 4 0
Aug 11 07:51:00.338808 sonic-10429 INFO syncd#/supervisord: syncd 0:soc_ser_correction: SER_CORRECTION: reg/mem:9141 btype:17 sblk:1 at:-1 stage:3 addr:0x0e680000 port: 0 index: 0#015
Aug 11 07:51:00.338841 sonic-10429 INFO syncd#/supervisord: syncd 0:_soc_ser_mem_correction: mem: 9141=L2_ENTRY_ECC blkoffset:101#015
Aug 11 07:51:00.338875 sonic-10429 INFO syncd#/supervisord: syncd 0:_soc_ser_sram_correction: CLEAR_RESTORE: L2_ENTRY_ECC[9141] start_index: 0#015
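To see how far apart the listed and reported addresses are, the pairs quoted in this thread can be XORed. This is a small arithmetic sketch; the helper name `differing_bits` is made up for illustration. The L2_ENTRY pair comes from the sample above and the L3_DEFIP_LEVEL1 pair from the original report.

```python
def differing_bits(expected, reported):
    """Return the bit positions at which the two addresses differ."""
    diff = expected ^ reported
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


# (address from `list`, addr from SER_CORRECTION), as quoted in this issue.
samples = {
    "L2_ENTRY.ipipe0": (0x0E6C0000, 0x0E680000),
    "L3_DEFIP_LEVEL1.ipipe0": (0x0E980000, 0x0E880000),
}

for name, (expected, reported) in samples.items():
    xor = expected ^ reported
    print(f"{name}: xor=0x{xor:08x} bits={differing_bits(expected, reported)}")
```

In both pairs the addresses differ in a single bit (bit 18 for L2_ENTRY, bit 20 for L3_DEFIP_LEVEL1). The sketch only quantifies the difference and says nothing about why the SDK reports the other address.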

Sample entry 2:

root@sonic-10429:~# bcmcmd "ser inject memory=L3_DEFIP_LEVEL1.ipipe0"
ser inject memory=L3_DEFIP_LEVEL1.ipipe0
Error injected on L3_DEFIP_LEVEL1.ipipe0 at index 0 pipe_x
drivshell>
root@sonic-10429:~# bcmcmd "list L3_DEFIP_LEVEL1.ipipe0"
list L3_DEFIP_LEVEL1.ipipe0
Memory: L3_DEFIP_LEVEL1.ipipe0 address 0x0e980000
Flags: valid cbp cachable(on) bist-epic
Blocks:  ipipe0 (1 copy)
Entries: 1024 with indices 0-1023 (0x0-0x3ff), each 54 bytes 14 words
Entry mask: -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0x0000ffff
Description: L3_DEFIP_LEVEL1 table.

syslog:

Aug 11 07:54:57.001863 sonic-10429 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_switch_event_cb:498 5902592 Received switch event 2 on unit 0: 1 4000001 e880400
Aug 11 07:54:57.001863 sonic-10429 INFO syncd#/supervisord: syncd 0:soc_tomahawk_process_ser_fifo: #015
Aug 11 07:54:57.001863 sonic-10429 INFO syncd#/supervisord: syncd Unit: 0 #015
Aug 11 07:54:57.003166 sonic-10429 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_switch_event_cb:498 5902592 Received switch event 2 on unit 0: 5 2424 400
Aug 11 07:54:57.003166 sonic-10429 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_switch_event_cb:498 5902592 Received switch event 2 on unit 0: 6 5 0
Aug 11 07:54:57.003166 sonic-10429 INFO syncd#/supervisord: syncd 0:soc_tomahawk_process_ser_fifo: Multiple: 0:soc_tomahawk_process_ser_fifo: Mem: 0:soc_tomahawk_process_ser_fifo: Parity error..#015
Aug 11 07:54:57.003166 sonic-10429 INFO syncd#/supervisord: syncd 0:_soc_tomahawk_print_ser_fifo_details: Error in: transaction - refresh, aging etc.#015
Aug 11 07:54:57.003166 sonic-10429 INFO syncd#/supervisord: syncd 0:_soc_tomahawk_print_ser_fifo_details: Blk: 1, Pipe: 0, Address: 0x0e880400, base: 0xa2, stage: 3, index: 1024#015
Aug 11 07:54:57.003166 sonic-10429 INFO syncd#/supervisord: syncd 0:soc_ser_correction: SER_CORRECTION: reg/mem:9252 btype:17 sblk:1 at:-1 stage:3 addr:0x0e880000 port: 0 index: 1024#015
Aug 11 07:54:57.003166 sonic-10429 INFO syncd#/supervisord: syncd 0:_soc_ser_mem_correction: mem: 9252=L3_DEFIP_TCAM_LEVEL1 blkoffset:101#015
Aug 11 07:54:57.003166 sonic-10429 INFO syncd#/supervisord: syncd 0:_soc_ser_recovery_hw_cache: RESTORE pipe 0 [from pipe 1]: L3_DEFIP_TCAM_LEVEL1[9252] blk: ipipe0 index: 1024#015

gechiang commented 3 years ago

@chitra-raghavan Thanks for trying it again. I just spoke with @yxieca about this, and he told me that some prep work is required to clean up this test for the TH3 platform. He will share a document with me on how to do that, and once this prep work is done we should be able to pass this test. I will update this case when that is complete. For now, we will keep this issue at lower priority, as this is indeed a known inefficiency issue with the BRCM platform.