sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

CPU Stall Issue on the DUT during Test Execution - Accton-AS7716-32X #17302

Open mithun2498 opened 7 months ago

mithun2498 commented 7 months ago

Description

I am seeing CPU stalls on the DUT during test execution. The issue occurs frequently when tests are executed as a group.

Describe the results you received:

dev-msn2700-01 login: admin
Password:
[46547.660832] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[46547.667453] rcu: 0-...0: (0 ticks this GP) idle=7fe/1/0x4000000000000000 softirq=824556/824556 fqs=3739010
[46610.680830] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[46610.687450] rcu: 0-...0: (0 ticks this GP) idle=7fe/1/0x4000000000000000 softirq=824556/824556 fqs=3745347
[46673.700829] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[46673.707448] rcu: 0-...0: (0 ticks this GP) idle=7fe/1/0x4000000000000000 softirq=824556/824556 fqs=3752305
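To help triage whether these are separate stalls or one ongoing stall being re-reported, the per-CPU lines can be parsed and compared. The sketch below is a hypothetical helper (`parse_stalls` and `report_intervals` are illustrative names, not part of SONiC); it assumes the kernel prints the per-CPU detail line in exactly the format seen in this log.

```python
import re

# Matches the per-CPU detail line that follows each
# "rcu_sched detected stalls" header, in the format seen in this log
# (assumption: other kernel versions may format this line differently).
CPU_LINE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+rcu:\s+(?P<cpu>\d+)-\S+:\s+.*?"
    r"softirq=(?P<softirq>\d+/\d+)\s+fqs=(?P<fqs>\d+)"
)


def parse_stalls(log_text):
    """Return (timestamp_s, cpu, softirq, fqs) for each per-CPU stall line."""
    return [
        (float(m["ts"]), int(m["cpu"]), m["softirq"], int(m["fqs"]))
        for m in CPU_LINE.finditer(log_text)
    ]


def report_intervals(stalls):
    """Return seconds elapsed between consecutive stall reports."""
    return [round(b[0] - a[0], 2) for a, b in zip(stalls, stalls[1:])]


# Console output copied from this issue.
LOG = """\
[46547.660832] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[46547.667453] rcu: 0-...0: (0 ticks this GP) idle=7fe/1/0x4000000000000000 softirq=824556/824556 fqs=3739010
[46610.680830] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[46610.687450] rcu: 0-...0: (0 ticks this GP) idle=7fe/1/0x4000000000000000 softirq=824556/824556 fqs=3745347
[46673.700829] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[46673.707448] rcu: 0-...0: (0 ticks this GP) idle=7fe/1/0x4000000000000000 softirq=824556/824556 fqs=3752305
"""

stalls = parse_stalls(LOG)
print(report_intervals(stalls))            # [63.02, 63.02]
print({s[2] for s in stalls})              # {'824556/824556'}
```

On this log, every report is for CPU 0, the softirq counter never changes, and reports arrive about 63 s apart. Assuming the kernel's default 21 s `rcu_cpu_stall_timeout`, that spacing is consistent with the kernel repeatedly warning about a single stall on CPU 0 rather than logging independent stalls.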

Describe the results you expected:

I expect the CPU stall issue to be resolved.

Output of show version:

root@dev-msn2700-01:~# show version

SONiC Software Version: SONiC.202305.366435-a49860cc7
SONiC OS Version: 11
Distribution: Debian 11.7
Kernel: 5.10.0-18-2-amd64
Build commit: a49860cc7
Build date: Tue Sep 19 14:00:34 UTC 2023
Built by: AzDevOps@vmss-soni0021EQ
Platform: x86_64-accton_as7716_32x-r0
HwSKU: Accton-AS7716-32X
ASIC: broadcom
ASIC Count: 1
Serial Number: N/A
Model Number: N/A
Hardware Revision: N/A
Uptime: 11:00:21 up 54 min, 1 user, load average: 1.88, 1.80, 1.88
Date: Mon 27 Nov 2023 11:00:21

Additional information you deem important (e.g. issue happens only occasionally):

CPLD Version:

root@(none):/# cpldutil -ver
CPLD#1 Version: 10 (0x0A)
CPLD#2 Version: 14 (0x0E)
CPLD#3 Version: 14 (0x0E)
FAN CPLD Version: 08 (0x08)
CPU CPLD Version: 12 (0x0C)
root@dev-msn2700-01~#

timed out wai.txt
Console_Logs_DUT(7716).txt

prgeor commented 7 months ago

@mithun2498 Is this the Mellanox 2700 platform? Please attach a techsupport dump. Also, please check with Accton to triage first.