Blueve closed this issue 3 years ago
Update with more information.
admin@sonic:~$ show version
SONiC Software Version: SONiC.master.443-bba5df05
Distribution: Debian 10.6
Kernel: 4.19.0-9-2-amd64
Build commit: bba5df05
Build date: Tue Oct 13 19:50:56 UTC 2020
Built by: johnar@jenkins-worker-11
Platform: x86_64-cel_e1031-r0
HwSKU: Celestica-E1031-T48S4
ASIC: broadcom
/usr/local/bin/decode-syseeprom : ERROR : Failed to read eeprom : [Errno 2] No such file or directory: '/sys/class/i2c-adapter/i2c-2/2-0050/eeprom'
Serial Number:
Uptime: 10:54:17 up 42 min, 1 user, load average: 1.43, 1.48, 1.46
Docker images:
REPOSITORY TAG IMAGE ID SIZE
docker-teamd latest bd1df6fe3cbc 394MB
docker-teamd master.443-bba5df05 bd1df6fe3cbc 394MB
docker-nat latest 2db062aaf7f7 396MB
docker-nat master.443-bba5df05 2db062aaf7f7 396MB
docker-router-advertiser latest 44f82f06d9e3 359MB
docker-router-advertiser master.443-bba5df05 44f82f06d9e3 359MB
docker-platform-monitor latest 4c51bd437497 441MB
docker-platform-monitor master.443-bba5df05 4c51bd437497 441MB
docker-lldp latest ea22f55f7cff 388MB
docker-lldp master.443-bba5df05 ea22f55f7cff 388MB
docker-database latest a87c7ce0bda1 359MB
docker-database master.443-bba5df05 a87c7ce0bda1 359MB
docker-orchagent latest 9ed19e1a8b00 407MB
docker-orchagent master.443-bba5df05 9ed19e1a8b00 407MB
docker-dhcp-relay latest 7d5aaaeacf8c 366MB
docker-dhcp-relay master.443-bba5df05 7d5aaaeacf8c 366MB
docker-sonic-telemetry latest a8c51245a513 429MB
docker-sonic-telemetry master.443-bba5df05 a8c51245a513 429MB
docker-sonic-mgmt-framework latest cab64d10b3ac 486MB
docker-sonic-mgmt-framework master.443-bba5df05 cab64d10b3ac 486MB
docker-sflow latest ee01daa4595c 397MB
docker-sflow master.443-bba5df05 ee01daa4595c 397MB
docker-fpm-frr latest a9df2f6c4db5 410MB
docker-fpm-frr master.443-bba5df05 a9df2f6c4db5 410MB
docker-snmp latest 2ffc7ed302ef 399MB
docker-snmp master.443-bba5df05 2ffc7ed302ef 399MB
docker-syncd-brcm latest b826fbfdc0c0 542MB
docker-syncd-brcm master.443-bba5df05 b826fbfdc0c0 542M
Hi @Blueve - please connect with Celestica team offline. cc: @yxieca
Check out this message. It looks like a platform driver issue on the Celestica platform.
[ 25.291791] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[ 25.381489] BUG: unable to handle kernel paging request at ffffffffc073c098
[ 25.464931] PGD 6a60e067 P4D 6a60e067 PUD 6a610067 PMD 78b13067 PTE 8000000079dc0063
[ 25.557750] Oops: 0011 [#1] SMP PTI
[ 25.599529] CPU: 0 PID: 641 Comm: platform-module Tainted: G OE 4.19.0-9-2-amd64 #1 Debian 4.19.118-2+deb10u1
[ 25.732972] Hardware name: Celestica E1031/E1031, BIOS E1031010 06/25/2018
[ 25.815379] RIP: 0010:__this_module+0x58/0xffffffffffffcfc0 [pmbus_core]
[ 25.895692] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 30 0c 4e 39 55 8d ff ff <98> b2 71 c0 ff ff ff ff d8 35 73 c0 ff ff ff ff b8 32 52 3b 55 8d
[ 26.120806] RSP: 0018:ffffb5ca00627b38 EFLAGS: 00010282
[ 26.183412] RAX: ffffffffc073c098 RBX: ffff8d5538af3400 RCX: 0000000000000000
[ 26.268934] RDX: 0000000000000003 RSI: 0000000000000000 RDI: ffff8d5538af3400
[ 26.298474] ismt_smbus 0000:00:13.0: completion wait timed out
[ 26.354454] RBP: 0000000000000000 R08: ffff8d553b4c0000 R09: 0000000000000000
[ 26.354456] R10: 0000000000100000 R11: 00000000000028ce R12: ffff8d5538af3818
[ 26.354458] R13: ffff8d5538af3400 R14: ffff8d5538af3818 R15: ffff8d5538af3420
[ 26.354460] FS: 00007fc2f7294740(0000) GS:ffff8d553c000000(0000) knlGS:0000000000000000
[ 26.354462] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 26.354464] CR2: ffffffffc073c098 CR3: 000000007933c000 CR4: 00000000001006f0
[ 26.354466] Call Trace:
[ 26.354474] ? _pmbus_write_byte.constprop.18+0x30/0x50 [pmbus_core]
[ 26.354481] ? pmbus_clear_faults+0x30/0x50 [pmbus_core]
[ 26.354488] ? pmbus_do_probe+0x21e/0xda0 [pmbus_core]
[ 26.424525] max6697: probe of 11-001a failed with error -110
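As a rough aid for triaging a trace like the one above, here is a minimal Python sketch that pulls the faulting symbol and module out of the `RIP:` line and lists the call-trace frames that carry a module tag. It is not part of SONiC; the function name `summarize_oops` and both regular expressions are assumptions tuned to the log format shown in this thread.

```python
import re

# Instruction-pointer line of an x86 oops, e.g.
# "RIP: 0010:__this_module+0x58/0xffffffffffffcfc0 [pmbus_core]"
RIP_RE = re.compile(
    r"RIP:\s+[0-9a-f]+:(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

# Call-trace frame tagged with a module, e.g.
# "? pmbus_do_probe+0x21e/0xda0 [pmbus_core]"
FRAME_RE = re.compile(
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\s+\[(?P<module>\w+)\]"
)


def summarize_oops(log_text):
    """Return ((symbol, module) of the first RIP line, list of (symbol, module) frames).

    Hypothetical helper: frames are only collected after a "Call Trace:" line,
    and frames without a [module] tag (built-in kernel code) are skipped.
    """
    rip = None
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if rip is None:
            match = RIP_RE.search(line)
            if match:
                rip = (match.group("symbol"), match.group("module"))
                continue
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            match = FRAME_RE.search(line)
            if match:
                frames.append((match.group("symbol"), match.group("module")))
    return rip, frames
```

Fed the log above (for example, saved from `dmesg`), this should report the fault inside `pmbus_core` with the three `pmbus_*` frames, which points at the PMBus probe path rather than at the installer itself.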
Update from Pradchaya Phucharoen (Celestica):
The issue is caused by the out-of-tree DPS200 PSU driver, which uses an old copy of drivers/hwmon/pmbus/pmbus.h. Once the kernel is patched by this patch, the in-kernel header no longer matches the copy the driver was built against.
The best way to fix this is to merge the DPS200 driver into sonic-linux-kernel instead of keeping it as a platform driver. The temporary fix is to update pmbus.h in the platform source code with the one patched in sonic-linux-kernel.
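To check whether a platform driver's private copy of `pmbus.h` has drifted from the patched kernel header, a small diff script can help. The following is a minimal sketch using only the Python standard library; the function name `header_diff` is illustrative, and the two paths you pass in are placeholders, since the actual locations of the files in the platform and kernel trees are not given in this thread.

```python
import difflib
from pathlib import Path


def header_diff(platform_copy, kernel_copy):
    """Return unified-diff lines between two header files.

    An empty list means the two files are identical. Both arguments are
    paths chosen by the caller (hypothetical here): the driver's bundled
    pmbus.h and the pmbus.h from the patched kernel source tree.
    """
    old = Path(platform_copy).read_text().splitlines()
    new = Path(kernel_copy).read_text().splitlines()
    return list(
        difflib.unified_diff(
            old,
            new,
            fromfile=str(platform_copy),
            tofile=str(kernel_copy),
            lineterm="",
        )
    )
```

A non-empty result suggests the out-of-tree driver should be rebuilt against the updated header before it is loaded alongside the patched `pmbus_core`.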
Update: I tested installing an image from after build 492 and found that it now reboots into SONiC without hanging. But I still see some issues that might be related to this topic:
The sonic-installer command hangs. A forced power cycle boots into the previous image.
Description
I am trying to install a SONiC image (from a recent master branch build: https://sonic-jenkins.westus2.cloudapp.azure.com/job/broadcom/job/buildimage-brcm-all/436/) on my switch, but the installation hangs forever and I am not able to log in.
I ran into this issue before; that time I power-cycled the switch and rolled back to the previous image to get my switch back. But I want to test my code against the current master branch, so I have to use the new image now.
How can I diagnose this issue?
Steps to reproduce the issue:
Run sudo sonic-installer install http://10.1.100.70/images/jika/sonic-broadcom-pyw.bin on the switch.
Describe the results you received:
The installation hangs forever, and the last command-line output is shown below:
When I try to connect to the switch via the console, the console is blank most of the time. Sometimes I got
But the console was not interactive.
10/14/2020 update
I am trying to install a newer image: https://sonic-jenkins.westus2.cloudapp.azure.com/job/broadcom/job/buildimage-brcm-all/443/artifact/target/ It got stuck again, but I caught a few logs that might be helpful for troubleshooting.
Since it was stuck for too long, I power-cycled the switch and booted into ONIE to install the same image. The log is shown below:
Then it got stuck again, with no response on the console. @yxieca ran into the same issue today. I power-cycled the switch again, and I can now enter the OS without issue.
Describe the results you expected:
The SONiC OS is installed with proper command-line output feedback, and I am able to log in to the SONiC OS after installation.
Additional information you deem important (e.g. issue happens only occasionally):