A warmboot upgrade from the 202012 branch to the master branch sometimes fails. Specifically, the master branch kernel appears to fail to initialize and switch to userspace.
Debugging showed that, at least in some cases, the kernel panics:
[ 6.214792] Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
[ 6.214792] Shutting down cpus with NMI
The above message was seen either immediately after kexec or when the login prompt appeared on the console (showing that, in those cases, the kernel did switch to userspace, but something still caused it to panic).
Odder still, a lab device that used to exhibit this issue consistently no longer appears to hit it. There was no configuration change, and no traffic was going through the device at any time.
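Because the failure is intermittent, it may help to capture the serial console during each warm-reboot attempt and scan the capture for the panic signature. Below is a minimal Python sketch of such a scan. It is not part of SONiC: the function name `find_panics`, both regular expressions, and the assumption that the console prompt line ends in `login:` are illustrative only.

```python
import re

# Matches kernel log lines such as:
#   [ 6.214792] Kernel panic - not syncing: Timeout: ...
PANIC_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+Kernel panic - not syncing:\s*(.*)")

# Assumption: the console login prompt line ends with "login:".
LOGIN_RE = re.compile(r"login:\s*$")


def find_panics(console_text):
    """Return (timestamp, reason, after_login) for each panic line found.

    after_login is True when a login prompt appeared earlier in the
    capture, i.e. the kernel had already switched to userspace.
    """
    seen_login = False
    panics = []
    for line in console_text.splitlines():
        if LOGIN_RE.search(line):
            seen_login = True
        match = PANIC_RE.search(line)
        if match:
            timestamp = float(match.group(1))
            reason = match.group(2).strip()
            panics.append((timestamp, reason, seen_login))
    return panics


if __name__ == "__main__":
    sample = (
        "[ 6.214792] Kernel panic - not syncing: Timeout: "
        "Not all CPUs entered broadcast exception handler\n"
        "[ 6.214792] Shutting down cpus with NMI\n"
    )
    for timestamp, reason, after_login in find_panics(sample):
        stage = "after login prompt" if after_login else "before login prompt"
        print(f"panic at {timestamp}s ({stage}): {reason}")
```

Running this over the console capture of each attempt would separate the two cases described above: a panic immediately after kexec versus a panic after the switch to userspace.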
Steps to reproduce the issue:
Load a 202012 image on an Arista 7060 or 7260 box
Do a warm-reboot to a master branch image
Describe the results you received:
Warmboot fails, and the watchdog (or a kernel panic) causes a reboot
Describe the results you expected:
Warmboot should be successful
Output of show version:
Output of show techsupport:
Additional information you deem important (e.g. issue happens only occasionally):