Status: Open. Opened by donnlee 7 years ago.
Hi Donn, this could be a duplicate of https://github.com/platinasystems/go/issues/4. Since you are seeing this on invader8, see the note below about Alpha units.
This issue is caused by the x86 performing a cold reboot, which power-cycles the x86. After that power cycle, the TH needs a hard reset before it comes back up on PCIe.
With the AMI BIOS on Debian, the "reboot" command performs a cold reboot, while "reboot -f" performs a warm reboot. With coreboot, the Intel FSP performs a cold reboot regardless of the command issued on the Linux side. We are working with Intel to figure out how to support warm reboot with the FSP.
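The firmware/command combinations above can be captured in a small lookup. This is a minimal shell sketch, assuming only the three cases stated in this thread; the function name reboot_type and the firmware labels "ami" and "coreboot" are hypothetical, and any other combination is reported as unknown rather than guessed.

```shell
#!/usr/bin/env bash
# Sketch: report which reboot type a firmware/command pair performs,
# per the behavior described in this thread. Names are hypothetical.
reboot_type() {
  local firmware="$1" cmd="$2"
  case "$firmware:$cmd" in
    "ami:reboot")    echo cold ;;   # AMI BIOS on Debian: plain reboot is cold
    "ami:reboot -f") echo warm ;;   # AMI BIOS on Debian: forced reboot is warm
    coreboot:*)      echo cold ;;   # Intel FSP cold-reboots whatever Linux asks for
    *)               echo unknown; return 1 ;;  # not covered by the thread
  esac
}

reboot_type ami "reboot -f"   # prints: warm
```

Only the AMI BIOS with "reboot -f" yields a warm reboot, which is why the workaround below relies on it.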
In the meantime, to avoid this issue on Alpha units (invader1-15), either use the AMI BIOS with "reboot -f", or run the following commands to hard-reset the TH before launching goes:
sudo i2cset -y 0x0 0x74 0x6 0xf8
sudo i2cset -y 0x0 0x74 0x2 0xfc
sudo i2cset -y 0x0 0x74 0x2 0xff
echo 1 | sudo tee /sys/bus/pci/rescan
sudo rmmod uio_pci_dma
sudo insmod
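The reset sequence above can be wrapped so it stops at the first failing step and can be previewed before touching hardware. This is a minimal sketch under stated assumptions: th_hard_reset, run_step and the DRY_RUN switch are hypothetical names, and because the original "sudo insmod" line does not name the module file, the path is taken as an argument rather than filled in.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the Alpha-unit TH hard-reset commands listed above.
# Hypothetical names: th_hard_reset, run_step, DRY_RUN. The module path is
# passed as $1 because the original "sudo insmod" line omits it.

# Print the command when DRY_RUN=1, otherwise execute it.
run_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

th_hard_reset() {
  local module_path="$1"
  if [ -z "$module_path" ]; then
    echo "usage: th_hard_reset /path/to/module.ko" >&2
    return 2
  fi
  # Same i2cset sequence as above; stop at the first failing step.
  run_step sudo i2cset -y 0x0 0x74 0x6 0xf8 || return 1
  run_step sudo i2cset -y 0x0 0x74 0x2 0xfc || return 1
  run_step sudo i2cset -y 0x0 0x74 0x2 0xff || return 1
  # Write via "sudo tee" so the sysfs write itself runs as root;
  # "sudo echo 1 > file" would perform the redirection as the calling user.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "echo 1 | sudo tee /sys/bus/pci/rescan"
  else
    echo 1 | sudo tee /sys/bus/pci/rescan > /dev/null || return 1
  fi
  run_step sudo rmmod uio_pci_dma || return 1
  run_step sudo insmod "$module_path" || return 1
}
```

For example, DRY_RUN=1 th_hard_reset /path/to/module.ko prints the six commands without running them; the path there is a placeholder, not the actual module location.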
On Beta units and later (invader16-24), this issue does not occur with CPLD v8 or higher; with older CPLD versions, use "reboot -f" to avoid it. Check the CPLD version with the ioget tool (sudo ./ioget 0x600).
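The per-unit guidance in this comment can be summarized as a decision helper. This is a minimal sketch, assuming the unit is identified by its invader number (8 for invader8) and that the CPLD version has already been read, for instance with ioget, and converted to a plain integer; the function name reboot_advice is hypothetical and the sketch does not parse ioget output.

```shell
#!/usr/bin/env bash
# Sketch: choose the reboot workaround from unit number and CPLD version.
# Assumes an integer CPLD version obtained separately; names are hypothetical.
reboot_advice() {
  local unit="$1" cpld="${2:-0}"
  if [ "$unit" -ge 1 ] && [ "$unit" -le 15 ]; then
    # Alpha units: AMI BIOS with "reboot -f", or hard-reset the TH first.
    echo "alpha: use AMI BIOS with 'reboot -f', or hard-reset the TH before launching goes"
  elif [ "$unit" -ge 16 ] && [ "$unit" -le 24 ]; then
    # Beta units: CPLD v8 or higher avoids the issue; older CPLDs need "reboot -f".
    if [ "$cpld" -ge 8 ]; then
      echo "beta: CPLD v$cpld, no workaround needed"
    else
      echo "beta: CPLD v$cpld, use 'reboot -f'"
    fi
  else
    echo "unknown unit: invader$unit" >&2
    return 1
  fi
}
```

Units outside the ranges named in this thread are rejected rather than assumed to behave like Beta units.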
Thanks, Jason
This bug report says an Intel firmware update fixes it: https://bugzilla.redhat.com/show_bug.cgi?id=1293901
I tried the latest Intel microcode, and unfortunately this particular problem still occurs. I think the "Kernel panic - not syncing: Timeout: Not all cpus entered broadcast exception handler" message can be triggered by various underlying faults. In our case, a PCIe timeout under certain conditions is what causes the panic.
-jason
This is a GitHub issue for tracking kernel panics during Invader warm boots. To clear the problem, we have been power-cycling the units (cold boots).
Today, after upgrading goes to http://downloads.platinasystems.com/LATEST/goes-platina-mk1, I saw strange gobgp behavior and warm-booted:
Warm-boot (console) said:
I power-cycled to recover.
Commit: https://github.com/platinasystems/go/commit/dcb42afc1f93cfd8a6d1d4cdb8c6549f37d3761f