Open madscientist159 opened 6 years ago
Note that the machine still boots from a recovery disk. I wonder if the LSI controller needs a hard reset the same way the USB controllers need to be reset on kexec(), and if the hard PCIe reset on kexec() should be generalized to all plugged PCIe devices?
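To check which controllers could even be reset from userspace, a minimal sketch follows. It assumes the Linux sysfs layout under `/sys/bus/pci/devices` and matches on PCI vendor ID `0x1000` (LSI Logic); the function name `find_resettable` is hypothetical. The sysfs `reset` attribute triggers a function-level reset, which is not necessarily the same as the hard PCIe reset discussed above, so treat this only as a diagnostic aid.

```python
from pathlib import Path

# Assumption: the SAS controllers report the LSI Logic PCI vendor ID.
LSI_VENDOR_ID = "0x1000"


def find_resettable(sysfs_root="/sys/bus/pci/devices", vendor=LSI_VENDOR_ID):
    """Return {pci_address: has_reset_attribute} for devices matching vendor."""
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return result
    for dev in sorted(root.iterdir()):
        vendor_file = dev / "vendor"
        if not vendor_file.is_file():
            continue
        # sysfs prints the vendor ID as a hex string such as "0x1000"
        if vendor_file.read_text().strip().lower() != vendor:
            continue
        # The kernel exposes "reset" only when it knows a reset method
        result[dev.name] = (dev / "reset").exists()
    return result


if __name__ == "__main__":
    for addr, ok in find_resettable().items():
        print(addr, "reset attribute present" if ok else "no reset attribute")
```

Run as root on the affected machine, this only lists the devices; actually writing `1` to a device's `reset` file while its driver is bound may disrupt I/O.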
System firmware versions:
Primary platform versions:
open-power-talos-ab0d559
buildroot-2017.11.2-8-g4b6188e0f2
skiboot-2154c2e
hostboot-28927a7
linux-v4.15.9-openpower1-p259257c
petitboot-v1.6.6-p836d356
machine-xml-f65c2a1
occ-f72f857
sbe-9b78381
hcode-644d3f8
Note that this is built from our tree at https://git.raptorcs.com so some of the hashes may differ from the stock open-power repository versions.
Not seeing this on a custom 4.16-rc5 recompile. Not sure why it happens on 4.14; the main issue here is that Debian Buster (the next release) ships 4.14 AFAIK, so it's desirable to be able to boot the next Debian release without requiring a custom kernel.
Not sure where to file this, so I'm putting it here until the actual failing component can be determined.
On a Talos system with DD2.2 CPUs and multiple LSI SAS controllers, under Debian (and presumably other distros), one or more of the controller drivers locks up. At this point I'm not sure whether the cause is hostboot, OPAL, or something else related to DD2.2.
What is interesting about the lockup is that it only happens after the system is installed; it does not affect the installation kernel, even though the kernel versions should be identical. The installer has no issues communicating with the SAS disks.
Kernel log: