I found out that the problem is related to using Linux kernel 4.17 and higher: the fast reboot functionality broke after commit https://github.com/open-power/op-build/commit/8c2a50958956296df2e64135b9e586f3bca168b4.
The problem affects all host systems running Linux kernel 4.17 or newer.
I have checked the latest op-build (27 Sep) with old Linux Kernel 4.16.9 - everything works fine.
@pridhiviraj Which system did you see this on?
Fast reboot is working for me on palmetto with 4.19-rc7 with skiboot 6.1:
/ # uname -a
Linux skiroot 4.19.0-rc7-00013-g167da9496f69 #34 SMP Tue Oct 9 15:54:29 ACDT 2018 ppc64le GNU/Linux
/ # reboot
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
Requesting system reboot
[179030.122589] reboot: Restarting system
[179497.280827019,5] OPAL: Reboot request...
[179497.280950509,5] RESET: Initiating fast reboot 6...
[179497.295292684,5] PSI: Hot reset!
[179497.315869551,5] Need EOI !
[179497.326247957,5] PCI: Clearing all devices...
[179497.336627718,5] PCI: Resetting PHBs and training links...
[179500.096197207,5] PCI: Probing slots...
@shenki, @pridhiviraj left IBM recently.
So if it's working for you, then we can probably close this one.
-Vasant
Thanks for the reminder. Do we have any idea which machine he would have been using?
I spoke with @artemsen at the OpenPower summit, and they were seeing this on their machines. I was hoping we could reproduce on a machine I have access to, so we could debug.
I did a build of 4.16.13-openpower1 and fast reboot worked fine for me on Palmetto. I am booting to the petitboot prompt and doing a reboot from there.
Sorry, I have no idea which system he was using. Maybe we should try on Habanero once?
-Vasant
If you have a patch or any idea, I can check the solution on our VESNIN P8 server.
I was looking at the powerpc/powernv kernel changes between 4.16 and 4.17, and this one stood out:
https://git.kernel.org/torvalds/c/f2748bdfe157343eb8cf910a1d89ccf2fd20100b
Are you able to build a 4.17 kernel with this patch reverted and see if fast reboot works again?
Yes, you are right. Fast reboot works fine with this patch reverted. I have checked it with the latest 4.19-rc7 kernel.
Thanks for testing. What version of skiboot are you using?
Can you also test master of skiboot to see if it has the same behaviour?
I checked the latest op-build with skiboot master and the kernel patch reverted: fast reboot is OK.
Version info: buildroot-2018.05.1-114-g1822255 skiboot-7dbf80d-p3a351c9 hostboot-p8-335b7ca-p46b72d9 occ-p8-28f2cec-p631354c linux-4.18.6-openpower1-pbe305a2 petitboot-1.9.1 machine-xml-4fb3a4b hostboot-binaries-hw091818a.930 capp-ucode-p9-dd2-v4
I haven't done much direct controls hacking on P8, but the XSCOM SRESET should interrupt other CPUs regardless of what they are doing, even if interrupts are hard disabled. I'm not sure what is going wrong.
It would be interesting to see the PR_DEBUG messages. You could increase the timeout to infinite so the system can be debugged more easily. It would then be interesting to know where the secondaries are stuck; can you use pdbg to find out?
@npiggin could you describe in more detail what I have to do? Sorry, I'm not familiar with pdbg yet ;)
I haven't used pdbg on a POWER8 for a long time. Rashmica has been working on it, so I can ask her on Monday.
You want an upstream pdbg cross-compiled for the BMC (I found the build instructions in the pdbg source tree quite easy to follow). Then, when you reboot and it hangs (replace the timeout with an infinite loop so it hangs rather than IPLs), you can use pdbg to stop all threads and get their instruction addresses. Something like pdbg -a stop ; pdbg -a getnia
pdbg -pX -cY -tZ regs will dump a more detailed set of regs and stack of a particular CPU to dig deeper into it if needed.
You may just need some extra targeting parameters on pdbg to make it work with the P8, I will have to check.
If you are running a recent OpenBMC that uses the ColdFire FSI driver, and your pdbg is built from master, it will detect the backend on its own, so you won't need any extra commands.
All threads are executing fast_reboot_entry(), except for the first one (it is in cpu_state_wait_all_others()) and one that is stuck inside opal_pci_set_pbcq_tunnel_bar(). Maybe our lockup is inside the last one.
The odd thread isn't in opal_pci_set_pbcq_tunnel_bar(). The NIA has a leading 0xc in the address, which suggests that it's still inside the kernel. The register dump confirms that, since the dumped MSR says that Instruction and Data relocation are enabled:
# pdbg --backend=i2c --device=/dev/i2c-4 -p2 -c6 -t6 regs
NIA : 0xc00000000002acf0
CFAR : 0xc00000000002acf0
MSR : 0x9000000000001033
LR : 0xc00000000002acec
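The reasoning above can be checked mechanically. The following is a minimal Python sketch, not part of pdbg or skiboot, that decodes a handful of MSR bits and tests whether an address falls in the Linux kernel's 0xc... linear mapping. The helper names `decode_msr` and `looks_like_kernel_address` are made up for illustration; the bit positions are the standard 64-bit Power ISA ones.

```python
# Masks for a few MSR fields (64-bit Power ISA, counted from the least
# significant bit). Only the fields relevant to this dump are listed.
MSR_BITS = {
    "SF": 1 << 63,  # 64-bit mode
    "HV": 1 << 60,  # hypervisor state
    "EE": 1 << 15,  # external interrupts enabled
    "PR": 1 << 14,  # problem (user) state
    "ME": 1 << 12,  # machine check enabled
    "IR": 1 << 5,   # instruction relocation
    "DR": 1 << 4,   # data relocation
    "RI": 1 << 1,   # recoverable interrupt
    "LE": 1 << 0,   # little-endian
}


def decode_msr(msr):
    """Return the names of the known MSR bits that are set."""
    return [name for name, mask in MSR_BITS.items() if msr & mask]


def looks_like_kernel_address(addr):
    """True if the top nibble is 0xc, i.e. the kernel's linear mapping."""
    return (addr >> 60) == 0xC


if __name__ == "__main__":
    msr = 0x9000000000001033  # value from the register dump
    nia = 0xC00000000002ACF0
    print("MSR bits set:", decode_msr(msr))
    print("NIA in kernel:", looks_like_kernel_address(nia))
```

For the dumped value, IR and DR are set and EE is clear, which matches a thread running kernel code with external interrupts disabled.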
Yes, you are right.
c00000000002acbc <nmi_stop_this_cpu>:
c00000000002acbc: ac 01 4c 3c addis r2,r12,428
c00000000002acc0: 44 b0 42 38 addi r2,r2,-20412
c00000000002acc4: a6 02 08 7c mflr r0
c00000000002acc8: 10 00 01 f8 std r0,16(r1)
c00000000002accc: e1 ff 21 f8 stdu r1,-32(r1)
c00000000002acd0: 8d ff ff 4b bl c00000000002ac5c <nmi_ipi_lock+0x8>
c00000000002acd4: 09 00 22 3d addis r9,r2,9
c00000000002acd8: e4 cd 29 81 lwz r9,-12828(r9)
c00000000002acdc: 09 00 42 3d addis r10,r2,9
c00000000002ace0: ff ff 29 39 addi r9,r9,-1
c00000000002ace4: e4 cd 2a 91 stw r9,-12828(r10)
c00000000002ace8: f5 fe ff 4b bl c00000000002abdc <nmi_ipi_unlock+0x8>
c00000000002acec: 78 0b 21 7c mr r1,r1
c00000000002acf0: 00 00 00 48 b c00000000002acf0 <nmi_stop_this_cpu+0x34>
But I still don't know what to do with this information. To me it looks like an infinite loop at c00000000002acf0.
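To confirm that the instruction at c00000000002acf0 really branches to itself, the raw bytes from the disassembly can be decoded by hand. This is a small Python sketch under the assumption that the bytes are shown in little-endian memory order (as on ppc64le); `decode_branch` is a hypothetical helper that handles only the I-form branch (primary opcode 18).

```python
def decode_branch(addr, raw_le_bytes):
    """Decode a Power ISA I-form branch (b/bl/ba/bla).

    addr is the instruction's own address; raw_le_bytes are its four
    bytes in little-endian memory order. Returns (target, link).
    """
    word = int.from_bytes(raw_le_bytes, "little")
    if word >> 26 != 18:
        raise ValueError("not an I-form branch instruction")
    # LI field with the two low zero bits appended: a 26-bit signed offset.
    offset = word & 0x03FFFFFC
    if offset & 0x02000000:
        offset -= 0x04000000
    absolute = bool(word & 0x2)  # AA bit
    link = bool(word & 0x1)      # LK bit
    target = offset if absolute else addr + offset
    return target & 0xFFFFFFFFFFFFFFFF, link


if __name__ == "__main__":
    # Last instruction of nmi_stop_this_cpu from the listing.
    target, link = decode_branch(0xC00000000002ACF0, bytes.fromhex("00 00 00 48"))
    print(hex(target), link)
```

A zero offset with LK clear means the branch target is the instruction's own address, so the thread spins there until something external (such as an sreset) redirects it.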
static void nmi_stop_this_cpu(struct pt_regs *regs)
{
	nmi_ipi_lock();
	if (nmi_ipi_busy_count > 1)
		nmi_ipi_busy_count--;
	nmi_ipi_unlock();

	spin_begin();
	while (1)
		spin_cpu_relax();
}
So spinning there is intentional. The question here is why this thread doesn't get pulled back into OPAL by the thread that is doing the reset. It looks like @npiggin sent some patches for some bugs around the NMI IPI in the 4.18 cycle so maybe that's what fixed it. Nick, what do you think?
The NMI IPI stuff in the kernel has had some issues, yes. I actually don't know if I have fixed the last of them in the upstream kernel. This presumably is what caused the bug to show up. I don't know off the top of my head what the problem is or if we have a fix upstream; I'll have to get back to this code soon.
However I'm struggling to see why nmi_stop_this_cpu is not being fast rebooted. It does spin with MSR[EE]=0 which is the only thing that should be "unusual" about it from a kernel point of view. Direct controls sreset should not care about that. The whole point is being able to recover a CPU no matter how stuck the OS has become.
@oohal, do you have access to a system in the locked-up state? We might have to debug the direct controls code a bit more.
I should resurrect my instruction ramming patch, which is an adaptation of Alistair's old one that then morphed to follow what pdbg does.
@stewart-ibm, it appears this issue got automatically closed by your test commit, which hasn't actually resolved it.
Argh, github being too careful.
It should have referenced this commit as it does actually implement a work-around:
commit 14f709b8eeda7e9ea7169b782e981c908de92e10
Author: Stewart Smith <stewart@linux.ibm.com>
Date: Fri May 3 16:45:53 2019 +1000
Disable fast-reset for POWER8
There is a bug with fast-reset when CPU cores are busy, which can be
reproduced by running `stress` and then trying `reboot -ff` (this is
what the op-test test cases FastRebootHostStress and
FastRebootHostStressTorture do). What happens is the cores lock up,
which isn't the best thing in the world when you want them to start
executing instructions again.
A workaround is to use instruction ramming, which while greatly
increasing the reliability of fast-reset on p8, doesn't make it perfect.
Instruction ramming is what pdbg was modified to do in order to have the
sreset functionality work reliably on p8.
pdbg patches: https://patchwork.ozlabs.org/project/pdbg/list/?series=96593&state=*
Fixes: https://github.com/open-power/skiboot/issues/185
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
I'll pull out my instruction ramming patches and post them to the list too, and maybe someone can find where things are going wrong, but I've managed to still have them fail.
skiboot is in latest master.