fast-reboot failing on P8 platforms

pridhiviraj commented 6 years ago

/ # reboot
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
Requesting system reboot
[   23.258131] reboot: Restarting system
[   87.095999797,5] OPAL: Reboot request...
[   87.096069956,5] RESET: Initiating fast reboot 1...
[   88.110060268,5] RESET: Fast reboot timed out waiting for secondaries to call in

skiboot is at the latest master:

skiboot-v6.0.1-61-g1b86a92b6cb6

artemsen commented 6 years ago

I found that the problem is tied to Linux kernel 4.17 and later: fast reboot broke after commit https://github.com/open-power/op-build/commit/8c2a50958956296df2e64135b9e586f3bca168b4.

The problem affects all host systems running Linux kernel 4.17 or newer.

I have checked the latest op-build (27 Sep) with the older Linux kernel 4.16.9, and everything works fine.

shenki commented 6 years ago

@pridhiviraj Which system did you see this on?

Fast reboot is working for me on palmetto with 4.19-rc7 with skiboot 6.1:

/ # uname -a
Linux skiroot 4.19.0-rc7-00013-g167da9496f69 #34 SMP Tue Oct 9 15:54:29 ACDT 2018 ppc64le GNU/Linux
/ # reboot
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
Requesting system reboot
[179030.122589] reboot: Restarting system
[179497.280827019,5] OPAL: Reboot request...
[179497.280950509,5] RESET: Initiating fast reboot 6...
[179497.295292684,5] PSI: Hot reset!
[179497.315869551,5]  Need EOI !
[179497.326247957,5] PCI: Clearing all devices...
[179497.336627718,5] PCI: Resetting PHBs and training links...
[179500.096197207,5] PCI: Probing slots...

hegdevasant commented 6 years ago

@shenki, @pridhiviraj left IBM recently.

So if it's working for you, we can probably close this one.

-Vasant

shenki commented 6 years ago

Thanks for the reminder. Do we have any idea which machine he would have been using?

I spoke with @artemsen at the OpenPower summit, and they were seeing this on their machines. I was hoping we could reproduce on a machine I have access to, so we could debug.

I did a build of 4.16.13-openpower1 and fast reboot worked fine for me on Palmetto. I am booting to the petitboot prompt and doing a reboot from there.

hegdevasant commented 6 years ago

> Thanks for the reminder. Do we have any idea which machine he would have been using?

Sorry, I have no idea which system he was using. Maybe we should try on Habanero once?

-Vasant

artemsen commented 6 years ago

If you have a patch or any idea, I can check the solution on our VESNIN P8 server.

shenki commented 6 years ago

I was looking at the powerpc/powernv kernel changes between 4.16 and 4.17, and this one stood out:

https://git.kernel.org/torvalds/c/f2748bdfe157343eb8cf910a1d89ccf2fd20100b

Are you able to build a 4.17 kernel with this patch reverted and see if fast reboot works again?

artemsen commented 6 years ago

Yes, you are right. Fast reboot works fine with this patch reverted. I have checked it with the latest 4.19-rc7 kernel.

shenki commented 6 years ago

Thanks for testing. What version of skiboot are you using?

Can you also test master of skiboot to see if it has the same behaviour?

artemsen commented 6 years ago

I checked the latest op-build with skiboot master and the kernel patch reverted: fast reboot is OK.

Version info:
buildroot-2018.05.1-114-g1822255
skiboot-7dbf80d-p3a351c9
hostboot-p8-335b7ca-p46b72d9
occ-p8-28f2cec-p631354c
linux-4.18.6-openpower1-pbe305a2
petitboot-1.9.1
machine-xml-4fb3a4b
hostboot-binaries-hw091818a.930
capp-ucode-p9-dd2-v4

npiggin commented 6 years ago

I haven't done much direct controls hacking on P8, but the XSCOM SRESET should interrupt other CPUs regardless of what they are doing, even if interrupts are hard disabled. I'm not sure what is going wrong.

It would be interesting to see the PR_DEBUG messages. You could increase the timeout to infinite so the system can be debugged more easily. Then it would be interesting to know where the secondaries are stuck. Can you use pdbg to find out?

artemsen commented 6 years ago

@npiggin could you describe in more detail what I have to do? Sorry, I'm not familiar with pdbg yet ;)

npiggin commented 6 years ago

I haven't used pdbg on a POWER8 for a long time. Rashmica has been working on it, so I can ask her on Monday.

You want an upstream pdbg cross-compiled for the BMC (I found the build instructions in the pdbg source tree quite easy to follow). Then when you reboot and it hangs (replace the timeout with an infinite loop so it hangs rather than IPLs), you can use pdbg to stop all threads, then get their instruction addresses. Something like pdbg -a stop ; pdbg -a getnia

pdbg -pX -cY -tZ regs will dump a more detailed set of regs and stack of a particular CPU to dig deeper into it if needed.

npiggin commented 6 years ago

You may just need some extra targeting parameters on pdbg to make it work with the P8, I will have to check.

shenki commented 5 years ago

If you are running a recent OpenBMC that uses the ColdFire FSI driver, and your pdbg is built from master, it will detect the backend on its own, so you won't need any extra commands.

artemsen commented 5 years ago

All threads are executing fast_reboot_entry(), except for the first one (it is in cpu_state_wait_all_others()) and one that is stuck inside opal_pci_set_pbcq_tunnel_bar(). Maybe our lockup is inside the last one.

fastreboot_logs.tar.gz

oohal commented 5 years ago

The odd thread isn't in opal_pci_set_pbcq_tunnel_bar(). The NIA has a leading 0xc in the address, which suggests that it's still inside the kernel. The register dump confirms that, since the dumped MSR says that Instruction and Data relocation are enabled:

# pdbg --backend=i2c --device=/dev/i2c-4 -p2 -c6 -t6 regs
NIA   : 0xc00000000002acf0
CFAR  : 0xc00000000002acf0
MSR   : 0x9000000000001033
LR    : 0xc00000000002acec
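
The reasoning above can be reproduced by hand. The following is a minimal sketch, not code from skiboot, pdbg or the kernel: the helper names are made up for illustration, and only a subset of MSR bits is listed. The masks follow the Power ISA MSR layout, and the 0xc check relies on 64-bit powerpc Linux mapping the kernel at 0xc000000000000000.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Subset of Power ISA MSR bits (illustrative table, not exhaustive). */
struct msr_bit {
    uint64_t mask;
    const char *name;
};

static const struct msr_bit msr_bits[] = {
    { 0x8000000000000000ULL, "SF" }, /* 64-bit mode */
    { 0x1000000000000000ULL, "HV" }, /* hypervisor state */
    { 0x0000000000008000ULL, "EE" }, /* external interrupts enabled */
    { 0x0000000000004000ULL, "PR" }, /* problem (user) state */
    { 0x0000000000001000ULL, "ME" }, /* machine check enable */
    { 0x0000000000000020ULL, "IR" }, /* instruction relocation */
    { 0x0000000000000010ULL, "DR" }, /* data relocation */
    { 0x0000000000000002ULL, "RI" }, /* recoverable interrupt */
    { 0x0000000000000001ULL, "LE" }, /* little-endian mode */
};

/* Hypothetical helper: write the names of the set bits, space separated. */
void msr_flag_names(uint64_t msr, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(msr_bits) / sizeof(msr_bits[0]); i++) {
        if (!(msr & msr_bits[i].mask))
            continue;
        size_t used = strlen(buf);
        snprintf(buf + used, len - used, "%s%s",
                 used ? " " : "", msr_bits[i].name);
    }
}

/* True when both instruction and data relocation are on. */
bool relocation_on(uint64_t msr)
{
    return (msr & 0x20ULL) && (msr & 0x10ULL);
}

/* True when the top nibble is 0xc, i.e. a Linux kernel address. */
bool is_kernel_address(uint64_t addr)
{
    return (addr >> 60) == 0xc;
}
```

For the dump above, msr_flag_names(0x9000000000001033, ...) produces "SF HV ME IR DR RI LE": IR and DR are both set and EE is clear, and is_kernel_address(0xc00000000002acf0) is true.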

artemsen commented 5 years ago

Yes, you are right.

c00000000002acbc <nmi_stop_this_cpu>:
c00000000002acbc:   ac 01 4c 3c     addis   r2,r12,428
c00000000002acc0:   44 b0 42 38     addi    r2,r2,-20412
c00000000002acc4:   a6 02 08 7c     mflr    r0
c00000000002acc8:   10 00 01 f8     std     r0,16(r1)
c00000000002accc:   e1 ff 21 f8     stdu    r1,-32(r1)
c00000000002acd0:   8d ff ff 4b     bl      c00000000002ac5c <nmi_ipi_lock+0x8>
c00000000002acd4:   09 00 22 3d     addis   r9,r2,9
c00000000002acd8:   e4 cd 29 81     lwz     r9,-12828(r9)
c00000000002acdc:   09 00 42 3d     addis   r10,r2,9
c00000000002ace0:   ff ff 29 39     addi    r9,r9,-1
c00000000002ace4:   e4 cd 2a 91     stw     r9,-12828(r10)
c00000000002ace8:   f5 fe ff 4b     bl      c00000000002abdc <nmi_ipi_unlock+0x8>
c00000000002acec:   78 0b 21 7c     mr      r1,r1
c00000000002acf0:   00 00 00 48     b       c00000000002acf0 <nmi_stop_this_cpu+0x34>

But I still don't know what to do with this information. To me it looks like an infinite loop at c00000000002acf0.
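
The branch at c00000000002acf0 can be decoded by hand to confirm where it goes. This is a minimal sketch with illustrative helper names, assuming the listing shows instruction bytes in memory order on a little-endian kernel. It handles only the Power ISA I-form branch (primary opcode 18: b, ba, bl, bla).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble a 32-bit instruction word from four little-endian bytes. */
uint32_t insn_from_le_bytes(const uint8_t b[4])
{
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/*
 * Decode an I-form branch located at addr. Returns false if insn is
 * not primary opcode 18, otherwise stores the branch target.
 */
bool branch_target(uint64_t addr, uint32_t insn, uint64_t *target)
{
    assert(target != NULL);
    if ((insn >> 26) != 18)
        return false;

    /* LI || 0b00 forms a 26-bit signed byte displacement. */
    int64_t disp = insn & 0x03fffffc;
    if (disp & 0x02000000)
        disp -= 0x04000000;

    bool absolute = (insn & 0x2) != 0; /* AA bit */
    *target = absolute ? (uint64_t)disp : addr + (uint64_t)disp;
    return true;
}
```

The bytes 00 00 00 48 at c00000000002acf0 decode to a displacement of zero, so the target is c00000000002acf0 itself: a branch to self. As a cross-check, the bl at c00000000002acd0 (8d ff ff 4b) decodes to c00000000002ac5c, matching the listing.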

oohal commented 5 years ago

static void nmi_stop_this_cpu(struct pt_regs *regs)
{
        nmi_ipi_lock();
        if (nmi_ipi_busy_count > 1) 
                nmi_ipi_busy_count--;
        nmi_ipi_unlock();

        spin_begin();
        while (1)
                spin_cpu_relax();
}

So spinning there is intentional. The question here is why this thread doesn't get pulled back into OPAL by the thread that is doing the reset. It looks like @npiggin sent some patches for some bugs around the NMI IPI in the 4.18 cycle so maybe that's what fixed it. Nick, what do you think?

npiggin commented 5 years ago

The NMI IPI stuff in the kernel has had some issues, yes. I actually don't know if I have even fixed the last of them in the upstream kernel. This presumably is what caused the bug to show up. I don't know off the top of my head what the problem is or if we have a fix upstream, so I'll have to get back to this code soon.

However I'm struggling to see why nmi_stop_this_cpu is not being fast rebooted. It does spin with MSR[EE]=0 which is the only thing that should be "unusual" about it from a kernel point of view. Direct controls sreset should not care about that. The whole point is being able to recover a CPU no matter how stuck the OS has become.

@oohal, do you have access to a system in the locked-up state? We might have to debug the direct controls code a bit more.

ghost commented 5 years ago

I should resurrect my instruction ramming patch, which is an adaptation of Alistair's old one that then morphed to follow what pdbg does.

AlexanderAmelkin commented 5 years ago

@stewart-ibm, it appears this issue got automatically closed by your test commit, which hasn't actually resolved it.

ghost commented 5 years ago

Argh, github being too careful.

It should have referenced this commit as it does actually implement a work-around:

commit 14f709b8eeda7e9ea7169b782e981c908de92e10
Author: Stewart Smith <stewart@linux.ibm.com>
Date:   Fri May 3 16:45:53 2019 +1000

    Disable fast-reset for POWER8

    There is a bug with fast-reset when CPU cores are busy, which can be
    reproduced by running `stress` and then trying `reboot -ff` (this is
    what the op-test test cases FastRebootHostStress and
    FastRebootHostStressTorture do). What happens is the cores lock up,
    which isn't the best thing in the world when you want them to start
    executing instructions again.

    A workaround is to use instruction ramming, which while greatly
    increasing the reliability of fast-reset on p8, doesn't make it perfect.

    Instruction ramming is what pdbg was modified to do in order to have the
    sreset functionality work reliably on p8.
    pdbg patches: https://patchwork.ozlabs.org/project/pdbg/list/?series=96593&state=*

    Fixes: https://github.com/open-power/skiboot/issues/185
    Signed-off-by: Stewart Smith <stewart@linux.ibm.com>

I'll pull out my instruction ramming patches and post them to the list too, and maybe someone can find where things are going wrong, but I've still managed to have them fail.
