ibm-s390-linux / s390-tools

Tools for use with the s390 Linux kernel and device drivers
MIT License

KVM guests abort in zipl stage 3 when bootindex is not specified #88

Closed huth closed 3 years ago

huth commented 4 years ago

We are currently facing a weird issue that seems to have something to do with changes in the zipl stage 3 loader code: After using the latest version of zipl in a KVM guest (I'm using commit 53b949926f1bf0c6 right now), and shutting down the guest, it is not possible to start the guest anymore. The boot process aborts:

$ qemu-system-s390x -enable-kvm -m 2G -nographic -d guest_errors \
    -blockdev node-name=file_image1,driver=file,filename=/path/to/guest.qcow2 \
    -blockdev node-name=drive_image1,driver=qcow2,file=file_image1 \
    -device virtio-blk-ccw,id=image1,drive=drive_image1
LOADPARM=[ ]
Using virtio-blk.
Using SCSI scheme.
.........
Guest crashed on cpu 0: disabled-wait
PSW: 0x0002000080000000 0x0000000000004512

When I downgrade s390-tools to v2.12.0 and re-run zipl in the guest, everything works fine again, so this must be a regression that was introduced after v2.12.0.

huth commented 4 years ago

If I replace the "-enable-kvm" with "-accel tcg -d in_asm" in the QEMU command line, I can see that these are the last instructions that are executed by the guest:

0x000000000000a7fc: lay %r15,-24(%r15)
0x000000000000a802: lg %r1,0(%r5)
0x000000000000a808: stg %r2,168(%r15)
0x000000000000a80e: stg %r1,160(%r15)
0x000000000000a814: lpswe 160(%r15)

... looking at the addresses of the code, it seems to me that this is somewhere in the zipl stage 3 code.

huth commented 4 years ago

... and I forgot to mention, the guest boots fine if I add a "bootindex=1" to the device parameter:

qemu-system-s390x -enable-kvm -m 2G -nographic -d guest_errors \
    -blockdev node-name=file_image1,driver=file,filename=/path/to/guest.qcow2 \
    -blockdev node-name=drive_image1,driver=qcow2,file=file_image1 \
    -device virtio-blk-ccw,id=image1,drive=drive_image1,bootindex=1

huth commented 4 years ago

I've bisected the issue, and ended up here:

e67f6300862d939d212d79c4ce5e1249102ddcd3 is the first bad commit
commit e67f6300862d939d212d79c4ce5e1249102ddcd3
Author: Stefan Haberland <sth@linux.ibm.com>
Date: Mon Mar 30 22:13:06 2020 +0200

zipl: check for valid ipl parmblock lowcore pointer

The lowcore parmblock pointer is not valid in every case. For example
it is invalid for CCW type IPL.
To have an indication if the pointer is valid do a diag308 to store the
parmblock and check if secure boot is enabled.
If it is enabled the lowcore pointer is valid and the ipl report that is
needed for secure boot can be found right behind the ipl parmblock.

Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>

include/boot/ipl.h | 1 +
zipl/boot/stage3.c | 24 ++++++++++++++++++++----
2 files changed, 21 insertions(+), 4 deletions(-)

borntraeger commented 4 years ago

It looks like the zipl stage 3 boot loader panics when the diag308 store fails.

unsigned int store_ipl_parmblock(struct ipl_pl_hdr *pl_hdr)
{
    int rc;

    rc = diag308(DIAG308_STORE, pl_hdr);
    if (rc == DIAG308_RC_OK &&
            pl_hdr->version <= IPL_MAX_SUPPORTED_VERSION)
            return 0;

    return 1;
}

but if we do not have a boot config we return: DIAG_308_RC_NO_CONF

I think zipl should instead assume "not secure" in that case.

sharkcz commented 4 years ago

Fixed via commit 943c5dc51d493fd89f8c1b0760656446d5653be6, so this can be closed.

hoeppnerj commented 3 years ago

As per previous comment, closing.