ibm-s390-linux / s390-tools

Tools for use with the s390 Linux kernel and device drivers

zipl miscompiled with gcc9? #54

Closed by sharkcz 5 years ago

sharkcz commented 5 years ago

This is more of a heads-up than anything else, because I don't have clear evidence yet. But we see a boot failure in Fedora when the boot is prepared with a zipl compiled with gcc9, while a zipl still compiled with gcc8 works. Specifically, s390utils-2.7.1-4.fc30 is bad and s390utils-2.7.1-2.fc30 is good.

What clearly differs is the /boot/bootmap file: the two zipl builds produce different bootmaps when run on the same /boot content. AFAIK it shouldn't change.
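
A minimal sketch of how such a bootmap comparison could be scripted, assuming each zipl build's output was saved to its own file. The file names `bootmap.gcc8` and `bootmap.gcc9` are hypothetical and not taken from the report.

```python
from pathlib import Path


def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # Common prefix matches; the inputs differ only if one is longer.
    return None if len(a) == len(b) else min(len(a), len(b))


def compare_bootmaps(path_a, path_b):
    """Compare two saved bootmap files byte by byte."""
    return first_difference(Path(path_a).read_bytes(),
                            Path(path_b).read_bytes())


if __name__ == "__main__":
    # Hypothetical file names: copies of /boot/bootmap written by each build.
    names = ("bootmap.gcc8", "bootmap.gcc9")
    if all(Path(n).exists() for n in names):
        print(compare_bootmaps(*names))
```

A returned offset only says where the two files start to diverge; it does not say which of them is wrong.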

As usual it could be a real gcc9 issue or some problem with zipl code, like relying on undefined behaviour.

The failed boot looks like this:

00: Storage cleared - system reset.
00: HCPGIR453W CP entered; program interrupt loop

which usually means something went wrong very early in the boot, like the "bootmap" content being wrong.

hoeppnerj commented 5 years ago

Thanks for the heads-up. Just to untangle a possible confusion, s390utils-2.7.1-4.fc30 is compiled with GCC9 and s390utils-2.7.1-2.fc30 with GCC8 then? Or is the -4 version generally broken? Are there any code changes between -2 and -4?

I assigned our zipl maintainer and if it turns out to be a GCC issue, I'll add one of our tool chain guys to have a look as well.

sharkcz commented 5 years ago

No code change between -2 and -4 (see https://src.fedoraproject.org/rpms/s390utils/commits/master); -2 is built with gcc8, -4 (and -3 too) with gcc9.

You can find the binary packages at https://koji.fedoraproject.org/koji/packageinfo?packageID=255

sharkcz commented 5 years ago

Hi, do you already have more details about the culprit? Shall I keep looking into it too? I'm asking because it makes installations of the upcoming Fedora 30 unusable.

hoeppnerj commented 5 years ago

We're currently quite low on resources. If you've got the time to look into it, that'd be much appreciated! Thanks!

sharkcz commented 5 years ago

OK, will focus on it. BTW, don't you have a tool that would convert the boot records and the bootmap to a human-readable format?

And for the record - builds with ubsan and asan sanitizers didn't reveal anything wrong.

sharkcz commented 5 years ago

So the problem is that stage2 crashes while clearing the BSS section in memory (https://github.com/ibm-s390-tools/s390-tools/blob/master/zipl/boot/libc.c#L359).

Disassembly of initialize() from the gcc9-compiled zipl:

0000000000002a60 <initialize>:
    2a60:       eb df f0 88 00 24       stmg    %r13,%r15,136(%r15)
    2a66:       c0 d0 00 00 07 61       larl    %r13,3928 <__ex_table_stop+0xc>
    2a6c:       a7 f1 1f 80             tmll    %r15,8064
    2a70:       a7 84 00 01             je      2a72 <initialize+0x12>
    2a74:       e3 f0 ff e8 ff 71       lay     %r15,-24(%r15)
    2a7a:       c4 18 00 00 08 1b       lgrl    %r1,3ab0 <conv_vec.2142+0x178>
    2a80:       e3 10 01 d0 00 24       stg     %r1,464
    2a86:       c0 10 00 00 07 31       larl    %r1,38e8 <pgm_check_handler>
    2a8c:       e3 10 01 d8 00 24       stg     %r1,472
    2a92:       c4 48 00 00 08 07       lgrl    %r4,3aa0 <conv_vec.2142+0x168>
    2a98:       a7 39 00 00             lghi    %r3,0
    2a9c:       e3 40 d0 00 00 09       sg      %r4,0(%r13)
    2aa2:       c4 28 00 00 08 03       lgrl    %r2,3aa8 <conv_vec.2142+0x170>
    2aa8:       c0 e5 ff ff fe a4       brasl   %r14,27f0 <memset>
    2aae:       eb df f0 a0 00 04       lmg     %r13,%r15,160(%r15)
    2ab4:       c0 f4 ff ff fb 96       jg      21e0 <start>
    2aba:       07 07                   nopr    %r7
    2abc:       07 07                   nopr    %r7
    2abe:       07 07                   nopr    %r7

sharkcz commented 5 years ago

Further tracing shows that __bss_start (in R2) is 0x0000 for the gcc9 zipl, while it is the expected 0x5200 for the gcc8 zipl. It looks as if the BSS section is not created properly.

sharkcz commented 5 years ago

And the commands to reproduce the crash:

dd if=/dev/zero of=/boot/vmlinuz bs=1k count=70 && zipl -t /boot -i /boot/vmlinuz && reboot

You won't get to the "Booting default" message that would normally appear.

sharkcz commented 5 years ago

And the root cause is that gcc9 emits a new section, .rodata.cst8, for some literals, and that section isn't included in the --only-section parameters used when creating the *.bin images with objcopy. PR follows.
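
To illustrate the effect, here is a simplified Python model of flattening an ELF file into a raw image with a section whitelist, as `objcopy -O binary --only-section=...` does. It is a sketch, not the zipl build: the section layout, sizes and fill bytes are made up, and only the literal address 0x3aa8 and the value 0x5200 come from the trace above.

```python
def flatten(sections, only):
    """Keep only whitelisted sections; gaps between them stay zero-filled."""
    kept = [(addr, data) for name, addr, data in sections if name in only]
    base = min(addr for addr, _ in kept)
    end = max(addr + len(data) for addr, data in kept)
    image = bytearray(end - base)
    for addr, data in kept:
        image[addr - base:addr - base + len(data)] = data
    return base, bytes(image)


def load_u64(base, image, addr):
    """Read a big-endian 64-bit value, roughly what lgrl does on s390x."""
    off = addr - base
    return int.from_bytes(image[off:off + 8], "big")


BSS_START = 0x5200   # expected __bss_start, from the gcc8 trace
LITERAL_ADDR = 0x3aa8  # address read by lgrl %r2 in the disassembly

# Hypothetical layout: (name, load address, contents).
SECTIONS = [
    (".text", 0x2000, b"\x07" * 0x1aa0),
    (".rodata.cst8", 0x3aa0,
     b"\xff" * 8 + BSS_START.to_bytes(8, "big") + b"\xff" * 8),
    (".data", 0x3ab8, b"\x01" * 0x48),
]

OLD_LIST = {".text", ".data"}              # whitelist without the new section
FIXED_LIST = OLD_LIST | {".rodata.cst8"}   # whitelist with the new section

if __name__ == "__main__":
    print(hex(load_u64(*flatten(SECTIONS, OLD_LIST), LITERAL_ADDR)))    # 0x0
    print(hex(load_u64(*flatten(SECTIONS, FIXED_LIST), LITERAL_ADDR)))  # 0x5200
```

In this model the dropped section leaves a zero-filled hole, so the load that should return __bss_start returns 0, which matches the value seen in R2.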

stefan-haberland commented 5 years ago

Thanks a lot for debugging this. I will have a look at the PR asap and include it.

hoeppnerj commented 5 years ago

Fixed via cdb23f8d2251d865f0a8aa46f8ca886441048e13