ibm-s390-linux / s390-tools

Tools for use with the s390 Linux kernel and device drivers
MIT License

v2.34.0 fails to build on Ubuntu 24.10 with 'Heap section doesn't conform to the described memory layout' #174

Open frank-heimes opened 3 weeks ago

frank-heimes commented 3 weeks ago

With Ubuntu 24.10 / oracular and its fairly new toolchain and build environment (e.g. gcc-14 v14.2.0-2ubuntu1 and binutils v2.43-2ubuntu1), the build of s390-tools v2.34.0 fails. I believe these are the relevant lines from the build log:

gcc -E -Wp,-MD,.stage3a.lds.d,-MT,stage3a.lds -I../..//zipl/boot -I../..//zipl/include -I../..//include -P -C -o stage3a.lds stage3a.lds.S
gcc -I ../../include -Wdate-time -D_FORTIFY_SOURCE=3 -D_GNU_SOURCE -fno-pie -Os -g -I../..//zipl/boot -I../..//zipl/include -I../..//include -DENABLE_SCLP_ASCII=1 -DS390_TOOLS_RELEASE=2.34.0-build-20240814 -fno-builtin -ffreestanding -fno-asynchronous-unwind-tables -fno-delete-null-pointer-checks -fno-stack-protector -fexec-charset=IBM1047 -m64 -mpacked-stack -mstack-size=4096 -mstack-guard=128 -msoft-float -Wall -Wformat-security -Wextra -Wno-array-bounds -c stage3b.c -o stage3b.o
gcc -E -Wp,-MD,.stage3b.lds.d,-MT,stage3b.lds -I../..//zipl/boot -I../..//zipl/include -I../..//include -P -C -o stage3b.lds stage3b.lds.S
gcc -E -Wp,-MD,.stage3b_reloc.lds.d,-MT,stage3b_reloc.lds -I../..//zipl/boot -I../..//zipl/include -I../..//include -P -C -o stage3b_reloc.lds stage3b_reloc.lds.S
gcc -no-pie -Wl,--no-warn-rwx-segments -Wl,-T,stage3a.lds -Wl,--build-id=none -m64 -static -nostdlib stage3a.o head.o stage3a_init.o libc.o ebcdic.o ebcdic_conv.o sclp.o entry.o -o stage3a.elf
/usr/bin/ld: Heap section doesn't conform to the described memory layout
collect2: error: ld returned 1 exit status
make[4]: *** [Makefile:77: stage3a.elf] Error 1
make[4]: Leaving directory '/<>/genprotimg/boot'
make[3]: *** [Makefile:20: all-recursive] Error 1
make[3]: Leaving directory '/<>/genprotimg'
make[2]: *** [Makefile:56: genprotimg] Error 2

The entire build log can be found here: https://launchpadlibrarian.net/743774862/buildlog_ubuntu-oracular-s390x.s390-tools_2.34.0-0ubuntu1_BUILDING.txt.gz

I already tried slightly different binutils versions (2.42.90.20240720-2ubuntu1, 2.42-4ubuntu3 and 2.42-4ubuntu2), since it looks like a linker issue, but no luck. I also patched gcc to include https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=e903ada5e8881acec734eb3f89c3644bbd8da7e9 (which fixes a zlib build), but this didn't solve the problem either.

So I wanted to share this issue here in case you can provide assistance.

Btw. I've noticed the linker issue that was reported here: https://github.com/ibm-s390-linux/s390-tools/issues/171 but I cannot find an obvious relationship.

(I can share steps to set up a local build env. for 24.10 that allows reproducing the situation.)

bolives-hax commented 3 weeks ago

Hey, I opened issue #171 that you referenced. Could you try setting -march=cpu-type, e.g. "z10" or "z13"? In my case, despite using the new gcc 14, the -march setting is what affected my linker issues. They seem related to yours; yours just show up in stage3, while mine already started at stage2.

I see from your logs gcc is invoked as

gcc -no-pie -Wl,--no-warn-rwx-segments -Wl,-T,stage3a.lds -Wl,--build-id=none -m64 -static -nostdlib stage3a.o head.o stage3a_init.o libc.o ebcdic.o ebcdic_conv.o sclp.o entry.o -o stage3a.elf

before producing (failing with):

/usr/bin/ld: Heap section doesn't conform to the described memory layout
collect2: error: ld returned 1 exit status

From a technical perspective this looks quite similar to my issue; it just happens with a different section in a different place (stage3, which comes after stage2).

See one of the replies the devs gave in my issue:

Most likely no one used the bad combination, in Fedora we are now at z13 as the arch level, we were on zEC12 for a long time and on z10 before that. The bootloader is logically space constrained so it might be affected by a bad compiler version & flags combination. You probably want to build for something newer even than z10 anyway, because there are unlikely any z10 systems still in active service.

Originally posted by @sharkcz in https://github.com/ibm-s390-linux/s390-tools/issues/171#issuecomment-2258137192

He says, quote, "in Fedora we are now at z13 as the arch level, we were on zEC12 for a long time and on z10 before that". It seems that in your issue, just like in mine, no -march= is being set. What the quoted comment says, though, is that Fedora sets the arch level to z13, meaning gcc is invoked with -march=z13.

So, judging by this comment and the fact that Fedora enforces -march=z13 for builds, it is likely that the devs work on this under the assumption of -march=z13. While I could get away with -march=z10, not setting -march at all would, according to the GCC man page:

-march=cpu-type
Generate code that runs on cpu-type, which is the name of a system representing a certain processor type. Possible values for cpu-type are ‘z900’/‘arch5’, ‘z990’/‘arch6’, ‘z9-109’, ‘z9-ec’/‘arch7’, ‘z10’/‘arch8’, ‘z196’/‘arch9’, ‘zEC12’, ‘z13’/‘arch11’, ‘z14’/‘arch12’, ‘z15’/‘arch13’, ‘z16’/‘arch14’, and ‘native’.

The default is -march=z900.

Specifying ‘native’ as cpu type can be used to select the best architecture option for the host processor. -march=native has no effect if GCC does not recognize the processor.

result in z900 being selected, the oldest Z series CPU.

TL;DR: have you tried setting -march to z10 or z13? I can't find that anywhere in your logs. I don't know the build tool you used, but I assume it's possible to either

1) set CFLAGS="-march=z10" or CFLAGS="-march=z13" (what they use), or
2) directly patch the calls to gcc in the Makefile or something like that (just make sure you don't end up with differing -march values)

sharkcz commented 3 weeks ago

for the record, no such problem with 2.34.0 in Fedora

frank-heimes commented 3 weeks ago

Hey, thanks for the replies! The default should already be -march=z13. Anyway, I tried setting this in my build guidelines (debian/rules) via DEB_CFLAGS_MAINT_APPEND and DEB_CPPFLAGS_MAINT_APPEND, and also via common.mak, but it made no difference. I'll check the build flags of the binaries again next week to be sure.

frank-heimes commented 3 weeks ago

I just double-checked, and the package(s) are indeed already built using "-march=z13" (and "-mtune=z16"), according to DW_AT_producer:

DW_AT_producer : (alt indirect string, offset: 0x996) GNU GIMPLE 14.2.0 -mtune=z16 -march=z13 -mbackchain -mtune=z16 -march=z13 -m64 -mzarch -g -g -O2 -O2 -fno-openmp -fno-openacc -fPIC -fcf-protection=none -fasynchronous-unwind-tables -fstack-protector-strong -fno-stack-clash-protection -ffat-lto-objects -fltrans
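For anyone wanting to repeat this check, here is a minimal sketch that pulls the effective -march/-mtune values out of a DW_AT_producer string. It assumes the string was already obtained, for example from the output of `readelf --debug-dump=info`; the helper name `effective_flags` is made up for illustration, and the sample string is a shortened version of the one above.

```python
import re


def effective_flags(producer: str) -> dict:
    """Return the last -march/-mtune value found in a DW_AT_producer string."""
    flags = {}
    for name, value in re.findall(r"-(march|mtune)=(\S+)", producer):
        # Options can be repeated; keep the last value seen,
        # on the assumption that the last one is what GCC applies.
        flags[name] = value
    return flags


# Shortened producer string taken from the comment above.
producer = (
    "GNU GIMPLE 14.2.0 -mtune=z16 -march=z13 -mbackchain "
    "-mtune=z16 -march=z13 -m64 -mzarch -g -O2"
)
print(effective_flags(producer))
```

An empty result would suggest that no -march was recorded for that compilation unit, which is the situation the earlier comments were worried about.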

frank-heimes commented 3 weeks ago

I created a map file on request, using:
grep -ri "-Map" genprotimg/boot/Makefile
$(LINK) $(NO_PIE_LDFLAGS) $(NO_WARN_RWX_SEGMENTS_LDFLAGS) -Wl,-T,$< -Wl,--build-id=none -m64 -static -nostdlib $(filter %.o, $^) -o $@ -Xlinker -Map=$@.map
It's attached ...

frank-heimes commented 3 weeks ago

I had a chat with Marc Hartmayer today. After looking at the map file, his suggestion was to either change the section discard setting from *(.note.GNU-stack) to *(.note.*) (which is potentially a bit dangerous, since it may discard too much) or, better, to add *(.note.package) next to *(.note.GNU-stack). I did the latter and created a quilt patch that does it for all bootloader (and tools) related lds.S files (see attachment), which allowed me to overcome this issue.
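To illustrate the kind of change described here (this is not the attached patch itself), the following sketch adds a *(.note.package) entry after each *(.note.GNU-stack) line in linker script text. The function name `add_note_package` and the sample /DISCARD/ fragment are hypothetical; the fragment is not copied from the s390-tools sources, and the real lds.S files may be laid out differently.

```python
import re


def add_note_package(lds_text: str) -> str:
    """Add *(.note.package) after each *(.note.GNU-stack) line, only once."""
    if "*(.note.package)" in lds_text:
        return lds_text  # already patched, leave the text untouched
    return re.sub(
        r"^([ \t]*)\*\(\.note\.GNU-stack\)[ \t]*$",
        # Reuse the captured indentation for the newly added line.
        r"\1*(.note.GNU-stack)\n\1*(.note.package)",
        lds_text,
        flags=re.MULTILINE,
    )


# Hypothetical linker script fragment, for illustration only.
sample = """\
    /DISCARD/ : {
        *(.eh_frame)
        *(.note.GNU-stack)
    }
"""
print(add_note_package(sample))
```

Listing the one extra section explicitly keeps the discard rule narrow, which matches the concern that a *(.note.*) wildcard might throw away more than intended.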

I had to solve another, unrelated Rust issue (I'll create a separate PR for that).

With both fixes I was able to build 2.34.0 locally on a 24.10 / s390x system, as well as in a clean PPA.

Thx again to Marc. I plan to create PRs for this ...

fix-unnecessary-qualification-error-in-ffi-rs-.PATCH
discard-note-package-in-lds-files-to-solve-linker-error.PATCH

sharkcz commented 3 weeks ago

so your Rust issue is #173 I believe :-)

frank-heimes commented 3 weeks ago

Oh yes @sharkcz, I hadn't noticed #173. Well, it was so easy to solve (since the fix is part of the message) that I didn't look it up here or anywhere else.

Good - one PR less to do (thx for the reference).