Closed mathomp4 closed 2 years ago
there's not enough information for me to be able to debug this :)
The error message has the smell of a malformed object file - __TEXT,__text reloc 347: symbol index out of range for architecture
I would think that symbol indices are set by the assembler; we do not emit those from the compiler [of course, we do emit the symbols, and therefore the number and ordering could be relevant]. Also, I might be misinterpreting the error message ...
initial questions;
is it possible to run the build of HDF5 serially, with something like VERBOSE=1 to get a view of the actual command lines being emitted?
what set of clang tools are you using and is it the same as the set that worked with 11.2?
I might need the actual objects, or the .i files that were used to generate them, so isolating the command lines that gave rise to the objects complained about is worth doing (those can then be manually re-issued with -save-temps -v to obtain the intermediate outputs).
I cannot cross-compile HDF5: https://portal.hdfgroup.org/pages/viewpage.action?pageId=48808266 so I am not going to be able to reproduce this locally.
@iains I'll try to get to this tomorrow. For some reason the VPN on my M1 Macbook has decided not to work. Joy. Tomorrow I'll be at work so the VPN can be avoided. Ahh...fun with new chip architectures!
@iains Might be next week now. My M1 MacBook is not liking something extra NASA put on it. It...doesn't want to see the internet. I am a bit baffled and I'll need to find a sysadmin to figure it out.
Fun with new chip architectures!
I can only confirm that I can't reproduce that with the latest hdf5 (version 1.12.2). It builds fine and all tests pass.
OK. To be honest, there's not too much rush from my side (tomorrow is 9.5 RC and then around a week until 10.4 RC), so a lot to do there ... plus last night there was a cloud-to-cloud lightning arc right over my house (a very loud bang, coincident with the flash) which took out my internet (and cost most of today getting back online). I guess the potentials on the overhead wires were too much for the modem :(.
good to know latest is OK at least (that might help narrow things down).
@fxcoudert Interesting! If I can somehow get enough internet to pull that tag/version of HDF5, I can test that out. I have been looking for a good excuse to move our code to HDF5 1.12, and this might be it. (We tend to be conservative with some libraries since operational code has a good reason to be conservative!)
Well, I was able to get enough Internet to pull HDF 1.12.2 and I still get:
Making install in test
make[4]: Entering directory '/Users/mathomp4/Baselibs/ESMA-Baselibs-7.1.0/src/hdf5/hl/test'
CCLD test_lite
ld: warning: option -s is obsolete and being ignored
ld: in ../../hl/src/.libs/libhdf5_hl.a(H5LTparse.o), in section __TEXT,__text reloc 347: symbol index out of range for architecture arm64
collect2: error: ld returned 1 exit status
make[4]: *** [Makefile:988: test_lite] Error 1
make[4]: Leaving directory '/Users/mathomp4/Baselibs/ESMA-Baselibs-7.1.0/src/hdf5/hl/test'
I get this error with both a serial (CC=gcc) and parallel (CC=mpicc) build of HDF5.
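As a side note for anyone hitting the same message: the ld line already names the archive, the member, and the relocation index, and pulling those three out makes it easier to inspect the object afterwards. A minimal sketch follows. The function name `parse_ld_reloc_error` is made up for illustration, and the `ar`/`otool` follow-up mentioned in the comments is only a suggestion that is not run here.

```shell
# Hypothetical helper: read ld error text on stdin and print
# "<archive> <member> <reloc index>" for each matching line.
parse_ld_reloc_error() {
  sed -n 's/^ld: in \(.*\)(\(.*\)), in section .* reloc \([0-9][0-9]*\): .*$/\1 \2 \3/p'
}

msg='ld: in ../../hl/src/.libs/libhdf5_hl.a(H5LTparse.o), in section __TEXT,__text reloc 347: symbol index out of range for architecture arm64'
echo "$msg" | parse_ld_reloc_error
# prints: ../../hl/src/.libs/libhdf5_hl.a H5LTparse.o 347

# Possible follow-up on macOS (not executed in this sketch):
#   ar -x <archive> <member>    # extract the member from the static library
#   otool -rV <member>          # list its relocation entries
```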
Per a request of @iains, here is the verbose make output for the serial case:
make[2]: Entering directory '/Users/mathomp4/Baselibs/ESMA-Baselibs-7.1.0/src/hdf5/hl/test'
/bin/sh ../../libtool --tag=CC --mode=link /Users/mathomp4/installed/Core/gcc-gfortran/11.3.0/bin/gcc -std=c99 -Wall -Wcast-qual -Wconversion -Wextra -Wfloat-equal -Wformat=2 -Winit-self -Winvalid-pch -Wmissing-include-dirs -Wshadow -Wundef -Wwrite-strings -pedantic -Wno-c++-compat -Wlarger-than=2560 -Wlogical-op -Wframe-larger-than=16384 -Wpacked-bitfield-compat -Wsync-nand -Wstrict-overflow=5 -Wno-unsuffixed-float-constants -Wdouble-promotion -Wtrampolines -Wstack-usage=8192 -Wmaybe-uninitialized -Wdate-time -Warray-bounds=2 -Wc99-c11-compat -Wduplicated-cond -Whsa -Wnormalized -Wnull-dereference -Wunused-const-variable -Walloca -Walloc-zero -Wduplicated-branches -Wformat-overflow=2 -Wformat-truncation=1 -Wrestrict -Wattribute-alias -Wcast-align=strict -Wshift-overflow=2 -Wattribute-alias=2 -Wmissing-profile -Wc11-c2x-compat -fstdarg-opt -fdiagnostics-urls=never -fno-diagnostics-color -s -Wbad-function-cast -Wimplicit-function-declaration -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wpacked -Wpointer-sign -Wpointer-to-int-cast -Wint-to-pointer-cast -Wredundant-decls -Wstrict-prototypes -Wswitch -Wunused-function -Wunused-variable -Wunused-parameter -Wcast-align -Wunused-but-set-variable -Wformat -Wincompatible-pointer-types -Wint-conversion -Wshadow -Wcast-function-type -Wmaybe-uninitialized -Wno-aggregate-return -Wno-inline -Wno-missing-format-attribute -Wno-missing-noreturn -Wno-overlength-strings -Wno-jump-misses-init -Wno-suggest-attribute=const -Wno-suggest-attribute=noreturn -Wno-suggest-attribute=pure -Wno-suggest-attribute=format -Wno-suggest-attribute=cold -Wno-suggest-attribute=malloc -O3 -L/Users/mathomp4/installed/MPI/gcc-gfortran-11.3.0/openmpi-4.1.3/Baselibs/7.1.0/Darwin/lib -L/Users/mathomp4/installed/MPI/gcc-gfortran-11.3.0/openmpi-4.1.3/Baselibs/7.1.0/Darwin/lib -lm -o test_lite test_lite.o ../../hl/src/libhdf5_hl.la ../../test/libh5test.la ../../src/libhdf5.la -lsz -lz -ldl -lm
libtool: link: /Users/mathomp4/installed/Core/gcc-gfortran/11.3.0/bin/gcc -std=c99 -Wall -Wcast-qual -Wconversion -Wextra -Wfloat-equal -Wformat=2 -Winit-self -Winvalid-pch -Wmissing-include-dirs -Wshadow -Wundef -Wwrite-strings -pedantic -Wno-c++-compat -Wlarger-than=2560 -Wlogical-op -Wframe-larger-than=16384 -Wpacked-bitfield-compat -Wsync-nand -Wstrict-overflow=5 -Wno-unsuffixed-float-constants -Wdouble-promotion -Wtrampolines -Wstack-usage=8192 -Wmaybe-uninitialized -Wdate-time -Warray-bounds=2 -Wc99-c11-compat -Wduplicated-cond -Whsa -Wnormalized -Wnull-dereference -Wunused-const-variable -Walloca -Walloc-zero -Wduplicated-branches -Wformat-overflow=2 -Wformat-truncation=1 -Wrestrict -Wattribute-alias -Wcast-align=strict -Wshift-overflow=2 -Wattribute-alias=2 -Wmissing-profile -Wc11-c2x-compat -fstdarg-opt -fdiagnostics-urls=never -fno-diagnostics-color -s -Wbad-function-cast -Wimplicit-function-declaration -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wpacked -Wpointer-sign -Wpointer-to-int-cast -Wint-to-pointer-cast -Wredundant-decls -Wstrict-prototypes -Wswitch -Wunused-function -Wunused-variable -Wunused-parameter -Wcast-align -Wunused-but-set-variable -Wformat -Wincompatible-pointer-types -Wint-conversion -Wshadow -Wcast-function-type -Wmaybe-uninitialized -Wno-aggregate-return -Wno-inline -Wno-missing-format-attribute -Wno-missing-noreturn -Wno-overlength-strings -Wno-jump-misses-init -Wno-suggest-attribute=const -Wno-suggest-attribute=noreturn -Wno-suggest-attribute=pure -Wno-suggest-attribute=format -Wno-suggest-attribute=cold -Wno-suggest-attribute=malloc -O3 -o test_lite test_lite.o -L/Users/mathomp4/installed/MPI/gcc-gfortran-11.3.0/openmpi-4.1.3/Baselibs/7.1.0/Darwin/lib ../../hl/src/.libs/libhdf5_hl.a /Users/mathomp4/Baselibs/ESMA-Baselibs-7.1.0/src/hdf5/src/.libs/libhdf5.a ../../test/.libs/libh5test.a ../../src/.libs/libhdf5.a 
/Users/mathomp4/installed/MPI/gcc-gfortran-11.3.0/openmpi-4.1.3/Baselibs/7.1.0/Darwin/lib/libsz.a -lz -ldl -lm
ld: warning: option -s is obsolete and being ignored
ld: in ../../hl/src/.libs/libhdf5_hl.a(H5LTparse.o), in section __TEXT,__text reloc 347: symbol index out of range for architecture arm64
collect2: error: ld returned 1 exit status
I can try to do more if you can tell what you'd like.
I wonder if perhaps @fxcoudert and myself built GCC differently? I built using:
❯ clang --version
Apple clang version 13.1.6 (clang-1316.0.21.2.3)
Target: arm64-apple-darwin21.4.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
and my configure string was:
../gcc-11-branch-gcc-11.3-darwin-r0/configure \
--prefix=$HOME/installed/Core/gcc-gfortran/11.3.0 \
--enable-languages=c,c++,fortran \
--with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk |& tee configure.log
But then this is the exact same configure string and clang version I built my failed 11.2 version with...and that built HDF5 just fine! Hmm...
It is complaining about H5LTparse.o, so I'd like to see the compile line that builds that object. Then, taking that compile line, append -save-temps -v and post the .i file, please?
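One way to do that from a verbose make log is sketched below. It assumes the log holds one command per line; `rerun_line` and the two-line stand-in log are hypothetical, and with a libtool build the matching lines will likely carry a libtool prefix that needs trimming before re-running.

```shell
# Hypothetical helper: print the logged command(s) that compile <name>.c,
# with -save-temps -v appended so a manual re-run keeps the .i and .s files.
rerun_line() {
  # $1 = object base name (e.g. H5LTparse), $2 = verbose make log
  grep -F -- "$1.c" "$2" | grep -e ' -c ' | sed 's/$/ -save-temps -v/'
}

# Tiny stand-in log; a real one would come from a V=1 build.
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
gcc -std=c99 -O3 -c H5LT.c -o H5LT.o
gcc -std=c99 -O3 -c H5LTparse.c -o H5LTparse.o
EOF
rerun_line H5LTparse "$demo_log"
# prints: gcc -std=c99 -O3 -c H5LTparse.c -o H5LTparse.o -save-temps -v
rm -f "$demo_log"
```

The printed line is meant to be pasted and run by hand from the directory where the object was originally built.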
@iains Okay. I think I have it. I'm attaching H5LTparse.i as well as the make log that made it. (Well, I'm attaching H5LTparse.i.txt so that GitHub is happy with it...) H5LTparse.i.txt make_h5ltparse.log
Oh, and here is the make log but with V=1: make_h5ltparse_V1.log
thanks, I wonder if the difference (in experience) is to do with which compiler is used to build 'C' files. @fxcoudert in your build is this file built with GCC or clang?
I have not looked at the .i yet .. it might have to wait until we get 9.5RC out of the way ;)
Homebrew uses clang as C compiler. I'm sorry but I'm also out of time to run more tests, I am battling against a bug in GCC 11 & 12 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105664) and several bugs in Julia build system 😢
Thanks.
I do see one difference here, in that clang defaults to '-fcommon' where GCC now defaults to -fno-common; this means that clang will indirect accesses to some variables via the GOT and GCC will not.
If the code is part of a shared library, then clang should be using "-fno-common" for the compiles (common is not normally allowed in shared libs).
I suppose that if the code is part of an exe - you could try adding '-fcommon' to the GCC command ...
(the transition to no-common as the default has resulted in an amount of fallout - but it is the right technical solution).
Other things that might 'work' could be to reduce the optimisation to 'O2' (less loop unrolling) and/or try -Os (smaller object). These are band-aid suggestions (trying to figure out whether there's actually a real code-gen problem or just a difference in default behaviour).
@iains I'm not sure how @fxcoudert built HDF5, but at the moment I'm building as a static library (mainly because we have always built as static and...well...don't mess with working builds). My guess is @fxcoudert is probably building as a shared library because, well, that's how most people would nowadays.
When I'm back at work, I'll try and see if I build HDF5 as a shared library on the M1 if it's happier (although, again, HDF5-as-static did work with the bad 11.2 so... 🤷🏼 )
-O3 -o test_lite test_lite.o -L/..../lib ../../hl/src/.libs/libhdf5_hl.a
So the construction of the static library is not failing; the failure comes when that library is linked into what is presumably one of the package's internal tests.
I wonder if we can find out which symbol is failing; see if we get extra info if the link line is re-issued with -Wl,-debug -Wl,-why_load
as for why 11.2 worked - the code generated for aarch64 will surely have changed between 11.2 and 11.3 ... if it has grown perhaps something has now become out of range.
If you still have the 11.2 install around, and you could generate the parser .i (and the .s since I will not be able to replicate your setup exactly) .. I could perhaps try to identify what changed (and if that is reasonable - e.g. as the result of a fixed bug).
at the moment, what is frustrating is that we have an error reported - but the error is not specific enough to be able to decide where the problem might lie.
BTW, on the assumption that @fxcoudert built the C files in this library with clang (which defaults to -fcommon) you could also add that option to your build (as noted above) - that could well have a significant effect - since common symbols are indirected through the GOT which will alter the layout quite a bit.
(we should still find out the actual problem and then decide how to fix it - it could of course be a bug in the compiler)
two small updates:
1/ I stripped out all the Wxxx to see what was actually left (and therefore what I'd need to have to replicate this)
... = Users/mathomp4/installed
libtool: link: /.../Core/gcc-gfortran/11.3.0/bin/gcc -std=c99 -s -O3 -o test_lite
test_lite.o
-L/.../MPI/gcc-gfortran-11.3.0/openmpi-4.1.3/Baselibs/7.1.0/Darwin/lib
../../hl/src/.libs/libhdf5_hl.a
/Users/mathomp4/Baselibs/ESMA-Baselibs-7.1.0/src/hdf5/src/.libs/libhdf5.a
../../test/.libs/libh5test.a
../../src/.libs/libhdf5.a
/.../MPI/gcc-gfortran-11.3.0/openmpi-4.1.3/Baselibs/7.1.0/Darwin/lib/libsz.a -lz -ldl -lm
2/ The output of otool -rV | less -N says that _H5LTyychar is the symbol with the problem - and that IS one that would be referenced via the GOT with a -fcommon build.
347 00000210 False long True PAGOF12 False _H5LTyychar
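The flag stripping in update 1/ can be reproduced mechanically. Here is a rough sketch, assuming a whitespace-separated command line with no quoted arguments. `strip_warning_flags` is a hypothetical name, and linker pass-throughs (`-Wl,...`) are deliberately kept because they change the link.

```shell
# Hypothetical helper: drop every token that starts with -W from a command
# line read on stdin, but keep linker pass-through options (-Wl,...).
strip_warning_flags() {
  awk '{
    out = ""
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^-W/ && $i !~ /^-Wl,/) continue
      out = out (out == "" ? "" : " ") $i
    }
    print out
  }'
}

echo 'gcc -std=c99 -Wall -Wextra -Wlogical-op -s -O3 -o test_lite test_lite.o' | strip_warning_flags
# prints: gcc -std=c99 -s -O3 -o test_lite test_lite.o
```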
We have a report of this same bug at Homebrew: https://github.com/orgs/Homebrew/discussions/3296 I've asked the OP there to provide some more information, hopefully we're going to narrow it down.
it would be valuable to be able to compare the .s between the working 11.2 and the not working 11.3 - as noted above it is possible that the size of the object has grown for innocent reasons. Of course, we need to figure out what to do about it either way.
@iains a minimal reproducer was posted at https://github.com/orgs/Homebrew/discussions/3296#discussioncomment-2800314 which is Fortran-only (no clang, no C source). I can confirm it reproduces on my machine as well.
unfortunately, that means I cannot compare clang's output with GCC's :( do you have a HB 11.2 build somewhere that works with this - so that at least I could compare the asm?
however.. the problem has been spotted - we are trying to apply negative offsets to relocations (which should work, but doesn't because of an assembler/linker bug). Will sort out a patch for this ASAP.
Huzzah!
Also, this update is why I am not a compiler developer. I can just about grok the assembly out of godbolt for my Fortran code, but this? Yeah, you all are magicians. 😄
would you be willing to test the candidate patches on your wider code-base? (it's a long story, but as things stand I can only test on a cross-compiler, since I do not currently have access to an M1 box).
@iains Sure. I can give it a go.
FYI, my usual process is:
make check
with GCC because GCC's check is...comprehensive, let's say, but I probably should try it on M1!)
So, 1-3 have been working; it's step 4 at the moment where I hit the HDF5 thing.
OK so, to test the patch you would need to amend the sources for the compiler (apply the patches) and then start from 1 again - if that's still OK then I'll sort out the patches and put them here (with some instructions).
I can do that (with instructions) :)
OK. So the simplest thing is to clone the following branch(es); they contain patches that fix a couple of issues that have arisen since the initial post.
there's one for GCC 11.3r2 (preview of an update to release 2)
https://github.com/iains/gcc-11-branch/tree/gcc-11-3-darwin-pre-r2
there's one for GCC 12.1r1 (preview of an update to release 1)
https://github.com/iains/gcc-12-branch/tree/gcc-12-1-darwin-pre-r1
So there's no patching to do - just checkout and build the branches - if all is successful, then the updated releases will be identical modulo SHA1 hash values.
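For anyone following along, fetching and building one of these branches could look roughly like the sketch below. The repository and branch names are read off the 11.3 URL above, and the configure options are copied from the configure string posted earlier in this thread. The `build_gcc` function name, the `build` directory, and the shallow clone are assumptions of this sketch, and GCC's prerequisites (GMP, MPFR, MPC) are assumed to be in place already.

```shell
# Split the GitHub "tree" URL from the post into a clone URL and a branch name.
tree_url='https://github.com/iains/gcc-11-branch/tree/gcc-11-3-darwin-pre-r2'
repo="${tree_url%%/tree/*}.git"
branch="${tree_url##*/tree/}"
echo "git clone --depth 1 --branch $branch $repo"

# Hypothetical wrapper (defined but not called here):
# clone the branch, then configure and build out of tree.
build_gcc() {
  git clone --depth 1 --branch "$branch" "$repo" &&
  mkdir -p build && cd build &&
  ../gcc-11-branch/configure \
    --prefix="$HOME/installed/Core/gcc-gfortran/11.3.0" \
    --enable-languages=c,c++,fortran \
    --with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk &&
  make && make install
}
```

Running `build_gcc` in an empty working directory would start the actual clone and build; swapping in the 12.1 URL (and a matching prefix and source directory name) should work the same way.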
=====
@kencu @fxcoudert - the branches also include a couple of fixes for rpath-related things. I've added a configuration option, --with-darwin-add-path=, that allows you to specify a path that the compiler will automatically add to the embedded run paths.
( let's not hijack this thread for any further discussion of this tho ).
edit: no idea what that wants to be super bold and large, my MD-fu is clearly lacking... second edit .. ah, it's the following construct....
=====
@iains I've grabbed the 11.3 branch now and am building. If all goes well up my stack, I'll try to do the same with the 12.1 branch.
Brief update: I've been able to build HDF5! I'm going through the rest of my stack now. But looks promising! 👍🏼
Further update: All of my Baselibs built (so netCDF-C, netCDF-Fortran, ESMF...) and so did GEOS, my climate model.
And even better, GEOS ran. For 6 hours, with debug options, at stupid low resolution, with a lot of stuff turned off because "small laptop", but it worked. I'm doing more tests now with 11.3 to make sure our usual pattern of work will all be okay, but I think you nailed it.
Once I'm done with 11.3 testing, I'll grab your 12.1 branch and do the same with it.
But I think you might have got it, @iains !
PS: Should I be using --with-darwin-add-path= in my builds? Or is this something a bit fancy that I don't need to worry about?
excellent!
thanks a lot for the testing!
That's an option really intended for distributions (e.g. macports, home-brew) - so not needed for stand-alone builds unless you're doing something extra fancy. (with very few exceptions) My usual rule-of-thumb about configure options, is "don't use them unless you know why you want to and what for" ...
Yeah, that's my theory as well. But with macOS sometimes you need a fancy option...and then I go steal it from @fxcoudert's brew recipe 😄
fixed with Release 2 of GCC 11.3
Thanks to @iains and #2 being solved, I continued on and built MPI and then tried to build my set of base libraries. In doing so, my build of HDF5 1.10.8 (yes, I know, old...)
The thing is, the Homebrew M1 GCC 11.2 from @fxcoudert I built before (the one that had issues) was happy to build this HDF5, so this must be something new. The error is a bit beyond my ken. :)